I found this video by two guys who took a process that checked whether a name was on a terrorist watch list, which originally took 14 days to compute, down to 5 minutes: What’s in a Name? Fast Fuzzy String Matching – Seth Verrinder & Kyle Putnam – Midwest.io 2015
Below are my notes from watching the fuzzy string match video (it is ~40 minutes long but very interesting).
1) Throw more hardware at it
2) Use another variable/field (zip code / country etc.)
4) Metric trees (example: Levenshtein distance)
5) Brute force (Jaro-Winkler is pretty fast already) (5x, down to 70 hrs)
6) Filtering: estimate similarity first, then filter (7x, down to 50 hrs; 18 minutes into the video)
· Length of strings (name length is often not normally distributed, so this doesn’t rule out much). You probably still have to look at 70%
· 26-character filter: search for characters that aren’t shared. This dropped out quite a bit but was slow (300x, down to 65 minutes)
o Bitmap filter: use bitwise operations to get the unmatched count. Very fast! (340x, down to 60 minutes; 20 minutes into the video)
o 64-character filter (used all bits): checked for multiple occurrences of a given letter
7) Minimize recalculation (4,000x, down to 5 minutes; 28 minutes into the video)
· Sort names and group them into segments
· Segments share a common length and first character
· Used WolframAlpha to help show the formula
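To make the filtering idea concrete for myself, here is a minimal Python sketch. It is my own illustration, not the speakers' code. The function names and the 0.85 threshold are made up for the example, and the bound is for plain Jaro similarity (the Winkler prefix bonus would need a slightly looser bound). The idea is to compute a cheap, optimistic score from the name lengths and a 26-bit letter mask, and only run the real comparison when that optimistic score could reach the threshold.

```python
from collections import defaultdict


def letter_mask(name: str) -> int:
    """Pack the set of letters a-z present in name into a 26-bit integer."""
    mask = 0
    for ch in name.lower():
        if "a" <= ch <= "z":
            mask |= 1 << (ord(ch) - ord("a"))
    return mask


def unshared_letters(mask_a: int, mask_b: int) -> int:
    """Count distinct letters in a that never occur in b (AND-NOT, then popcount)."""
    return bin(mask_a & ~mask_b).count("1")


def jaro_upper_bound(a: str, b: str) -> float:
    """Optimistic Jaro score: assume every character that could match does match,
    with zero transpositions. The real score can never be higher than this."""
    la, lb = len(a), len(b)
    if la == 0 or lb == 0:
        return 0.0
    ma, mb = letter_mask(a), letter_mask(b)
    # Each distinct unshared letter accounts for at least one unmatched character.
    m = min(la - unshared_letters(ma, mb), lb - unshared_letters(mb, ma))
    if m <= 0:
        return 0.0
    return (m / la + m / lb + 1.0) / 3.0


def may_match(a: str, b: str, threshold: float = 0.85) -> bool:
    """Cheap pre-check: False means the expensive comparison can be skipped."""
    return jaro_upper_bound(a, b) >= threshold


def segment(names):
    """Group names by (length, first character), as in step 7 of my notes.
    Every name in a segment has the same length, so length-based work
    can be done once per pair of segments instead of once per pair of names."""
    groups = defaultdict(list)
    for name in names:
        groups[(len(name), name[:1].lower())].append(name)
    return groups
```

With this sketch, may_match("smith", "smyth") passes the filter and would go on to the full comparison, while may_match("smith", "jones") is rejected using nothing but two lengths and two bitwise operations.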
Learnings from Fuzzy String Match process
· Measure performance and focus on bottleneck
· Order of magnitude doesn’t always tell you about actual performance
· Favor simplicity
AHK Studio is an amazing and impressive IDE / Editor for AutoHotkey
I had an in-depth Hangout with Chad Wilson (Maestrith on the forum), the author and designer of AHK Studio. Check out my AHK Studio tutorials here.
While I’ve been, and still am, a very satisfied SciTE4AHK user, I was very impressed with many aspects of the tool. It is very intuitive to use and offers some great features that will simplify coding. Not surprisingly, AHK Studio is loaded with hotkeys that, once you familiarize yourself with them, will be awesome! While advanced AutoHotkey programmers will love the advanced functionality, noobs will enjoy its simplicity.
Here are links where you can download it from the AHK forum or from GitHub. Please keep in mind it is still in development. (This is both good and bad. It is good because Chad is very active and open to tweaks/fixes/improvements, bad because “kinks” are never fun.)
Here are a few videos on AHK Studio and below is the nearly 2-hour video demonstrating some of the configuration settings and functionality.
Setup & Review of AHK Studio- Great IDE & editor for AutoHotkey
Sometimes I’d like to be able to programmatically extract values listed inside a program. Unfortunately, many programs I use do not offer a way to get text from a list box.
One of AHK’s great strengths is how well it “hooks” into Windows. I wrote an AutoHotkey script which allows me to copy and paste a list of items selected in the window. There are lots of options: retrieve all items, retrieve only those selected, or obtain the count of either. Once you have all the items, you can send instructions back to the list box and specify which ones you want selected (thus, if you frequently go back and select the same items, it can automate the process).
ControlGet, Sel_CT, List, Count Selected, SysListView321, A ;Gets count of items selected from last active window
ToolTip % Sel_CT
#IfWinActive ahk_class #32770 ;Only run below if in Specific window type
ControlGet, Selected_Items,List,Selected ,SysListView321, A ;gets Selected Items in last active window
ControlGet, Selected_CT ,List,Selected Count,SysListView321, A ;gets count of selected items in last active window
ControlGet, All_Items,List, ,SysListView321, A ;gets list of all items in last active window
ControlGet, All_CT ,List, Count,SysListView321, A ;gets count of all items in last active window
MsgBox % "Number of Items selected: " Selected_CT "`r`r" Selected_Items
MsgBox % "Total number of items: " All_CT "`r`r" All_Items
The syntax around SPSS variables with missing values is unintuitive, confusing, and poorly documented!
I know of three different types of commands, and it is not clear which one to use when. Setting SPSS missing values is a great way to simplify your analysis. It is also a user-friendly way to remove (hide) outliers. This video gives a short demo of how to use the three that I use frequently.
If you want to declare a value in a cell as missing the following syntax will give you a good start.
if Var1=1 Var1=$sysmis.
If you want to remove the values that are in a variable (define them as missing) the following syntax will be what you need.
MISSING VALUES Var1 to Var10 (99).
SPSS missing values Macros
Below are two macros to help with missing data. The first one is used when you want to check whether a given value is present in another variable before recoding the missing values to zero. The second one recodes missing values to zero in all the listed variables.
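* Recode missing values in Beg to End as zero, but only for cases where Prez is greater than 0.
DEFINE !Rep_Miss (Beg = !TOKENS(1) /Prez = !TOKENS(1) /End = !TOKENS(1))
do if !Prez>0.
do repeat v=!Beg to !End.
if missing(v) v=0.
end repeat.
end if.
!ENDDEFINE.
!Rep_Miss Prez=presentvariable Beg=v11 End=v19.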
* Recode missing values in Beg to End as zero for every case.
DEFINE !Rep_Miss2 (Beg = !TOKENS(1) /End = !TOKENS(1))
do repeat v=!Beg to !End.
if missing(v) v=0.
end repeat.
!ENDDEFINE.
/*!Rep_Miss2 Beg=v11 End=v19.