Scraping data from a table with AutoHotkey & IE

In this tutorial I show how to “walk the DOM” (DOM = Document Object Model) with IE and AutoHotkey.

I used to grab all the text from a page and then try to parse the data with a regular expression. Taking advantage of the DOM is critical to learning how to scrape pages efficiently. In this case I’m simply relying on the built-in tags of an HTML table (TR for rows, TD for cells).

Here is the first bit of code I mention for scraping data from a table:

#SingleInstance,Force
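pwb := WBGet() ;Get handle to IE page (the WBGet function must be in your library or this script)
Tab:=pwb.document.getElementsbyClassName("t1tease")[0].getElementsbyTagName("Table")[1] ;Create an "alias" to the table
Rows:=Tab.getElementsbyTagName("TR") ;Create an alias to the collection of rows so it is only looked up once
Loop, % Rows.length-2 { ;loop over each row after the first two; stopping two short keeps A_Index+1 inside the collection
  Row:=Rows[A_Index+1] ;create an alias for the current row (the collection is zero-based, so this starts at the third row)
  State:=Row.getElementsbyTagName("TD")[0].InnerTEXT ;first cell
  TaxDollper1000:=Row.getElementsbyTagName("TD")[1].InnerTEXT ;second cell
  Rank:=Row.getElementsbyTagName("TD")[2].InnerTEXT ;third cell
  data.=State A_Tab TaxDollper1000 A_Tab Rank "`r" ;append one tab-delimited line per row
  State:=TaxDollper1000:=Rank:="" ;clear the values before the next row
}
DebugWindow(data,Clear:=1,LineBreak:=1,Sleep:=500,AutoHide:=0) ;Show output in debug window for studio.  If you're not using ahk Studio, use a message box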
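
If you want to experiment with the same row and cell walk without an open IE window, the DOM calls can be tried against an in-memory document. This is only a minimal sketch, assuming AutoHotkey v1 on Windows; the HTML string and the values in it are made-up sample data, not the page used in the tutorial, and the header row here is a single row.

```autohotkey
#SingleInstance,Force
; Made-up sample table (hypothetical data, for illustration only)
html =
(
<table>
<tr><th>State</th><th>Tax</th><th>Rank</th></tr>
<tr><td>Texas</td><td>10</td><td>1</td></tr>
<tr><td>Ohio</td><td>20</td><td>2</td></tr>
</table>
)
doc := ComObjCreate("HTMLFile") ;in-memory HTML document, no browser window needed
doc.write(html) ;parse the sample markup
doc.close()
Rows := doc.getElementsByTagName("TR") ;grab the row collection once
data := ""
Loop, % Rows.length-1 { ;one pass per data row (the header row at index 0 is skipped)
  Row := Rows[A_Index] ;zero-based collection, so A_Index 1 is the first data row
  Cells := Row.getElementsByTagName("TD") ;cells of the current row
  data .= Cells[0].innerText A_Tab Cells[1].innerText A_Tab Cells[2].innerText "`n"
}
MsgBox % data ;should list the two data rows, tab-delimited
```

The only difference from the script above is where the document comes from: here it is built from a string, while WBGet() hands back the document of a live IE window.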

The following example loops over a table using code from my syntax writer, which can be found here.

#SingleInstance,Force
pwb := WBGet() ;connect to current IE window (make sure the WBGet function is in your library or this script)

;~ MsgBox % pwb.document.getElementsbyTagName("Table")[0].InnerText
;*********** loop over table*****Maestrith helped significantly with the listview portion*******
Gui,DD:destroy
loop, % Pwb.Document.All.Tags("TABLE").length ;get count of all tables on page
    Table_List.=A_index-1 "|" ;prep for dropdown list

gui,DD:add, dropdownlist,w200 r10 vTable_Nb gSubmit_All, %Table_List%
gui,DD:show
return

Submit_all:
Gui,DD:Submit
Gui,DD:destroy

;***********now extract data*******************
Data:=[]
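Table:=Pwb.Document.All.Tags("TABLE")[Table_Nb] ;alias to the table picked in the dropdown
loop, % Table.Rows.Length { ;one pass per row (Length-1 would skip the last row)
  Row:=Table.Rows[A_Index-1] ;the collection is zero-based
  rows:="" ;clear out rows
  loop, % Row.cells.length {
    rows.= Row.cells[A_Index-1].innerTEXT A_Tab ;append each cell followed by a tab
  }
  if(A_Index=1)
    Headers:=RegExReplace(rows,"\t","|") ;first row becomes the "|" delimited column headers
  else
    Data.Push(StrSplit(rows,A_Tab)) ;add rows to data object
}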

Gui,Add,ListView,h900 w1200,%Headers%
for a,b in Data
  LV_Add("",b*) ;use variadic function to add columns
Loop,% LV_GetCount("Column")
  LV_ModifyCol(A_Index,"AutoHDR") ;adjust column width based on data
gui, show
Table_List:=""
return
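
The two string tricks the ListView portion relies on can be tried on their own, without IE or a GUI. This is a small sketch, assuming AutoHotkey v1.1.21 or later; the sample values are made up and ShowCells is a hypothetical helper that exists only for this illustration.

```autohotkey
rows := "State" A_Tab "Tax" A_Tab "Rank" A_Tab ;made-up row text, with the trailing tab the inner loop leaves behind
Headers := RegExReplace(rows, "\t", "|") ;tabs become the "|" separators a ListView column list uses
Fields := StrSplit("Texas" A_Tab "10" A_Tab "1", A_Tab) ;one array element per cell
MsgBox % "Headers: " Headers "`nField count: " Fields.Length()
ShowCells(Fields*) ;the trailing * spreads the array into separate parameters, as LV_Add("",b*) does
ExitApp

ShowCells(cells*) { ;hypothetical helper: collects any number of parameters into an array
  out := ""
  for index, value in cells
    out .= index ": " value "`n"
  MsgBox % out
}
```

Passing the array with a trailing * is what lets one LV_Add call fill however many columns the chosen table happens to have.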