In this tutorial I walk through how to “walk the DOM” (DOM = Document Object Model) with IE and AutoHotkey.
I used to grab all the text on a page and then try to parse the data with a regular expression. Taking advantage of the DOM instead is critical to learning how to scrape pages efficiently. In this case I'm simply relying on the built-in tags of an HTML table.
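As a rough sketch of the idea, the snippet below reaches a single cell by walking table, row, and cell instead of pattern-matching the page text. It is not part of my scraper: the markup is hypothetical sample data, and it assumes AutoHotkey v1.1 on Windows, using an in-memory "htmlfile" COM document so it runs without an open IE window.

```autohotkey
#SingleInstance, Force

; Hypothetical sample markup standing in for a real page.
html := "<table>"
html .= "<tr><th>Item</th><th>Qty</th></tr>"
html .= "<tr><td>Apples</td><td>3</td></tr>"
html .= "<tr><td>Pears</td><td>5</td></tr>"
html .= "</table>"

; "htmlfile" creates an in-memory MSHTML document (assumption: no browser needed).
doc := ComObjCreate("htmlfile")
doc.write(html)
doc.close()

; Walk the DOM: first table -> second row -> second cell.
; DOM collections are zero-based, so [1] is the second item.
table := doc.getElementsByTagName("TABLE")[0]
cell  := table.rows[1].cells[1]

MsgBox, % "Qty cell of the first data row: " cell.innerText
```

The same chain works against a live page once `doc` is replaced with `pwb.document`.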
Here is the first bit of code I mention, which scrapes data from a table:
    #SingleInstance, Force
    pwb := WBGet() ; get a handle to the IE page (WBGet must be in your library or in this script)
    Tab := pwb.document.getElementsByClassName("t1tease")[0].getElementsByTagName("Table")[1] ; create an "alias" to the table
    Rows := Tab.getElementsByTagName("TR") ; fetch the row collection once instead of on every pass
    Loop, % Rows.length - 2 ; A_Index starts at 1, so A_Index+1 below skips rows 0 and 1; the -2 stops before running past the last row
    {
        Row := Rows[A_Index+1] ; create an alias for the current row
        State := Row.getElementsByTagName("TD")[0].innerText
        TaxDollper1000 := Row.getElementsByTagName("TD")[1].innerText
        Rank := Row.getElementsByTagName("TD")[2].innerText
        data .= State A_Tab TaxDollper1000 A_Tab Rank "`r"
        State := TaxDollper1000 := Rank := "" ; clear the values before the next row
    }
    DebugWindow(data,Clear:=1,LineBreak:=1,Sleep:=500,AutoHide:=0) ; show output in the debug window of AHK Studio. If you're not using AHK Studio, use a message box instead
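The row loop above can also be wrapped in a small function so it is not tied to three fixed columns. This is only a sketch: `TableToText` and its `skipRows` parameter are hypothetical names I am introducing here, and the sample table is made-up data loaded into an in-memory "htmlfile" document so the snippet runs on its own (AutoHotkey v1.1 assumed).

```autohotkey
#SingleInstance, Force

; Hypothetical sample markup standing in for a real page.
html := "<table>"
html .= "<tr><th>Item</th><th>Qty</th></tr>"
html .= "<tr><td>Apples</td><td>3</td></tr>"
html .= "<tr><td>Pears</td><td>5</td></tr>"
html .= "</table>"

doc := ComObjCreate("htmlfile") ; in-memory document, no IE window required
doc.write(html)
doc.close()

; Skip the first row because it holds TH header cells rather than TD cells.
MsgBox, % TableToText(doc.getElementsByTagName("TABLE")[0], 1)
return

; Hypothetical helper: returns the TD text of each row as tab-delimited lines.
TableToText(table, skipRows := 0)
{
    rows := table.getElementsByTagName("TR") ; query the collection once
    out := ""
    Loop, % rows.length - skipRows
    {
        ; Collections are zero-based while A_Index is one-based.
        cells := rows[A_Index - 1 + skipRows].getElementsByTagName("TD")
        line := ""
        Loop, % cells.length
            line .= (A_Index > 1 ? A_Tab : "") cells[A_Index - 1].innerText
        out .= line "`r`n"
    }
    return out
}
```

With a live page, the call would instead take the table alias from the script above, for example `TableToText(Tab, 2)`.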
Here is a second example that loops over a table, using code from my syntax writer, which can be found here.
    #SingleInstance, Force
    ;~ MsgBox % pwb.document.getElementsByTagName("Table")[0].InnerText
    ;*********** loop over table ***** Maestrith helped significantly with the listview portion *******
    pwb := WBGet() ; connect to the current IE window (make sure the WBGet function is in your library or this script)
    Gui, DD:Destroy
    Loop, % pwb.Document.All.Tags("TABLE").length ; get count of all tables on the page
        Table_List .= A_Index-1 "|" ; prep for dropdown list
    Gui, DD:Add, DropDownList, w200 r10 vTable_Nb gSubmit_All, %Table_List%
    Gui, DD:Show
    return

    Submit_All:
    Gui, DD:Submit
    Gui, DD:Destroy
    ;*********** now extract data *******************
    Data := []
    Table := pwb.Document.All.Tags("TABLE")[Table_Nb] ; alias for the table picked in the dropdown
    Loop, % Table.Rows.Length ; Rows is zero-based, so A_Index-1 below covers every row
    {
        Row := Table.Rows[A_Index-1]
        rows := "" ; clear out rows
        Loop, % Row.cells.length
            rows .= Row.cells[A_Index-1].innerText A_Tab
        if (A_Index = 1) ; the first row supplies the column headers
            Headers := RegExReplace(rows, "\t", "|")
        else
            Data.Push(StrSplit(rows, A_Tab)) ; add rows to data object, split on the same tab used to join the cells
    }
    Gui, Add, ListView, h900 w1200, %Headers%
    for a, b in Data
        LV_Add("", b*) ; use variadic function call to add columns
    Loop, % LV_GetCount("Column")
        LV_ModifyCol(A_Index, "AutoHdr") ; adjust column width based on data
    Gui, Show
    Table_List := ""
    return
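The ListView portion is easier to follow in isolation. The sketch below feeds hypothetical tab-delimited rows (made-up sample data, not scraped from any page) through the same `StrSplit` and variadic `LV_Add` steps, assuming AutoHotkey v1.1.

```autohotkey
#SingleInstance, Force

; Hypothetical data: pipe-delimited headers and tab-delimited rows.
Headers := "Item|Qty"
Lines := ["Apples" A_Tab "3", "Pears" A_Tab "5"]

Gui, Add, ListView, w300 r5, %Headers%
for index, line in Lines
{
    fields := StrSplit(line, A_Tab) ; one array element per column
    LV_Add("", fields*)             ; fields* expands the array into separate parameters
}
Loop, % LV_GetCount("Column")
    LV_ModifyCol(A_Index, "AutoHdr") ; size each column to fit its header and contents
Gui, Show
return

GuiClose:
ExitApp
```

Passing `fields*` means the row can have any number of columns without changing the `LV_Add` call, which is why the scraper above works for tables of any width.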