Web Scraping with AutoHotkey 110- Saving Files / Images from a URL / Hyperlink

While URLDownloadToFile is a built-in command for downloading binary files (images, Word files, spreadsheets, etc.), I and others have had issues using it (in my case, I think it was because I was behind a proxy at work). In the video below I demonstrate how to use the URLDownloadToFile command, as well as a function I borrowed from Maestrith (author of AutoHotkey Studio).

Saving Files / Images from a URL / Hyperlink

 

Here is the code I walk through in the first part of the above video (demonstrating the built-in functionality and an example calling the function).

#SingleInstance,Force
;*************URL download to File*************************
url:="https://i2.wp.com/www.toptenia.com/wp-content/uploads/2017/08/gal-gadot.jpg"
;*********************url Download to file**********************************
UrlDownloadToFile, % url, % "Gal_Gadot_URLDownload.jpg" ;simple built-in way to download a file given a url


Download_File_XMLHTTP(URL) ;Call the function
;********************created by Maestrith but tweaked by Joe Glines***********************************
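Download_File_XMLHTTP(URL){
	SplitPath,URL,File_Name ;get file name from URL (the text after the last slash)
	req:=ComObjCreate("MSXML2.XMLHTTP.6.0"),ado:=ComObjCreate("ADODB.Stream")
	req.Open("HEAD",URL),req.Send() ;HEAD request; its response is not read below
	ado.Type:=1 ;1 = binary stream
	req.Open("GET",URL,1),req.Send() ;third parameter 1 = asynchronous request
	while(req.ReadyState!=4){ ;4 = request finished
		Sleep,50
	}
	ado.Open(),ado.Write(req.ResponseBody),ado.SaveToFile(File_Name,2),ado.Close() ;2 = overwrite an existing file
	Sleep, 100
}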
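
Since the built-in command can fail quietly (as it did for me behind a proxy), it can help to check ErrorLevel right after the call. The snippet below is only a minimal sketch of that check: the target path and the variable names target and bytes are my own choices for illustration, not something from the video.

```autohotkey
#SingleInstance,Force
url:="https://i2.wp.com/www.toptenia.com/wp-content/uploads/2017/08/gal-gadot.jpg"
target:=A_ScriptDir "\Gal_Gadot_URLDownload.jpg" ;hypothetical target path next to the script
UrlDownloadToFile, % url, % target
if (ErrorLevel) ;ErrorLevel is 1 when the command reports a problem, 0 otherwise
	MsgBox % "UrlDownloadToFile failed for:`n" url
else {
	FileGetSize, bytes, % target ;size in bytes of the file that was saved
	MsgBox % "Saved " bytes " bytes to:`n" target
}
```

An ErrorLevel of 0 only means the command did not report a problem, so looking at the saved file's size is a cheap extra sanity check. The failure branch is also a natural place to call Download_File_XMLHTTP(url) instead of showing a message.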

And here is the code where I demonstrate how you can collect the image URLs from a page into an object and then iterate over them, calling the download function on each one:

#SingleInstance,Force
global Obj:=[] ;Creates obj holder for variables
;**************************************
pwb := WBGet()
MsgBox % pwb.document.images.length ;Show how many images there are
ComObjError(false)  ;Need to turn off so doesn't trigger error
;******example with While loop***Note a_index-1 is only needed on the While line (the links collection is 0-based); the lines inside the loop use ele*
While(ele:=pwb.document.links[a_index-1]){ ;store reference to element in ele While looping over elements
	if InStr(ele.href,"https://www.google.com/imgres?imgurl="){ ;if one of the images from Google.com
		obj.Push(StrSplit(uri_decode(StrSplit(ele.href,"https://www.google.com/imgres?imgurl=").2),["?","&"]).1) ;keep only the decoded image URL, dropping everything from the first ? or & onward
	}
}
for k, v in obj{
	Download_File_XMLHTTP(v)
	Sleep, 100
}
ComObjError(True)  ;Turn back on

;********************created by Maestrith but tweaked by Joe Glines***********************************
Download_File_XMLHTTP(URL){
	SplitPath,URL,File_Name ;get file name from URL
	req:=ComObjCreate("MSXML2.XMLHTTP.6.0"),ado:=ComObjCreate("ADODB.Stream")
	req.Open("HEAD",URL),req.Send() 
	ado.Type:=1
	req.Open("GET",URL,1),req.Send()
	while(req.ReadyState!=4){
		Sleep,50
	}
	ado.Open(),ado.Write(req.ResponseBody),ado.SaveToFile(File_Name,2),ado.Close()
	Sleep, 100
}


;~ http://www.autohotkey.com/board/topic/47052-basic-webpage-controls-with-javascript-com-tutorial/
;~ wb := WBGet()
WBGet(WinTitle="ahk_class IEFrame", Svr#=1) {               ;// based on ComObjQuery docs
   static msg := DllCall("RegisterWindowMessage", "str", "WM_HTML_GETOBJECT")
        , IID := "{0002DF05-0000-0000-C000-000000000046}"   ;// IID_IWebBrowserApp
;//     , IID := "{332C4427-26CB-11D0-B483-00C04FD90119}"   ;// IID_IHTMLWindow2
   SendMessage msg, 0, 0, Internet Explorer_Server%Svr#%, %WinTitle%
   if (ErrorLevel != "FAIL") {
      lResult:=ErrorLevel, VarSetCapacity(GUID,16,0)
      if DllCall("ole32\CLSIDFromString", "wstr","{332C4425-26CB-11D0-B483-00C04FD90119}", "ptr",&GUID) >= 0 {
         DllCall("oleacc\ObjectFromLresult", "ptr",lResult, "ptr",&GUID, "ptr",0, "ptr*",pdoc)
         return ComObj(9,ComObjQuery(pdoc,IID,IID),1), ObjRelease(pdoc)
      }
   }
}

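Uri_Decode(str) {
	Loop {
		;look for the next one- or two-digit hex code that follows a % sign
		If !RegExMatch(str, "i)(?<=%)[\da-f]{1,2}", hex)
			Break
		StringReplace, str, str, `%%hex%, % Chr("0x" . hex), All ;replace every occurrence of that code with its character
	}
	Return str
}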
	
Uri_Encode(Uri, full = 0)
	{
		oSC := ComObjCreate("ScriptControl")
		oSC.Language := "JScript"
		Script := "var Encoded = encodeURIComponent(""" . Uri . """)"
		oSC.ExecuteStatement(Script)
		encoded := oSC.Eval("Encoded")
		Return encoded
	}
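
The line that fills obj does a lot in one go, so here is the same extraction broken into steps on a single link. This is just a sketch: the href is a made-up example (example.com is not a real result), the variable names are mine, and Uri_Decode is the same decoding logic as in the listing above, repeated only so the snippet runs on its own.

```autohotkey
#SingleInstance,Force
;hypothetical href in the shape the script above filters on
href:="https://www.google.com/imgres?imgurl=https%3A%2F%2Fexample.com%2Fpics%2Fcat.jpg&imgrefurl=https%3A%2F%2Fexample.com%2F"
prefix:="https://www.google.com/imgres?imgurl="

afterPrefix:=StrSplit(href,prefix).2 ;everything after the imgurl= prefix
decoded:=Uri_Decode(afterPrefix) ;%3A becomes : and %2F becomes /
imageUrl:=StrSplit(decoded,["?","&"]).1 ;cut at the first ? or &
MsgBox % afterPrefix "`n`n" decoded "`n`n" imageUrl ;last line shown is https://example.com/pics/cat.jpg

Uri_Decode(str) {
	Loop {
		If !RegExMatch(str, "i)(?<=%)[\da-f]{1,2}", hex)
			Break
		StringReplace, str, str, `%%hex%, % Chr("0x" . hex), All
	}
	Return str
}
```

Because the cut happens at the first ? or &, an image URL that carries its own query string gets trimmed as well. That keeps the file name that SplitPath later pulls out free of characters like ? that are not allowed in Windows file names.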

 
