Web Scraping with AutoHotkey 110- Saving Files / Images from a URL / Hyperlink

While URLDownloadToFile is a built-in command for downloading binary files (images, Word files, spreadsheets, etc.), I and others have had issues using it (I think it was because I was behind a proxy at work). In the video below I demonstrate how to use the URLDownloadToFile command as well as a function I borrowed from Maestrith (author of AutoHotkey Studio).


Here is the code I walk through in the first part of the above video (demonstrating the built-in functionality and an example calling the function):

#SingleInstance,Force
;*************URL download to File*************************
url:="https://i2.wp.com/www.toptenia.com/wp-content/uploads/2017/08/gal-gadot.jpg"
;*********************url Download to file**********************************
UrlDownloadToFile, % url, % "Gal_Gadot_URLDownload.jpg" ;simple built-in way to download a file given a url


Download_File_XMLHTTP(URL) ;Call the function
;********************created by Maestrith but tweaked by Joe Glines***********************************
Download_File_XMLHTTP(URL){
	SplitPath,URL,File_Name ;get file name from URL
	req:=ComObjCreate("MSXML2.XMLHTTP.6.0"),ado:=ComObjCreate("ADODB.Stream")
	req.Open("HEAD",URL),req.Send() ;HEAD request; its response is not used below
	ado.Type:=1 ;1=binary stream
	req.Open("GET",URL,1),req.Send() ;third parameter 1=asynchronous request
	while(req.ReadyState!=4){ ;4=request complete
		Sleep,50
	}
	ado.Open(),ado.Write(req.ResponseBody),ado.SaveToFile(File_Name,2),ado.Close() ;2=overwrite an existing file
	Sleep, 100
}
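
Since URLDownloadToFile can fail (as it did for me behind a proxy), it helps to check ErrorLevel after calling it. Below is a minimal sketch of that check; the URL and the destination path are placeholders I made up for illustration, not files from the video.

```autohotkey
#SingleInstance, Force
url  := "https://example.com/picture.jpg"   ; hypothetical URL, substitute a real one
dest := A_ScriptDir "\picture.jpg"          ; hypothetical destination path

UrlDownloadToFile, %url%, %dest%
if (ErrorLevel)   ; ErrorLevel is 1 if there was a problem, otherwise 0
	MsgBox, Download failed for:`n%url%
else
	MsgBox, Saved to:`n%dest%
```

If the check reports a failure, that is the point where you could try the Download_File_XMLHTTP function instead.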

And here is the code where I demonstrate how you can get a list of images and iterate over them in an object (calling the download function):

#SingleInstance,Force
global Obj:=[] ;Creates obj holder for variables
;**************************************
pwb := WBGet()
MsgBox % pwb.document.images.length ;Show how many images there are
ComObjError(false)  ;Need to turn off so doesn't trigger error
;******example with While loop***Note a_index-1 is in first row, not each individual one*	 
While(ele:=pwb.document.links[a_index-1]){ ;store reference to element in ele While looping over elements
	if InStr(ele.href,"https://www.google.com/imgres?imgurl="){ ;if one of the images from Google.com
		obj.Push(StrSplit(uri_decode(StrSplit(ele.href,"https://www.google.com/imgres?imgurl=").2),["?","&"]).1) ;Append the image URL, stripping out a lot of the unwanted text
	}
}
for k, v in obj{
	Download_File_XMLHTTP(v)
	Sleep, 100
}
ComObjError(True)  ;Turn back on

;********************created by Maestrith but tweaked by Joe Glines***********************************
Download_File_XMLHTTP(URL){
	SplitPath,URL,File_Name ;get file name from URL
	req:=ComObjCreate("MSXML2.XMLHTTP.6.0"),ado:=ComObjCreate("ADODB.Stream")
	req.Open("HEAD",URL),req.Send() 
	ado.Type:=1
	req.Open("GET",URL,1),req.Send()
	while(req.ReadyState!=4){
		Sleep,50
	}
	ado.Open(),ado.Write(req.ResponseBody),ado.SaveToFile(File_Name,2),ado.Close()
	Sleep, 100
}


;~ http://www.autohotkey.com/board/topic/47052-basic-webpage-controls-with-javascript-com-tutorial/
;~ wb := WBGet()
WBGet(WinTitle="ahk_class IEFrame", Svr#=1) {               ;// based on ComObjQuery docs
   static msg := DllCall("RegisterWindowMessage", "str", "WM_HTML_GETOBJECT")
        , IID := "{0002DF05-0000-0000-C000-000000000046}"   ;// IID_IWebBrowserApp
;//     , IID := "{332C4427-26CB-11D0-B483-00C04FD90119}"   ;// IID_IHTMLWindow2
   SendMessage msg, 0, 0, Internet Explorer_Server%Svr#%, %WinTitle%
   if (ErrorLevel != "FAIL") {
      lResult:=ErrorLevel, VarSetCapacity(GUID,16,0)
      if DllCall("ole32\CLSIDFromString", "wstr","{332C4425-26CB-11D0-B483-00C04FD90119}", "ptr",&GUID) >= 0 {
         DllCall("oleacc\ObjectFromLresult", "ptr",lResult, "ptr",&GUID, "ptr",0, "ptr*",pdoc)
         return ComObj(9,ComObjQuery(pdoc,IID,IID),1), ObjRelease(pdoc)
      }
   }
}

Uri_Decode(str) {
	Loop
		If RegExMatch(str, "i)(?<=%)[\da-f]{1,2}", hex)
			StringReplace, str, str, `%%hex%, % Chr("0x" . hex), All
		Else Break
	Return, str
}

Uri_Encode(Uri, full = 0) {
	oSC := ComObjCreate("ScriptControl")
	oSC.Language := "JScript"
	Script := "var Encoded = encodeURIComponent(""" . Uri . """)"
	oSC.ExecuteStatement(Script)
	encoded := oSC.Eval("Encoded")
	Return encoded
}
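
The one-line expression inside the While loop does a lot at once, so here is a rough sketch that performs the same steps one at a time on a single link. The href value is a made-up example (example.com stands in for a real image host), and the variable names are mine, not from the video.

```autohotkey
prefix := "https://www.google.com/imgres?imgurl="
; Hypothetical link of the kind the loop looks for
href := prefix "https%3A%2F%2Fexample.com%2Fpics%2Fcat.jpg&imgrefurl=https%3A%2F%2Fexample.com"

if InStr(href, prefix) {
	encoded  := StrSplit(href, prefix)[2]          ; everything after the prefix
	decoded  := Uri_Decode(encoded)                ; turn %3A, %2F, ... back into characters
	imageUrl := StrSplit(decoded, ["?", "&"])[1]   ; drop any trailing query parameters
	SplitPath, imageUrl, fileName                  ; same call the download function uses
	MsgBox, % "URL: " imageUrl "`nFile name: " fileName
}

Uri_Decode(str) { ; same decoding loop as the function in the script above
	Loop
		If RegExMatch(str, "i)(?<=%)[\da-f]{1,2}", hex)
			StringReplace, str, str, `%%hex%, % Chr("0x" . hex), All
		Else Break
	Return, str
}
```

Stepping through it this way makes it easier to see which part to adjust if the link format on the page changes.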

 

AutoHotkey webinar- Copying / Editing / Saving files & Folders

In this AutoHotkey Webinar we covered: Working with Files & Folders (Here are the files demonstrated during the webinar)

Video Hour 1: High-level:

  1. Moving / Copying / Deleting Files / Folders
    1. File Copy / Move
    2. File Delete
    3. Loop Files & Folders
  2. Creating / Editing Files
    1. File Encoding
    2. FileRead & FileAppend
    3. Loop FileRead & FileReadLine
    4. File Object

Video Hour 2: Q&A

Script Highlight- Locker

  • Kill power to monitors & block most keys & mouse
  • Great to use instead of using Windows + L and having to enter your crazy corporate password

Copy, Move, Delete : Files & Folders

FileCopy – FileCopy, SourcePattern, DestPattern [, Flag]

  • FileCopy copies files only. To copy a single folder (including its subfolders), use FileCopyDir. To instead copy the contents of a folder (all its files and subfolders), see the examples section of FileCopy

FileCopyDir – FileCopyDir, Source, Dest [, Flag]

  • Copies a folder along with all its sub-folders and files (similar to xcopy)
  • FileCopyDir copies a single folder. If the destination directory structure doesn’t exist it will be created if possible

FileMove – FileMove, SourcePattern, DestPattern [, Flag]

  • FileMove moves files only. To move or rename a single folder, use FileMoveDir

FileDelete – FileDelete, FilePattern

  • To remove an entire folder, along with all its sub-folders and files, use FileRemoveDir
  • To delete a read-only file, first remove the read-only attribute. For example: FileSetAttrib, -R, C:\My File.txt
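
To tie the copy, move and delete commands above together, here is a small sketch that runs them in sequence on throwaway folders. The folder and file names (demo_src, demo_dst, a.txt) are assumptions made up for the demo.

```autohotkey
src := A_ScriptDir "\demo_src"   ; hypothetical folder names used only for this demo
dst := A_ScriptDir "\demo_dst"

FileCreateDir, %src%
FileAppend, hello`n, %src%\a.txt

FileCopyDir, %src%, %dst%, 1             ; copy the whole folder; 1 = overwrite existing files
FileCopy, %dst%\*.txt, %dst%\*.bak, 1    ; copy files only, giving the copies a new extension
if (ErrorLevel)                          ; ErrorLevel = number of files that could not be copied
	MsgBox, %ErrorLevel% file(s) could not be copied

FileMove, %dst%\a.bak, %dst%\renamed.bak, 1   ; move/rename a single file
FileSetAttrib, -R, %dst%\renamed.bak           ; clear read-only (not needed here, shown for the pattern)
FileDelete, %dst%\renamed.bak

FileRemoveDir, %dst%, 1   ; 1 = also remove all files and subfolders
FileRemoveDir, %src%, 1
```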

Looping Files & Folders

Loop (files & folders) – Loop, Files, FilePattern [, Mode]      Mode:  D=Directories, F=Files, R=Recursive

FilePattern: The name of a single file or folder, or a wildcard pattern

  • Retrieves the specified files or folders, one at a time
  • A file-loop is useful when you want to operate on a collection of files and/or folders, one at a time
  • The following Special Variables exist in any file-loop
    • A_LoopFileName
    • A_LoopFileExt
    • A_LoopFileFullPath
    • A_LoopFileLongPath
    • A_LoopFileShortPath
    • A_LoopFileShortName
    • A_LoopFileDir
    • A_LoopFileTimeModified
    • A_LoopFileTimeCreated
    • A_LoopFileTimeAccessed
    • A_LoopFileAttrib
    • A_LoopFileSize
    • A_LoopFileSizeKB
    • A_LoopFileSizeMB
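
As a quick illustration of the file-loop and a few of its special variables, the sketch below lists files under the script's own folder. Using A_ScriptDir as the starting folder and stopping after 20 files are choices I made for the example.

```autohotkey
report := ""
Loop, Files, %A_ScriptDir%\*.*, FR   ; F = files only, R = recurse into subfolders
{
	report .= A_LoopFileFullPath "`t" A_LoopFileSizeKB " KB`t" A_LoopFileTimeModified "`n"
	if (A_Index >= 20)   ; keep the message box short
		break
}
MsgBox, % report
```

Changing the mode to D or DR would list folders instead of files.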


File Encoding

  • Sets the default encoding for FileRead, FileReadLine, Loop Read, FileAppend, and FileOpen
  • Encoding can be one of the following values:
    • UTF-8: Unicode UTF-8, equivalent to CP65001
    • UTF-16: Unicode UTF-16 with little endian byte order, equivalent to CP1200
    • UTF-8-RAW or UTF-16-RAW: As above, but no Byte Order Mark (BOM*) is written when a new file is created
    • CPnnn: a code page with numeric identifier nnn. See Code Page Identifiers. (UTF-8 is CP65001, UTF-16 is CP1200)
    • Empty or omitted: the system default ANSI code page, which is also the default setting

* The byte order mark (BOM) is the Unicode character U+FEFF, whose appearance as a magic number at the start of a text stream can signal several things to a program consuming the text
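
Here is a minimal sketch of setting a default encoding with the FileEncoding command and then restoring it. The file name is a placeholder, and Chr(8364) (the euro sign) assumes you are running a Unicode build of AutoHotkey.

```autohotkey
file := A_ScriptDir "\encoding_demo.txt"   ; hypothetical file name
FileDelete, %file%

FileEncoding, UTF-8-RAW                          ; UTF-8 without a BOM for the commands that follow
FileAppend, % "Price: " Chr(8364) "5`n", %file%  ; Chr(8364) is the euro sign
FileRead, text, %file%                           ; read back using the same default encoding
MsgBox, %text%

FileEncoding                                     ; parameter omitted = back to the system default ANSI code page
```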

FileRead & FileAppend

FileRead – FileRead, OutputVar, Filename

  • Reads file’s content into a variable
  • FileRead, Var, *P65001 %file_path% ;-Read in the file using UTF-8 Encoding
  • When the goal is to load all or a large part of a file into memory, FileRead performs much better than using a file-reading loop.
  • FileOpen() provides more advanced functionality than FileRead, such as reading or writing data at a specific location in the file without reading the entire file into memory
  • When Reading / Writing to a file many times, FileObject is much faster as it does not open/close the file each time

FileAppend – FileAppend [, Text, Filename, Encoding]

  • Writes text to the end of a file (first creating the file, if necessary)
  • FileAppend,%data%,%File_Name%.txt,UTF-8
  • To overwrite an existing file, delete it with FileDelete prior to using FileAppend
  • * don’t forget to add a line break at the end of each row!
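
Putting FileAppend and FileRead together, the sketch below writes three rows to a fresh file and reads them back as UTF-8. The file name is an assumption for the demo.

```autohotkey
file := A_ScriptDir "\append_demo.txt"           ; hypothetical file name
FileDelete, %file%                                ; start clean: FileAppend only ever adds to the end
Loop, 3
	FileAppend, Row %A_Index%`n, %file%, UTF-8   ; `n ends each row
FileRead, contents, *P65001 %file%               ; read it all back as UTF-8
MsgBox, %contents%
```

Without the FileDelete line, every run of the script would add three more rows to the same file.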

Loop FileRead & FileReadLine

Loop (read file contents) – Loop, Read, InputFile [, OutputFile]

  • A file-reading loop is useful when you want to operate on each line contained in a text file, one at a time. It performs better than using FileReadLine because:
    • the file can be kept open for the entire operation
    • the file does not have to be re-scanned each time to find the requested line number.
  • Lines up to 65,534 characters long can be read. If the length of a line exceeds this, its remaining characters will be read during the next loop iteration
  • To load an entire file into a variable, use FileRead because it performs much better than a loop (especially for large files).
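
A small sketch of a file-reading loop follows: it creates a short test file, then walks it line by line and counts the non-blank lines. The file name and its contents are made up for the example.

```autohotkey
file := A_ScriptDir "\loopread_demo.txt"   ; hypothetical file name
FileDelete, %file%
FileAppend, alpha`nbeta`n`ngamma`n, %file%

nonBlank := 0
Loop, Read, %file%
{
	if (A_LoopReadLine = "")    ; skip empty lines
		continue
	nonBlank++
	last := A_LoopReadLine     ; A_LoopReadLine holds the current line without its line ending
}
MsgBox, % nonBlank " non-blank lines; last one: " last
```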

FileReadLine – FileReadLine, OutputVar, Filename, LineNum

  • Reads the specified line from a file and stores the text in a variable
  • It is strongly recommended to use this command only for small files, or in cases where only a single line of text is needed. To scan and process a large number of lines (one by one), use a file-reading loop for best performance. To read an entire file into a variable, use FileRead

FileObject

FileOpen – file := FileOpen(Filename, Flags [, Encoding])

  • Flags (r=Read, w=Write, a=Append, rw=Read/Write , h=Handle, -rwd=Lock file/deny access)

ReadLine–  TextLine := File.ReadLine()

  • Reads a line of text from the file and advances the file pointer

Seek–  File.Seek(Distance [, Origin])

  • Moves the file pointer. Distance is the distance to move, in bytes; lower values are closer to the beginning of the file

AtEOF (End of File)- IsAtEOF := File.AtEOF

  • Retrieves a non-zero value if the file pointer has reached the end of the file, otherwise zero.

WriteLine–  File.WriteLine([String])

  • Writes a string of characters followed by `n or `r`n depending on the flags used to open the file. Advances the file pointer

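Encoding–  RetrievedEncoding := File.Encoding  /  File.Encoding := NewEncoding

  • Retrieves or sets the text encoding used by the file object. RetrievedEncoding and NewEncoding are each a numeric code page identifier or a string such as UTF-8, UTF-16 or CPnnn (e.g. CP65001)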

Close–  File.Close()

  • Although the file is closed automatically when the object is freed, it is recommended to close the file as soon as possible
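
The File object methods above can be combined as in this sketch, which writes a few lines, reopens the file and reads them back until AtEOF. The file name is a placeholder, and the error handling is just one simple way to do it.

```autohotkey
path := A_ScriptDir "\fileobject_demo.txt"   ; hypothetical file name

file := FileOpen(path, "w", "UTF-8")         ; w = create the file or overwrite an existing one
if !IsObject(file) {
	MsgBox, Could not open %path% for writing
	ExitApp
}
Loop, 3
	file.WriteLine("Row " A_Index)           ; the file stays open between writes
file.Close()

file := FileOpen(path, "r", "UTF-8")         ; r = read
lines := ""
while !file.AtEOF
	lines .= file.ReadLine()                 ; the returned text may include the line ending
file.Close()
MsgBox, %lines%
```

Because the file is opened once for all the writes, this pattern avoids the open/close on every call that FileAppend incurs.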

COM Objects & DLL call