Web scraping (web harvesting or web data extraction) is a computer software technique of extracting information from websites. Usually, such software programs simulate human exploration of the World Wide Web by either implementing low-level Hypertext Transfer Protocol (HTTP), or embedding a fully-fledged web browser, such as Mozilla Firefox.
Web scraping is closely related to web indexing, which indexes information on the web using a bot or web crawler and is a universal technique adopted by most search engines. In contrast, web scraping focuses more on the transformation of unstructured data on the web, typically in HTML format, into structured data that can be stored and analyzed in a central local database or spreadsheet. Web scraping is also related to web automation, which simulates human browsing using computer software. Uses of web scraping include online price comparison, contact scraping, weather data monitoring, website change detection, research, web mashup and web data integration.
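To make the "unstructured HTML into structured data" idea concrete, here is a minimal sketch in Python (not AutoHotkey), using only the standard library. The sample HTML, the `TableRowParser` class, and the `extract_rows` helper are all made up for illustration; a real page would need its own parsing rules.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of each <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being parsed
        self._cell = None  # text fragments of the cell currently being parsed

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a <td> cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # skip rows without <td> cells (e.g. header rows)
                self.rows.append(self._row)
            self._row = None


def extract_rows(html):
    """Return the <td> text of every table row as a list of lists."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical page fragment standing in for a scraped price list.
SAMPLE_HTML = """
<table>
  <tr><th>Item</th><th>Price</th></tr>
  <tr><td>Widget</td><td>4.99</td></tr>
  <tr><td>Gadget</td><td>12.50</td></tr>
</table>
"""

# Structured records, ready to be written to a spreadsheet or database.
records = [
    {"item": name, "price": float(price)}
    for name, price in extract_rows(SAMPLE_HTML)
]
```

Each record is a plain dictionary, so it could be handed to the `csv` module or a database driver without further conversion.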
A while back someone wrote me saying that IE is dead and wondering how we can automate other browsers. While IE is definitely on its deathbed, I do still automate IE for sites that will let IE load. Someone mentioned that Windows 11 completely removes Internet Explorer; thus, if you’re running Windows 11, automating IE is not an option. I looked into this and made some very interesting discoveries:
The IWB2 Learner tool works within Edge when in “IE Mode”
In Windows 11, you can add IE back and still use it in IE mode. (I’ll document how I did this in a later video)
When I realized the above, I played with Edge (in IE Mode) in Windows 10 & Windows 11 and was able to connect to the DOM! Granted, my approach sucked, but I asked Tank (Charlie Simmons) to take a look at it, and he built on the concept and rewrote what I did into something that is decently reliable. You can get the download here.
Here’s a video showing how I use AutoHotkey to Automate IE from within Edge!
The MSXML2.XMLHTTP and WinHttpRequest COM objects are both used to send HTTP requests from an AutoHotkey script. However, there are some differences between the two:
Compatibility: Both objects ship with Windows. MSXML2.XMLHTTP is part of MSXML, and the WinHttpRequest object (WinHTTP 5.1) is included with Windows XP SP1 and later, so on any modern Windows installation both are available.
Performance: The WinHttpRequest object is generally regarded as the faster and lighter of the two, because it talks directly to the Windows HTTP Services (WinHTTP) library, which is optimized for HTTP communications, whereas MSXML2.XMLHTTP is layered on top of the WinINet stack that Internet Explorer uses.
Features: Both objects can send HTTPS requests. MSXML2.XMLHTTP inherits its proxy, cookie, and cache behavior from the Internet Explorer (WinINet) settings, while the WinHttpRequest object lets the script control more of the request itself, such as timeouts (SetTimeouts), proxy settings (SetProxy), and credentials for different authentication schemes (SetCredentials).
In general, if you need the additional control and performance of the WinHttpRequest object, you should use that object instead of the MSXML2.XMLHTTP object. However, if you do not need those features, or you want the request to reuse the Internet Explorer settings already configured on the machine (proxy, cookies), you can use the MSXML2.XMLHTTP object instead.
Here’s the syntax I used for the WinHttpRequest example
And here are the two corresponding XML API calls I demonstrated in the video
Notes from Create and connect to new Chrome profiles with AutoHotkey
00:09 There’s been a lot of confusion about Chrome profiles: what they are and why you should concern yourself with them when using AutoHotkey.
00:23 Your Chrome profile is what keeps you logged into websites, connected to Google, etc. Most of the time you won’t need access to your entire Chrome profile. But you might want to start with a blank slate if you’re distributing your code to people, or you might want to create a new instance of Chrome that isn’t attached to an existing Chrome tab. For any of those, you need to have a Chrome profile.
01:17 This is because most people don’t have the remote debugging flag on their default shortcut. If you launch Chrome with the debugging flag while it is already running, it will automatically be grouped with the current Chrome process.
02:38 So instead of spawning a new Chrome window that is listening on the debugging port, it will open a new page in the existing Chrome instance without debugging access (even though you specified debugging).
02:52 So in order to get Chrome to open a new instance, you need to use the Chrome profile.
03:51 Create a folder (name it whatever you want) and tell Chrome to use it with the profile flag, “--user-data-dir”, followed by your directory, i.e. “--user-data-dir=C:\temp\newProfile”.
05:18 Looking in the profile folder, you can see Chrome has generated a bunch of files. Things like Cookies, browser history, etc. Everything Chrome remembers…
05:47 If you’re targeting portable Chrome, making sure you have this profile set correctly can be a big deal! If you use AutoHotkey to launch portable Chrome, it might still load the default profile. Make sure you specify the Chrome profile!
07:32 Everywhere in your script you would have used Chrome, use ChromeInst (i.e. instead of Chrome.GetPage, use ChromeInst.GetPage).
07:32 That tells Chrome to look for this new / specific instance of Chrome instead of the default version. Remember, it’s only “new” right after you make it.
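As a rough illustration of the launch step from the notes above, here is a minimal sketch in Python (not AutoHotkey, and not the code from the video) of how the command line for a separate, debuggable Chrome instance could be assembled. The function names and the example paths are hypothetical, and port 9222 is only the conventional choice for Chrome's `--remote-debugging-port` flag.

```python
import subprocess


def chrome_launch_args(chrome_path, profile_dir, debug_port=9222):
    """Build the argument list for a Chrome instance with its own profile.

    --user-data-dir points Chrome at a separate profile folder, so it starts
    a new instance instead of joining the one that is already running.
    --remote-debugging-port makes that instance accept debugger connections.
    """
    return [
        chrome_path,
        f"--user-data-dir={profile_dir}",
        f"--remote-debugging-port={debug_port}",
    ]


def launch_chrome(chrome_path, profile_dir, debug_port=9222):
    """Start Chrome with the arguments above and return the process handle."""
    return subprocess.Popen(chrome_launch_args(chrome_path, profile_dir, debug_port))


# Example paths are assumptions; adjust them to your own machine.
example_args = chrome_launch_args(
    r"C:\Program Files\Google\Chrome\Application\chrome.exe",
    r"C:\temp\newProfile",
)
```

Calling `launch_chrome` with the same two paths would actually start the browser; the sketch only builds the argument list at import time so it can be inspected without opening a window.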