• Intro to AutoHotkey HotStrings with AutoHotkey Intermediate AutoHotkey GUIs are Easy with AutoHotkey Intro to DOS & AutoHotkey

AutoHotkey Webinar: Intro to Web Scraping

In this AutoHotkey Webinar we cover an Intro to Web Scraping.

Make sure you get our Web Scraping Syntax writer!

Intro to Web Scraping

Video Hour 1: High-level:

  • What is an API call
  • Web Scraping vs. WebServices / API
  • Document Object Model (DOM)
  • Structure of a web page
  • Frequently used Methods & properties

Video Hour 2: Coding and Q&A

  • Building code
  • Using AutoHotkey syntax writer
  • IE versus Edge
  • Selenium for Chrome, Firefox, IE, etc.

Script Highlight: Long-pressing Hotkey to Exit a script

This script is mainly an example of how you can have more than one hotkey for a given key by tying the action to how long you press the key. In it, if you hold down Escape for longer than half a second, it closes the active program.

PowerPoint Deck presented in Intro to Web Scraping

Q) What is an API (Application Programming Interface)?

A) Basically, it is how software on different devices (computers, tablets, phones, servers, etc.) talks to each other.

What is an API

Q) What are examples of common API calls?

A) Below is a list of some examples (the list is practically limitless):

  • Web Browser pulling up products on Amazon.com
  • Application / Software querying products for sale on Amazon.com
  • App on your phone getting latest Weather
  • Database pulling updated sales report
  • Using your Tablet to Select movies to watch on Netflix
  • DropBox application syncing files between your computer & cloud
  • Using Web browser to access files in DropBox

Q) What are the main differences between a Webservice / API call and a Web Browser?

Webservice versus browser
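To make the contrast concrete, here is a minimal sketch in Python (used here only because its standard library makes the idea easy to run; it is not the webinar's AutoHotkey code). The JSON payload, the page markup, and the function names are all made up for illustration. The same temperature is read once from an API-style response and once by scraping a page.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical API response: the data arrives already structured.
api_response = '{"city": "Dallas", "temp_f": 72}'

# Hypothetical web page with the same data wrapped in markup meant for a person.
# (Kept well-formed so the stdlib XML parser can stand in for a browser.)
page_html = """
<html><body>
  <h1>Weather</h1>
  <div id="forecast"><span class="city">Dallas</span> <span class="temp">72</span></div>
</body></html>
"""

def temp_from_api(payload: str) -> int:
    """API call: parse the payload and read the field by name."""
    return json.loads(payload)["temp_f"]

def temp_from_page(html: str) -> int:
    """Web scraping: walk the page structure to find the element holding the value."""
    root = ET.fromstring(html.strip())
    span = root.find(".//div[@id='forecast']/span[@class='temp']")
    return int(span.text)

print(temp_from_api(api_response), temp_from_page(page_html))
```

With the API the field is asked for by name; with scraping the code depends on how the page happens to be laid out, which is why the DOM matters so much in the rest of this material.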

Q) What is the HTML DOM: Document Object Model

A) Think of the DOM as the “road map” for a webpage.  Depending on what you are trying to get/set, you’ll rely on the DOM to get the exact element you’re looking for.   Below is an image which stresses the more-frequently-used parts.

HTML DOM
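As a rough illustration of the "road map" idea, the sketch below (Python, with a made-up page and a hypothetical `outline` helper) parses a small well-formed page and lists its nodes by depth. A real browser DOM is richer than this, so treat it only as a picture of the hierarchy.

```python
import xml.etree.ElementTree as ET

# Made-up page; well-formed so the stdlib XML parser can stand in for a browser.
page = """<html>
  <head><title>Demo</title></head>
  <body>
    <div id="main">
      <p name="intro">Hello</p>
      <a href="/next">Next</a>
    </div>
  </body>
</html>"""

def outline(node, depth=0):
    """Return one indented line per element, marking elements that carry an ID."""
    label = node.tag
    if "id" in node.attrib:
        label += f" #{node.attrib['id']}"
    lines = ["  " * depth + label]
    for child in node:
        lines.extend(outline(child, depth + 1))
    return lines

tree_lines = outline(ET.fromstring(page))
print("\n".join(tree_lines))
```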

Q) How is a Family Tree similar to the DOM?

A) Both are “hierarchical” and, depending on which elements are populated, you can reliably “talk” to the correct node.  The first tree below shows how some nodes have all elements populated while others are missing some (like Name and ID).

family tree 1

Q) What happens if a new Node is inserted in the Tree?

A) It depends on which element you are looking at.  Names and IDs probably will not change; however, an element's position among the other elements with the same Tagname often will.  This is why it is recommended to first use IDs, then Names, then ClassNames.  Use Tagname last (or go up the tree a bit, grab an ID/Name, then use that “branch”).

family tree 2
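The effect of an inserted node can be sketched in a few lines of Python (illustrative only; the page content and the helper names `by_tag_index` and `by_id` are assumptions). Looking up "the second div" breaks as soon as a new div appears, while the ID lookup keeps working.

```python
import xml.etree.ElementTree as ET

# Made-up page with two divs; only the second one has an ID.
page = ET.fromstring(
    "<body>"
    "<div>Banner</div>"
    "<div id='price'>19.99</div>"
    "</body>"
)

def by_tag_index(root, tag, index):
    """Positional lookup: the Nth element with this tag, in document order."""
    return list(root.iter(tag))[index]

def by_id(root, element_id):
    """ID lookup: independent of where the element sits in the tree."""
    return root.find(f".//*[@id='{element_id}']")

before_tag = by_tag_index(page, "div", 1).text
before_id = by_id(page, "price").text

# The site owner adds a new div at the top of the page.
notice = ET.Element("div")
notice.text = "Cookie notice"
page.insert(0, notice)

after_tag = by_tag_index(page, "div", 1).text  # index 1 now points at a different div
after_id = by_id(page, "price").text           # still the price

print(before_tag, "->", after_tag, "|", before_id, "->", after_id)
```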

Q) What are the frequently used methods for selecting an Element?

A) The methods below are sorted by how reliable/stable they are (not by how often they are present).  When an ID is present, use it!  If there is no ID, Name is another great one.  ClassName is also becoming more reliable.

  1. getElementById – Great because, if it is present, it should always be unique
  2. getElementsByName – While not required to be unique, it often is, so it is very reliable
  3. getElementsByClassName – Often unique and present
  4. getElementsByTagName – Check if there is an ID/Name above it; if so, jump “down” to that element first and then use TagName
  5. querySelector/querySelectorAll – Most flexible but also the most complicated (if you know CSS, this is not hard to adapt)
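The first four lookups can be imitated in Python to show what each one returns. This is a sketch under stated assumptions: the form markup is made up, the snake_case helpers are hypothetical stand-ins for the browser methods of the same name, and `querySelector` is left out because the standard library has no CSS selector engine.

```python
import xml.etree.ElementTree as ET

# Made-up login form; well-formed so the stdlib XML parser can read it.
form = ET.fromstring(
    "<form>"
    "<input id='user' name='username' class='field'/>"
    "<input name='password' class='field'/>"
    "<button class='primary'>Log in</button>"
    "</form>"
)

def get_element_by_id(root, element_id):
    """Single element or None: IDs are expected to be unique."""
    return root.find(f".//*[@id='{element_id}']")

def get_elements_by_name(root, name):
    """List of elements: names are often, but not always, unique."""
    return root.findall(f".//*[@name='{name}']")

def get_elements_by_class_name(root, class_name):
    """List of elements whose space-separated class list contains class_name."""
    return [el for el in root.iter() if class_name in el.get("class", "").split()]

def get_elements_by_tag_name(root, tag):
    """List of every descendant with this tag: the least specific lookup."""
    return root.findall(f".//{tag}")

print(get_element_by_id(form, "user").get("name"))
print(len(get_elements_by_name(form, "password")))
print(len(get_elements_by_class_name(form, "field")))
print(len(get_elements_by_tag_name(form, "input")))
```

Note that only the ID lookup returns a single element; the others return collections, so the caller still has to pick an index.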

Q) What are Frequently used Get/Set/Trigger HTML Elements?

A) The items below are the most frequently used; however, which ones you need depends heavily on your goal.

Methods:
.click() – Clicks the element
.fireEvent("onclick") – Fires the click event (sometimes needed to trigger the event)
.focus() – Places the cursor in the element/edit field (sets the cursor)

Properties (Get & Set):
.innerText – the plain text of the element
.value – the current value of a form field (edit box, dropdown, etc.)
.outerHTML – the HTML for the element, including the element's own tag
.selectedIndex – Dropdown box selection (the index of the chosen option)
.checked – Radio / Checkbox selection (true or false)
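To show how the three text-related properties differ, here is a small Python sketch with made-up elements. The standard-library calls used (`itertext`, `tostring`, `get`, `set`) are only rough analogues of `innerText`, `outerHTML`, and `value`, not the browser properties themselves.

```python
import xml.etree.ElementTree as ET

div = ET.fromstring("<div id='greeting'>Hello <b>world</b>!</div>")
field = ET.fromstring("<input name='q' value='autohotkey'/>")

# Analogue of innerText: the text only, with nested tags dropped.
inner_text = "".join(div.itertext())

# Analogue of outerHTML: the markup, including the element's own tag.
outer_html = ET.tostring(div, encoding="unicode")

# Analogue of value (get): what the form field currently holds.
value = field.get("value")

# Analogue of value (set): fill in the field.
field.set("value", "web scraping")

print(inner_text)
print(outer_html)
print(value, "->", field.get("value"))
```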

Additional Resources

AHK specific:

Learning the DOM (If you spend time on this you’ll thank us later!):

How to Web Scrape with AutoHotkey: Setting values and clicking links in IE

This is the second video in this series.  Here we practice setting values and clicking links in IE on a page (kind of the reverse of Web Scraping with AutoHotkey; I don’t believe anybody has coined a decent term for it yet).

A word of warning: some pages want you to fire an “event”, and sometimes this is tricky.  Given that this video is set at an introductory level, I only touch a little on the subject.

Web Scraping with AutoHotKey 102-Setting values and clicking links in IE

The syntax for writing the code can be found on my first post here.  There is also an AutoHotKey forum thread you might wish to review here.

Setting values and clicking links in IE is pretty straightforward.

Web scraping (web harvesting or web data extraction) is a computer software technique of extracting information from websites. Usually, such software programs…

Web Scraping with AutoHotKey 104- Dealing with the dreaded Frames!

When I’m web scraping info off the Internet and run into Frames, I first cry and then say a prayer… The video below reviews the issue and offers some solutions for reliably grabbing data.

Just keep in mind that each frame is, in essence, its own page, and you’ll have to add additional values to your handle to navigate to it.
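The "each frame is its own page" point can be sketched in Python as follows. Everything here is hypothetical: the two pages, the `PAGES` lookup that stands in for fetching a URL, and the helper names. It only illustrates why a search of the top-level document misses an element that lives inside a frame.

```python
import xml.etree.ElementTree as ET

# Hypothetical pages keyed by URL: the frame's content is a separate document.
PAGES = {
    "main.html": "<html><body><h1>Report</h1>"
                 "<iframe name='data' src='table.html'/></body></html>",
    "table.html": "<html><body><span id='total'>42</span></body></html>",
}

def load(url):
    """Stand-in for loading a page: parse the stored markup for this URL."""
    return ET.fromstring(PAGES[url])

def find_by_id(doc, element_id):
    """Search one document only; it does not look inside frames."""
    return doc.find(f".//*[@id='{element_id}']")

top = load("main.html")
missing = find_by_id(top, "total")   # None: the element is not in the top page

# Extra step: locate the frame, then search inside the frame's own document.
frame = top.find(".//iframe[@name='data']")
frame_doc = load(frame.get("src"))
total = find_by_id(frame_doc, "total").text

print(missing, total)
```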

A copy of the syntax writer for Web Scraping with AutoHotkey can be found here.

Troubleshooting Web Scraping frames with AutoHotkey

Web Scraping with AutoHotKey 103-Leveraging the Document Object Model

This third video on Web Scraping gets a little advanced and shows how you can leverage the DOM to make extracting data from a webpage much easier and more reliable.

Leveraging the Document Object Model (DOM) will take some practice (especially if you’re not familiar with object-oriented coding), but it is well worth it because it greatly reduces the amount of clean-up you have to do after you extract your data.  I used to write some pretty crazy regular expressions to try to clean up my data.  Once I learned how to better navigate the DOM, it negated the need for cleaning!
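A small Python sketch of that difference, using a made-up table row (this is not the code from the video): the regular expression returns the cell's raw markup and needs a second clean-up pass, while walking the tree yields the text directly.

```python
import re
import xml.etree.ElementTree as ET

# Made-up table row; the name cell contains nested markup.
row = (
    "<tr><td class='name'><a href='/p/1'>Blue <b>Widget</b></a></td>"
    "<td class='price'>$19.99</td></tr>"
)

# Regex approach: grabs everything inside the cell, tags included...
raw = re.search(r"<td class='name'>(.*?)</td>", row).group(1)
# ...so a second expression is needed to strip the leftover tags.
cleaned = re.sub(r"<[^>]+>", "", raw)

# DOM approach: go to the cell and ask for its text; no clean-up pass.
cell = ET.fromstring(row).find("./td[@class='name']")
name = "".join(cell.itertext())

print(repr(raw))
print(cleaned, "|", name)
```

Both routes end at the same text here, but the regex version is tied to the exact markup and quoting of the page, which is where the "crazy" expressions tend to come from.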

The HTML Document Object Model (DOM)-Tree of Objects

Document Object Model

Video Web Scraping with AutoHotKey Leveraging the DOM plus looping over pages

The syntax for writing the web scraping code can be found on my first post here.  There is also an AutoHotKey forum thread you might wish to review here.
