Video Hour 1: High Level: Getting text from programs
General ways (there are more) to get text & the order to try them:
- Using COM to extract text from programs such as Excel, Word, PowerPoint, Outlook, IE
- Getting Text from PDFs
- Using Active Accessibility Viewer & UI Automation
- ControlGet
- Copy / Paste (when all else fails or when you just need a quick solution)
- OCR – Optical Character Recognition
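The "order to try them" idea above can be sketched as a simple fallback chain: try the most reliable method first and move down the list only when a method fails or returns nothing. This is a minimal Python sketch, not code from the webinar; the `via_*` functions are hypothetical stand-ins for the real extraction methods.

```python
from typing import Callable, Optional, Sequence, Tuple

# Each extractor returns the text it found, or None / "" when it cannot help.
Extractor = Callable[[], Optional[str]]


def first_text(
    extractors: Sequence[Tuple[str, Extractor]],
) -> Tuple[Optional[str], Optional[str]]:
    """Try each (name, extractor) pair in order; return the first (name, text) that works."""
    for name, extract in extractors:
        try:
            text = extract()
        except Exception:
            # A failed method (e.g. the program has no COM interface) just
            # means we move on to the next one in the list.
            continue
        if text:
            return name, text
    return None, None


# Hypothetical stand-ins for the real methods, in the order listed above.
def via_com() -> Optional[str]:
    raise RuntimeError("program exposes no COM interface")


def via_ui_automation() -> Optional[str]:
    return ""  # the control exposed nothing useful


def via_clipboard() -> Optional[str]:
    return "text copied from the window"


def via_ocr() -> Optional[str]:
    return "text read by OCR"


methods = [
    ("COM", via_com),
    ("UI Automation", via_ui_automation),
    ("Copy / Paste", via_clipboard),
    ("OCR", via_ocr),
]

name, text = first_text(methods)
print(name, "->", text)
```

In this sketch OCR is never reached, because Copy / Paste already produced text.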
Video Hour 2: Coding and Q&A
Script Highlight: Chrome.ahk by GeekDude
- Automate Chrome without Selenium
- Not as robust as COM with IE, but a great start at basic automation
- Connect to a running Chrome instance (started in debug mode)
- Page navigation
- Get / Set elements & JavaScript Injection
- Print pages to PDF
- Capture screenshots
- Some tutorial videos
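For background: Chrome in debug mode (started with the `--remote-debugging-port` flag) accepts JSON commands from the Chrome DevTools Protocol over a WebSocket, and the features listed above correspond to protocol commands. The Python sketch below only builds those JSON messages so you can see their shape; it does not open a connection and it is not Chrome.ahk's own code. The helper name `cdp_message` and the example URL are assumptions made for illustration.

```python
import itertools
import json

# Every command carries a unique integer id so replies can be matched to requests.
_ids = itertools.count(1)


def cdp_message(method: str, **params) -> str:
    """Serialize one DevTools Protocol command as the JSON text sent over the WebSocket."""
    return json.dumps({"id": next(_ids), "method": method, "params": params})


messages = [
    # Page navigation
    cdp_message("Page.navigate", url="https://example.com"),
    # JavaScript injection / reading a value from the page
    cdp_message("Runtime.evaluate", expression="document.title"),
    # Print the page to PDF (the reply contains base64-encoded data)
    cdp_message("Page.printToPDF"),
    # Capture a screenshot (the reply contains base64-encoded data)
    cdp_message("Page.captureScreenshot", format="png"),
]

for message in messages:
    print(message)
```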
Resources for getting text from programs
COM – Microsoft’s Component Object Model
- Past webinars demonstrating overall concepts on Excel, IE / Web Scraping & Outlook
- Lots of tutorials on Excel, Web Scraping & Outlook
- Here are specific examples for Excel, IE, Outlook (email), Word, and PowerPoint
Extracting from PDFs
- MS Word – Open the PDF with Word (requires Word 2013+) and get the text from Word
- SnapOCR – API for extracting text from PDFs
- AcroExch COM object from Adobe (requires the fully licensed version of Acrobat)
- PDFtoText
- Capture2Text
- Ghostscript
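As one illustration of the command-line route, here is a hedged Python sketch that shells out to `pdftotext`. It assumes the PDFtoText entry above refers to the Xpdf / Poppler command-line tool of that name and that it is on the PATH; the function names are made up for this example.

```python
import shutil
import subprocess
from typing import List, Optional


def pdftotext_command(pdf_path: str, keep_layout: bool = True) -> List[str]:
    """Build the pdftotext command line; a trailing "-" sends the text to stdout."""
    cmd = ["pdftotext"]
    if keep_layout:
        cmd.append("-layout")  # keep the physical layout (columns, tables) where possible
    cmd += ["-enc", "UTF-8", pdf_path, "-"]
    return cmd


def extract_pdf_text(pdf_path: str) -> Optional[str]:
    """Return the text of the PDF, or None when pdftotext is not installed."""
    if shutil.which("pdftotext") is None:
        return None
    result = subprocess.run(
        pdftotext_command(pdf_path),
        capture_output=True,
        encoding="utf-8",
        errors="replace",
        check=True,  # raise if pdftotext reports an error (e.g. unreadable file)
    )
    return result.stdout


print(pdftotext_command("report.pdf"))
```

Scanned PDFs contain images rather than text, so this returns little or nothing for them; that is where the OCR options come in.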
Active Accessibility Viewer & UI Automation
Active Accessibility & UI Automation are built-in Microsoft frameworks that let you get text from programs programmatically
- In these two webinars we demonstrated the possibilities & the coding
Let’s see a few examples using:
- UI Automation via Jethrow
- Screen Reader – UI Automation by nepter
ControlGet
You can access standard Windows controls with ControlGet
- Some things like listboxes, listviews, etc. have some pretty cool capabilities
- Newer programs often do not use the standard Windows controls 🙁
OCR – Optical Character Recognition
- Vis2 – OCR by iseahound
- Streamlined version of Tesseract – Jackie Sztuk