TODAYDOUGLEARNED is a place where I can dump my brain, muse, complain, and otherwise contribute to the greater project of aggregating human knowledge in a world wide web.

Web Scraping with Tampermonkey

I ran into a scenario today where a client needed data scraped from a website that, as far as I can tell, offers no API and nothing I could pull directly with cURL or fetch. The data can be displayed in a datatable, but scraping it that way would take some scripting to step through multiple pages of results and dump each row into an output file. Alternatively, the website offers a CSV export button. I like the idea of having pre-formatted, clean data exported from the site on a schedule; the other half of the battle is figuring out how to move that data into a warehouse. In this blog post I’ll cover the scraping portion, and perhaps I’ll write again when I figure out the rest.

As I was examining a variety of scraping options I ran into this post over at Stack Overflow: https://stackoverflow.com/questions/36045745/automate-daily-csv-file-download-from-website-button-click

The author pivoted from their initial attempts to what they considered to be a more elegant solution – using a Chrome extension called Tampermonkey to run a script on a particular web page, and automating this process with a .bat file. I figured I’d give it a try.

Installing Tampermonkey was really easy. You visit its page in the Chrome Web Store, click the Add to Chrome button, and grant the extension some permissions. Poof, you’re in business. Once the extension has been added, you can click its icon in the toolbar and choose the “Create a new script…” option to get started. The interface is simple: a JavaScript editor pre-filled with a template whose header comments hold options you can configure, followed by a spot to drop in your code.

You can even run the built-in syntax checker to make sure you don’t have any particularly stupid errors. My one big complaint about Tampermonkey is that it has no built-in error log. It’s not a big deal, though: keep Chrome’s developer tools open while you build and test, and any errors your script throws show up in the console.
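Without a built-in log, one low-tech stand-in is to wrap each step of the script so failures land in the DevTools console with a tag you can filter on. This is a minimal sketch. `runLogged`, the `[report-export]` prefix, and the `exportCsv` element ID are hypothetical names, not Tampermonkey features:

```javascript
// Illustrative tag to filter on in the DevTools console (hypothetical).
const LOG_PREFIX = "[report-export]";

// Run one step of the script and log either success or the caught error,
// prefixed with LOG_PREFIX. Returns the step's result, or undefined if it threw.
function runLogged(stepName, fn) {
  try {
    const result = fn();
    console.log(`${LOG_PREFIX} ${stepName}: ok`);
    return result;
  } catch (err) {
    console.error(`${LOG_PREFIX} ${stepName} failed:`, err);
    return undefined;
  }
}

// Example step, only in a browser page; "exportCsv" is a hypothetical ID.
// If the element is missing, the resulting TypeError is caught and logged.
if (typeof document !== "undefined") {
  runLogged("click export", () => document.getElementById("exportCsv").click());
}
```

This only catches synchronous errors. A failure inside a setTimeout callback or a rejected promise escapes the try/catch and still appears only as an uncaught error in the console.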

In a nutshell, when you set a URL pattern in the @match field, Tampermonkey runs your script every time Chrome loads a page matching it. This works great for me, since the report I want to scrape always lives at the same URL. I also needed to set start and end dates before clicking the export button, and of course JavaScript provides simple DOM methods (e.g. document.getElementById() and document.querySelector()) for manipulating input fields. Seven lines of code, and I’m scraping the report every time my browser opens the URL.
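The post’s seven lines aren’t shown, so the following is a longer sketch of what such a userscript might look like, under several assumptions:

- The element IDs `startDate`, `endDate`, and `exportCsv` are hypothetical; look up the real ones in DevTools.
- The date inputs are assumed to accept YYYY-MM-DD.
- The range is assumed to be "yesterday".
- The URL is the same placeholder used in the .bat file.

The final `if` only exists so the logic can be exercised outside a browser.

```javascript
// ==UserScript==
// @name         Daily report CSV export (sketch)
// @match        https://youraddresshere.com/*
// @grant        none
// ==/UserScript==

// Hypothetical element IDs: inspect the real report page in DevTools for yours.
const START_ID = "startDate";
const END_ID = "endDate";
const EXPORT_ID = "exportCsv";

// Format a Date as YYYY-MM-DD in local time (assumed input format).
function toIsoDate(d) {
  const pad = (n) => String(n).padStart(2, "0");
  return `${d.getFullYear()}-${pad(d.getMonth() + 1)}-${pad(d.getDate())}`;
}

// Set an input's value and fire input/change events, since some pages
// only react to edits that raise those events.
function setValue(input, value) {
  input.value = value;
  input.dispatchEvent(new Event("input", { bubbles: true }));
  input.dispatchEvent(new Event("change", { bubbles: true }));
}

// Fill both date fields with yesterday's date, then click the export button.
function exportReport(doc, today = new Date()) {
  // The Date constructor normalizes day 0, so month/year rollovers work.
  const yesterday = new Date(today.getFullYear(), today.getMonth(), today.getDate() - 1);
  const start = doc.getElementById(START_ID);
  const end = doc.getElementById(END_ID);
  const button = doc.getElementById(EXPORT_ID);
  if (!start || !end || !button) {
    throw new Error("Report form not found; check the element IDs");
  }
  setValue(start, toIsoDate(yesterday));
  setValue(end, toIsoDate(yesterday));
  button.click();
}

// Touch the real DOM only when running in a browser page.
if (typeof document !== "undefined") {
  exportReport(document);
}
```

Passing the document in as a parameter keeps all DOM access in one place. Throwing when an element is missing means a changed page layout shows up as an error in the DevTools console rather than as a silent no-op.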

To fully automate this process, I set up a .bat file that will run once a day:

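@echo off
rem Open the report page so the Tampermonkey script can trigger the CSV export
start chrome "https://youraddresshere.com"
rem Wait 10 seconds for the download; /nobreak ignores stray keypresses
timeout /t 10 /nobreak > nul
move /Y "C:\path\to\move\from.csv" "C:\path\to\move\to.csv"
rem Force-close Chrome and its child processes
taskkill /F /IM chrome.exe /T > nul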

Piece of cake! The .bat file opens Chrome to the page where Tampermonkey will run and waits ten seconds for the script to trigger the export and for the download to finish. It then moves the downloaded file from my Downloads folder to a predetermined location. Finally, Chrome’s process is force-killed, eliminating any warning messages.
