I ran into a scenario today where a client needed data scraped from a website, but as far as I can tell the website offers no API or any other programmatic access (nothing I could hit with cURL or fetch). The data can be displayed in a datatable, but scraping it that way would require a script that pages through the results and dumps each row into an output file. Alternatively, the website offers a CSV export button. I like the idea of having pre-formatted, clean data exported from the site on a schedule – the other half of the battle is figuring out how to move that data into a warehouse. In this blog post I’ll cover the scraping portion, and perhaps I’ll write again when I figure out the rest.
As I was examining a variety of scraping options I ran into this post over at Stack Overflow: https://stackoverflow.com/questions/36045745/automate-daily-csv-file-download-from-website-button-click
The author pivoted from their initial attempts to what they considered to be a more elegant solution – using a Chrome extension called Tampermonkey to run a script on a particular web page, and automating this process with a .bat file. I figured I’d give it a try.
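As a rough idea of what such a userscript can look like, here is a minimal sketch. It is not the exact script from the Stack Overflow answer. The selector `#export-csv`, the script name, and the helper `clickWhenReady` are placeholder assumptions; you would swap in whatever actually identifies the export button on your page, and the `@match` line should point at the same address the .bat file opens.

```javascript
// ==UserScript==
// @name         Click CSV export (sketch)
// @version      0.1
// @match        https://youraddresshere.com/*
// @grant        none
// @run-at       document-idle
// ==/UserScript==

// Hypothetical selector: replace with whatever identifies the real export button.
const EXPORT_SELECTOR = '#export-csv';

// Looks for the button and clicks it as soon as it exists.
// Retries every delayMs milliseconds and gives up after maxTries attempts.
function clickWhenReady(doc, selector, maxTries = 20, delayMs = 500) {
  return new Promise((resolve) => {
    let tries = 0;
    const attempt = () => {
      const button = doc.querySelector(selector);
      if (button) {
        button.click();
        resolve(true);
        return;
      }
      tries += 1;
      if (tries >= maxTries) {
        resolve(false);
        return;
      }
      setTimeout(attempt, delayMs);
    };
    attempt();
  });
}

// Only start automatically when running inside a browser page.
if (typeof document !== 'undefined') {
  clickWhenReady(document, EXPORT_SELECTOR).then((clicked) => {
    if (!clicked) {
      console.warn('Export button not found: ' + EXPORT_SELECTOR);
    }
  });
}
```

The retry loop is there because the button on a script-heavy page may not exist yet when the userscript first runs; if your page renders it immediately, a single `querySelector` and `click()` would do.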
You can even run Tampermonkey’s built-in syntax checker to make sure you don’t have any particularly stupid errors. My one big complaint about Tampermonkey is that it has no built-in error log, but it’s not a big deal to keep Chrome’s developer tools open while you’re building and testing so you can catch any bugs.
To fully automate this process, I set up a .bat file that will run once a day:
start chrome "https://youraddresshere.com"
timeout 10
move "C:\path\to\move\from.csv" "C:\path\to\move\to.csv"
taskkill /F /IM chrome.exe /T > nul
Piece of cake! The .bat file opens Chrome, directs it to the page where Tampermonkey will run, waits ten seconds to give the download time to finish, and then moves the downloaded file from my Downloads folder to a predetermined location. Finally, Chrome is force-killed, with the output of taskkill redirected to nul so that no messages show up.