TODAYDOUGLEARNED is a place where I can dump my brain, muse, complain, and otherwise contribute to the greater project of aggregating human knowledge in a world wide web.

Using Excel to List and Count Files

Written in

by

On one of my recent web scraping projects, I got a follow-up request from a client to add some additional data to my results. The scraped data was dumped into individual folders in PDF format, often with multiple documents per folder. The client wanted to know if I could provide them with a count of the number of PDF files in each folder – and there were thousands of folders. It wasn’t going to happen manually. And I hate VBA, so I was hoping to find a quick and dirty formula-based solution.

A quick Duck Duck Go search gave me the answer I needed – using the now-deprecated Excel “Files” formula, I was able to pull it off.

Here’s a link to a really, really helpful rundown on setting up the Files formula in the latest versions of Excel.

I had to make a few modifications to suit my unique case.

The =COUNTA(listFiles) formula worked, but only for one directory. I needed to go through thousands of directories. So I modified the listFiles formula range to include more than one row. When defining the formula name, I entered the range as:

=FILES(Sheet1!$A$2:$A$1188)

Here’s what that looks like in Excel:

formula

My directory names were based on a value in each row, so it was easy to set up a formula that would build the directory name and swap in the correct value in each row. For example:

="C:\Users\doug\Documents\Main Folder\"&A2&"\*"

Where cell A2 contains the folder name. Drag that down and you’ve got a column of directories.

One final modification – in this case, I had a mixture of PDF files and other file types in each directory, but I only wanted to count PDF files. By modifying the directory name, my formulas would only count files ending in PDF:

="C:\Users\doug\Documents\Main Folder\"&A2&"\*pdf"

The * is a wildcard, so any characters can appear in the file name before the PDF extension.

Then all you have to do is put the =COUNTA(listFiles) formula in another column and boom, you’ve got a count of the number of PDF files in each directory.

The only drawback to this seems to be that when there are *no* PDF files in a directory, a value of “1” is still returned in the count formula. I’m not sure why that is, and frankly it didn’t matter a whole lot to us so I didn’t spend any time resolving that little oddity. But it’s good to know.

Advertisement

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.

%d bloggers like this: