Counting is hard. You might be surprised to hear me say that, but it's true. As a data scientist, I've done it all - everything from simple regression analysis all the way to coding Hadoop MapReduce jobs that process hundreds of billions of data points each month. And, with all that experience, I've found that counting often involves far more time and effort.
He makes it look so easy!
So why do I think counting is so hard? Let me give you three reasons.
If we’re honest, I imagine most of us would admit we don’t really know what a good price is on the grocery items we purchase regularly. Except for a few high-priced favorites (e.g., ribeyes and salmon) that I watch for sales, I honestly have no idea what’s a regular price and what’s a good deal. How much does a box of raisin bran cost? Whatever the grocery store charges me…
A local Harris Teeter in my area. Usually, their prices are higher… except when they’re not. Honestly, I don’t really know.
Obviously, this is a really terrible way to manage my grocery budget – I probably spring for “deals” that are nothing of the sort all the time. So, as a data scientist, I got to thinking… what if I could keep tabs on this? Build a database of historic prices for various items, so I’d know when to pull the trigger on sales. Seems straightforward enough.
Well, I needed some data to get started. So, I figured I’d see if I could programmatically scrape prices out of the online weekly ad for my local Kroger store. In this post I’ll walk through how I got that set up… and, as this project moves along, I’ll post updates on what I do with the data.
In this how-to video, the author merges customer data with Google Analytics data via Google BigQuery. Luckily, you can unlock these kinds of features without having to take out a second mortgage.
Think that sounds like a cool idea? Let’s get started.