3 Reasons Counting is the Hardest Thing in Data Science

Counting is hard. You might be surprised to hear me say that, but it's true. As a data scientist, I've done it all - everything from simple regression analysis all the way to coding Hadoop MapReduce jobs that process hundreds of billions of data points each month. And, with all that experience, I've found that counting often involves far more time and effort.

He makes it look so easy!

So why do I think counting is so hard? Let me give you three reasons.

Continue reading

Scraping Grocery Store Ads for Fun and Profit

If we’re honest, I imagine most of us would admit we don’t really know what a good price is on the grocery items we purchase regularly. Except for a few high-priced favorites (e.g., ribeyes and salmon) that I watch for sales, I honestly have no idea what’s a regular price and what’s a good deal. How much does a box of raisin bran cost? Whatever the grocery store charges me…

harris teeter

A local Harris Teeter in my area. Usually, their prices are higher… except when they’re not. Honestly, I don’t really know.

Obviously, this is a really terrible way to manage my grocery budget – I probably spring for “deals” that are nothing of the sort all the time. So, as a data scientist, I got to thinking… what if I could keep tabs on this? Build a database of historic prices for various items, so I’d know when to pull the trigger on sales. Seems straightforward enough.

Well, I needed some data to get started. So, I figured I’d see if I could programmatically scrape prices out of the online weekly ad for my local Kroger store. In this post I’ll walk through how I got that set up… and, as this project moves along, I’ll post updates on what I do with the data.

Continue reading

Export Raw Data from Google Analytics (the Free Way)

Today, we’re going to use a couple of lines of JavaScript code to get free access to raw data from Google Analytics. That’s a feature that’s usually only available in Google Analytics Premium, a product which will set you back a cool $150,000 a year.

In this how-to video, the author merges customer data with Google Analytics data via Google BigQuery. Luckily, you can unlock these kinds of features without having to take out a second mortgage.

Think that sounds like a cool idea? Let’s get started.

Quick update… would you be interested in a service that let you add a couple lines of JavaScript to your site and then get all your raw Google Analytics data from an API? I’m thinking of putting that together and I’d love to talk to you about it. Let me know your interest here: https://goo.gl/forms/FtGL8e0ejBUmQO1q1.

Continue reading