If we’re honest, I imagine most of us would admit we don’t really know what a good price is on the grocery items we purchase regularly. Except for a few high-priced favorites (e.g., ribeyes and salmon) that I watch for sales, I honestly have no idea what’s a regular price and what’s a good deal. How much does a box of raisin bran cost? Whatever the grocery store charges me…
Obviously, this is a really terrible way to manage my grocery budget; I probably spring for “deals” that are nothing of the sort on a regular basis. So, as a data scientist, I got to thinking… what if I could keep tabs on this? Build a database of historic prices for various items, so I’d know when to pull the trigger on sales. Seems straightforward enough.
Well, I needed some data to get started. So, I figured I’d see if I could programmatically scrape prices out of the online weekly ad for my local Kroger store. In this post I’ll walk through how I got that set up… and, as this project moves along, I’ll post updates on what I do with the data.
Exploring the data format
The first thing I did was open the Chrome developer tools and monitor HTTP traffic to the Kroger website while loading the weekly ad for my local store (Kroger is my go-to grocery store). It turns out the weekly ad lives at a URL that looks something like https://wklyads-krogermidatlantic.kroger.com/flyers/krogermidatlantic-weekly?type=2&store_code=00342&chrome=broadsheet&flyer_run_id=##### where 00342 is the store ID of my local Kroger and ##### is the ID for the current “run” (whatever that means) of flyers being distributed by Kroger.
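Just to make those pieces concrete, here’s the same URL built up in code. The flyer_run_id value below is only a stand-in; the real one changes from week to week, so the script has to look it up first (more on that below).

```php
<?php
// Building the weekly-ad URL for a given store and flyer run.
// 00342 is my local store; the flyer_run_id here is just a stand-in --
// the real one changes each week and has to be discovered at runtime.
$storeCode  = '00342';
$flyerRunId = 123456;   // placeholder

$adUrl = sprintf(
    'https://wklyads-krogermidatlantic.kroger.com/flyers/krogermidatlantic-weekly'
        . '?type=2&store_code=%s&chrome=broadsheet&flyer_run_id=%d',
    $storeCode,
    $flyerRunId
);
```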
Scripting it up in PHP
I know PHP isn’t as sexy as, say, Python right now, but I’m very familiar with it, and it actually makes a remarkably good scripting language. So, I coded things up! The script is pretty self-explanatory and well-commented, so I won’t go into too much detail here… Basically, it hits the main ad page for my local Kroger to get the list of ads, finds the ID for the ad run with the word “weekly” in its name, pulls down the data for that run, grabs the relevant fields, and writes everything out to a CSV.
Check it out!
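Here’s a condensed sketch of that flow, mostly to show how the pieces fit together. One caveat: the JSON endpoints (/api/flyers and /api/flyers/{id}/items) and the field names in the loop are stand-ins for illustration; the real paths and fields are whatever shows up in the Chrome network tab, as described above.

```php
<?php
// Condensed sketch: find the current "weekly" ad run for one store and
// append its items to a CSV. Endpoints and field names are placeholders.

$storeCode = '00342';                     // my local Kroger's store ID
$base      = 'https://wklyads-krogermidatlantic.kroger.com';

// Small helper: fetch a URL and decode the JSON response.
function fetchJson(string $url): array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (grocery price tracker)',
    ]);
    $body = curl_exec($ch);
    curl_close($ch);
    return is_string($body) ? (json_decode($body, true) ?? []) : [];
}

// 1. Pull the list of ad runs ("flyers") currently published for this store.
//    (hypothetical endpoint -- substitute whatever the dev tools show)
$flyers = fetchJson("$base/api/flyers?store_code=$storeCode");

// 2. Find the run whose name contains "weekly".
$weeklyId = null;
foreach ($flyers as $flyer) {
    if (stripos($flyer['name'], 'weekly') !== false) {
        $weeklyId = $flyer['id'];
        break;
    }
}
if ($weeklyId === null) {
    exit("No weekly ad found for store $storeCode\n");
}

// 3. Pull every item in that flyer run (again, a hypothetical endpoint).
$items = fetchJson("$base/api/flyers/$weeklyId/items");

// 4-5. Keep the fields we care about and append them to a CSV.
$out = fopen('kroger_prices.csv', 'a');
foreach ($items as $item) {
    fputcsv($out, [
        date('Y-m-d'),
        $item['name']       ?? '',
        $item['price']      ?? '',
        $item['sale_story'] ?? '',   // e.g. "2 for $5"
        $item['valid_to']   ?? '',
    ]);
}
fclose($out);
```

Appending to the CSV (rather than overwriting it) and stamping each row with the date means the file itself starts to become the historic price database I’m after.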
This is probably only the first step in an ongoing project. Now that I can grab this data, I’ll need to set up a weekly cron job to pull it and pop it into a database somewhere. Then, after that goes on for a while, I’ll need to actually do something with the data!
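The cron part, at least, should be trivial; a single crontab line along these lines (the paths and schedule are just placeholders) would re-run the scraper every Monday morning and log whatever it prints:

```
# Hypothetical crontab entry: scrape the new weekly ad every Monday at 6am
0 6 * * 1 /usr/bin/php /path/to/scrape_kroger_ad.php >> /path/to/scrape.log 2>&1
```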
Stay tuned to see where this goes!