In last week’s post, we explored how to tag individual users and hits with unique identifiers in Google Analytics, so that an analyst could export raw data from the Google Analytics API for complex statistical analyses not possible in the GA interface. But there are undoubtedly some situations in which even that solution isn’t good enough – Google limits the number of metrics and dimensions you can download in a single query, for example. What do you do then?
Luckily, there’s a solution for this. We’ll just send Google Analytics data on a little detour from the user’s browser to our own web server, process it ourselves, and query to our heart’s content!
Re-directing the data
Here’s the code snippet that will get you started. It needs to be placed between the GA setup code and the code that actually sends the GA pageview (or other hit).
All this code does is hijack the “send hit” function in the GA library. When GA calls “send hit,” the code goes ahead and sends the payload over to Google as usual, but it also makes a second request to your own server with the same payload. You will (obviously) need to edit the URL in the snippet to match the location of your server.
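For reference, a minimal sketch of this kind of hijack might look like the following. It assumes the analytics.js (Universal Analytics) library, where the “send hit” step is exposed as a task called `sendHitTask`. The `mirrorHits` and `sendCopy` names and the `example.com` URL are placeholders of mine, not part of GA, and the code is not necessarily identical to the snippet described above.

```javascript
// Hypothetical endpoint: replace with the location of your own script.
var MIRROR_URL = 'https://example.com/collect.php';

// Wraps the tracker's sendHitTask so that every hit is delivered to Google
// as usual and then copied to mirrorUrl with the same payload.
// sendCopy is a hypothetical helper that requests the URL it is given.
function mirrorHits(tracker, mirrorUrl, sendCopy) {
  var originalSendHitTask = tracker.get('sendHitTask');
  tracker.set('sendHitTask', function (model) {
    originalSendHitTask(model);             // normal delivery to Google
    var payload = model.get('hitPayload');  // URL-encoded query string
    sendCopy(mirrorUrl + '?' + payload);    // extra request to our server
  });
}

// Browser wiring: runs only where analytics.js has defined ga().
// Place it after ga('create', ...) and before ga('send', 'pageview').
if (typeof ga === 'function') {
  ga(function (tracker) {
    mirrorHits(tracker, MIRROR_URL, function (url) {
      var img = new Image();  // fire-and-forget GET request
      img.src = url;
    });
  });
}
```

Passing the sender in as a function keeps the wrapper easy to try out with a fake tracker before it goes anywhere near a live page.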
Processing the data
Now, let’s generate a simple PHP script to actually grab some of that sweet, sweet Google Analytics data that we just re-directed to our server. In this case, we’ll just grab the “Document Title” parameter (encoded on the query string as “dt”). You can view the full list of Google Analytics query string parameters here.
I created a brutally simple HTML document, titled it “This is a test document” and included the GA code, plus the custom code for re-directing the output to the above PHP script. And, after a single pageview, the “output.txt” document created by PHP contained “This is a test document.” Awesome!
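The extraction step itself is tiny. As an illustration of the logic (written in JavaScript rather than PHP, so treat it as a sketch and not the script used above), pulling “dt” out of a redirected payload could look something like this. The `extractDocumentTitle` name is my own.

```javascript
// Returns the decoded Document Title ("dt") from a GA hit payload,
// or null if the payload has no dt parameter.
function extractDocumentTitle(queryString) {
  var params = new URLSearchParams(queryString);
  return params.get('dt');
}

// Example payload, shaped like the query string GA sends with a pageview.
var payload = 'v=1&t=pageview&dt=This%20is%20a%20test%20document';
console.log(extractDocumentTitle(payload)); // This is a test document
```

Any other parameter in the payload can be read the same way by swapping “dt” for its key.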
Of course, if you were serious about processing GA data yourself, you’d want to be inserting it into a database or something like that. I’ll leave that work up to you, though. Now that the data is flowing to your server, the sky’s the limit! I will mention, however, that doing individual database inserts every single time somebody views a page on your server might overwhelm your server very quickly – especially on a high-traffic site. If you’re concerned about this, you might consider simply serving up a static file (sans PHP) in response to the GA request, logging the full URL in Apache’s logs, and rotating your logs regularly for batch processing and bulk inserting into your database. That’ll keep the load on your server to a minimum.
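To make the log-based approach a bit more concrete, here is a rough sketch of the batch step. It assumes Apache’s common/combined log format and a static file served at `/collect.gif`; that path, the function names, and the choice of fields are all assumptions of mine. The actual bulk insert is left out, since it depends entirely on your database.

```javascript
// Pulls GA parameters out of one access-log line. Returns null when the
// line is not a request to collectPath or carries no query string.
function parseHitFromLogLine(line, collectPath) {
  var match = /"(?:GET|POST) (\S+) HTTP\/[\d.]+"/.exec(line);
  if (!match) return null;
  var target = match[1];
  var q = target.indexOf('?');
  if (q === -1 || target.slice(0, q) !== collectPath) return null;
  var params = new URLSearchParams(target.slice(q + 1));
  return {
    hitType: params.get('t'),    // e.g. "pageview"
    clientId: params.get('cid'), // GA client ID
    title: params.get('dt')      // Document Title
  };
}

// Splits rows into arrays of at most `size` items, one per bulk insert.
function toBatches(rows, size) {
  var batches = [];
  for (var i = 0; i < rows.length; i += size) {
    batches.push(rows.slice(i, i + size));
  }
  return batches;
}

// Usage: parse a rotated log's lines, drop non-hits, then batch them.
function hitsFromLog(logText, collectPath, batchSize) {
  var rows = logText.split('\n')
    .map(function (line) { return parseHitFromLogLine(line, collectPath); })
    .filter(function (row) { return row !== null; });
  return toBatches(rows, batchSize);
}
```

Each batch can then go to the database in a single multi-row insert, so the database sees a handful of writes per log rotation instead of one per pageview.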
As with anything, there are certainly ways that this could be improved and built upon. However, this is hopefully a very bare-bones illustration of how to get access to some really useful information in a much more accessible format than usual. Hope somebody puts it to good use!