Send Google Analytics Data to Your Own Server

In last week’s post, we explored how to tag individual users and hits with unique identifiers in Google Analytics, so that an analyst could export raw data from the Google Analytics API for complex statistical analyses not possible in the GA interface. But there are undoubtedly some situations in which even that solution isn’t good enough – Google limits the number of metrics and dimensions you can download in a single query, for example. What do you do then?

Luckily, there’s a solution for this. We’ll just send Google Analytics data on a little detour from the user’s browser to our own web server, process it ourselves, and query to our heart’s content!

The 1945 movie, Detour, starring Tom Neal and Ann Savage.

The methodology I’ll summarize today allows an organization to leverage much of the value-add of Google Analytics (for instance, they’ve already done all the hard work of detecting JavaScript support, Flash, screen size, page URL, etc.) while still processing the data on their own servers. It’s a massive win-win.
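
To picture the server side of that detour: Google Analytics hits travel as query-string payloads, so once a duplicated hit reaches your endpoint, a tiny parser can pull out the fields you care about. A minimal sketch (the parameter names `cid`, `t`, `dp`, and `sr` follow the Measurement Protocol’s common parameters; the `parse_hit` function itself is hypothetical, shown in Python for illustration):

```python
from urllib.parse import parse_qs

def parse_hit(query_string):
    """Parse a Measurement-Protocol-style payload into a flat dict."""
    params = {k: v[0] for k, v in parse_qs(query_string).items()}
    return {
        "client_id": params.get("cid"),  # anonymous client identifier
        "hit_type": params.get("t"),     # e.g. pageview, event
        "page": params.get("dp"),        # document path
        "screen": params.get("sr"),      # screen resolution
    }

hit = parse_hit("v=1&tid=UA-12345-1&cid=555.666&t=pageview&dp=%2Fhome&sr=1920x1080")
```

From there, the parsed records can go into whatever database you like, free of query limits.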

Continue reading

Export Raw Data from Google Analytics (the Free Way)

Today, we’re going to use a couple of lines of JavaScript code to get free access to raw data from Google Analytics. That’s a feature that’s usually only available in Google Analytics Premium, a product which will set you back a cool $150,000 a year.

In this how-to video, the author merges customer data with Google Analytics data via Google BigQuery. Luckily, you can unlock these kinds of features without having to take out a second mortgage.

Think that sounds like a cool idea? Let’s get started.

Continue reading

Random Forest Classifiers as a Web Service in PHP

Recently, I found myself wanting to be able to make real-time, online predictions using a random forest classifier trained in R. Of course, there are many ways to make that happen – I could have used yhat’s ScienceOps product, for example. But, for project-specific reasons, I decided that the best route to go in this case was to get my hands dirty and build my own RESTful API for making predictions using my model.

Apparently, back in 2011, Disney debuted a show called “So Random.” Thankfully, it only ran for a single season…

In this post, we’ll walk through all of the code necessary to export a random forest classifier from R and use it to make real-time online predictions in a PHP script.
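
Language aside (the post uses R and PHP; Python stands in here purely for illustration), the core trick is to serialize each tree as nested split rules and have the web service walk them at request time, taking a majority vote across trees. A toy sketch with a hypothetical two-tree forest:

```python
# Each node is ("split", feature, threshold, left, right) or ("leaf", class).
def predict_tree(node, x):
    """Walk one serialized tree down to a leaf for observation x."""
    while node[0] == "split":
        _, feat, thresh, left, right = node
        node = left if x[feat] <= thresh else right
    return node[1]

def predict_forest(forest, x):
    """Majority vote across trees, as a random forest does at prediction time."""
    votes = [predict_tree(tree, x) for tree in forest]
    return max(set(votes), key=votes.count)

# Hypothetical forest exported from a trained model
forest = [
    ("split", "age", 30, ("leaf", "no"), ("leaf", "yes")),
    ("split", "income", 50000, ("leaf", "no"), ("leaf", "yes")),
]
prediction = predict_forest(forest, {"age": 45, "income": 60000})  # both trees vote "yes"
```

The same walk-the-rules loop ports directly to PHP once the trees are exported from R as a nested data structure (e.g. JSON).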

Continue reading

Probabilities in Google Analytics Content Experiments

Have you ever taken a look at the “probability of outperforming” metric in Google Analytics’ Content Experiments and wondered how it was calculated? Have you ever scratched your head because the numbers didn’t make sense to you? I certainly have. It’s hard to see experiment results like the ones depicted below and not wonder what’s going on underneath the hood.

GA Experiment Data

Real data from a GA content experiment, showing an under-performing variant with a >50% chance of outperforming the original. It’s like trash-talking when you’re down at the half.

In this post, we’ll highlight how Google’s Content Experiments work, why their approach is a really smart one, and why you might still want to do a little bit of the heavy lifting yourself…
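
Google has described Content Experiments as using a multi-armed-bandit approach, and one common way to compute a “probability of outperforming” under such a model is Monte Carlo sampling from Beta posteriors over each variant’s conversion rate. A toy sketch (the uniform priors and draw count here are illustrative assumptions, not Google’s actual implementation):

```python
import random

def prob_outperforming(conv_a, n_a, conv_b, n_b, draws=20000, seed=42):
    """Monte Carlo estimate of P(variant B's true conversion rate beats A's),
    sampling from Beta(1 + successes, 1 + failures) posteriors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        rate_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        rate_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        if rate_b > rate_a:
            wins += 1
    return wins / draws

# Hypothetical experiment: 30/1000 conversions vs. 45/1000
p = prob_outperforming(30, 1000, 45, 1000)
```

Early in an experiment the posteriors are wide, which is exactly why a trailing variant can still show a better-than-even probability of winning.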

Continue reading

Are Your Affiliate Partners Cannibalizing Organic Sales?

What if your business was paying a bunch of extra money to bring in sales that would have happened anyway?

In the e-commerce business, affiliate marketing promises to deliver increased sales by getting your name and products showing up on dozens of sites, blogs, and social media pages. Of course, this sounds like a great boon – more traffic, more sales, more profits. But, in many cases, the results aren’t nearly as good as you might expect. If you’re not careful, your affiliate program can cannibalize sales that were going to happen anyway…

Cutco knives – affiliate marketing’s evil twin brother.
(Image Credit: Hustvedt [CC BY-SA 3.0 or GFDL], on Wikimedia Commons)

In today’s post, I’ll show you a simple technique to figure out how cannibalistic your affiliate program is, using a specially-designed Google Analytics segment.
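
The logic behind a segment like that can be sketched offline. Assuming hypothetical per-conversion channel paths (ordered lists of touchpoints ending at the converting visit), this toy function estimates what share of affiliate-attributed conversions had an earlier organic touch, i.e. the suspected cannibals:

```python
def cannibalized_share(conversion_paths):
    """Share of affiliate-last-click conversions where an organic visit
    happened earlier in the same path (hypothetical path data)."""
    affiliate_last = [p for p in conversion_paths if p[-1] == "affiliate"]
    if not affiliate_last:
        return 0.0
    # An earlier organic touch suggests the sale would have happened anyway
    suspect = [p for p in affiliate_last if "organic" in p[:-1]]
    return len(suspect) / len(affiliate_last)

paths = [
    ["organic", "affiliate"],  # suspected cannibalization
    ["social", "affiliate"],   # affiliate genuinely introduced the sale
    ["organic", "direct"],     # not affiliate-attributed at all
]
share = cannibalized_share(paths)  # 1 of 2 affiliate conversions is suspect
```

A GA segment expresses the same condition declaratively (sessions with an organic touch preceding the affiliate-attributed transaction) rather than in code.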

Continue reading

Cell Suppression in SAS – Final Thoughts

Over the last several weeks, I’ve blogged about two different methods for solving the small cell suppression problem using SAS Macro code. In the first, we used a heuristic approach to find a solution that was workable but not necessarily optimal. In the second, we solved the problem to proven optimality with SAS PROC OPTMODEL. But all of this leaves a few open questions…

For example, how much better is the optimal approach than the heuristic? Is there ever a reason not to prefer the optimal approach? And what are some other improvements and techniques that a researcher using these macros might want to know about? I’ll spend this post reflecting on our two solutions and covering a few of these bases.

Continue reading

Optimal Cell Suppression in SAS – Final Macro

In last week’s post, we constructed a set of constraints to bound a binary integer program for solving the small cell suppression problem. These constraints allow us to ensure that every group of data points which could be aggregated across in a tabular report contains either 0 or 2+ suppressed cells.

Cop-out test answer.

At some point before age five, every kid masters the art of satisfying constraints with solutions that are hilariously non-optimal.

Obviously, there are plenty of ways we could satisfy our constraints – suppressing everything, for example. But we want to choose the optimal pattern of secondarily suppressed cells to minimize data loss. So, we’re going to tackle the problem using binary integer programming in PROC OPTMODEL. Strap yourself in, folks – it’s going to be an exciting ride.
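
At toy scale, you can even brute-force the model to see what the optimizer is after. This sketch (plain Python, not OPTMODEL, and only viable because a 3×3 table has just 2^9 patterns) enumerates every suppression pattern, keeps the ones where every row and column has 0 or 2+ suppressed cells, and returns the cheapest:

```python
from itertools import product

def optimal_suppression(table, primary):
    """Brute-force the binary integer program for a tiny table: among
    suppression patterns that include all primary cells and leave every
    row/column with 0 or 2+ suppressed cells, minimize suppressed value.
    (A real solver like PROC OPTMODEL handles this at scale.)"""
    rows, cols = len(table), len(table[0])
    best, best_cost = None, float("inf")
    for pattern in product([0, 1], repeat=rows * cols):
        grid = [pattern[r * cols:(r + 1) * cols] for r in range(rows)]
        if any(grid[r][c] == 0 for r, c in primary):
            continue  # primary suppressions are mandatory
        row_ok = all(sum(row) != 1 for row in grid)
        col_ok = all(sum(grid[r][c] for r in range(rows)) != 1 for c in range(cols))
        if not (row_ok and col_ok):
            continue  # a lone suppressed cell could be recovered by subtraction
        cost = sum(table[r][c] for r in range(rows)
                   for c in range(cols) if grid[r][c])
        if cost < best_cost:
            best, best_cost = grid, cost
    return best, best_cost

table = [[1, 8, 5], [9, 2, 7], [4, 6, 3]]   # hypothetical cell values
pattern, cost = optimal_suppression(table, {(0, 0)})
```

The solver-based approach explores the same feasible region with branch-and-bound instead of exhaustive enumeration, which is what makes real-sized tables tractable.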

Continue reading

Optimal Cell Suppression in SAS – Building Constraints

In last week’s post we built a SAS macro that found acceptable solutions to the small cell suppression problem using a simple heuristic approach. But what if acceptable isn’t good enough? What if you want perfection? Well, then, you’re in luck!

Ben Franklin

Benjamin Franklin once attempted to become morally perfect. Too bad he didn’t have PROC OPTMODEL…

I’ve blogged previously about optimization with linear programming in SAS PROC OPTMODEL, and it turns out that the cell suppression problem belongs to another class of problems that can be tackled using this approach. (If you’re unfamiliar with linear programming, check out the linear programming Wikipedia article to get up to speed.) Over the next two posts, we’ll be setting up a SAS Macro that builds the constraints necessary to bound our optimization problem, then implementing the actual optimization code in PROC OPTMODEL.

Continue reading

A Heuristic Cell Suppression Approach in SAS Macro Code

Often, complex problems can be adequately solved by simple rules that provide an acceptable solution, even if they don’t necessarily get you to the optimal point. The cell suppression problem (summarized in last week’s post) is a perfect example. Using a methodology that would be readily apparent to anyone tackling the problem with pen and paper, we can create a computerized solution that appropriately suppresses data sets containing tens of thousands of records disaggregated over dozens of dimensions. This heuristic method will likely suppress more data than it strictly needs to, but when all is said and done, it will finish the job quickly and without completely mangling your statistics.
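
The pen-and-paper rule is easy to state: whenever a row or column ends up with exactly one suppressed cell, also suppress its smallest remaining cell, and repeat until no such line exists. A toy Python version of that greedy loop (the post’s SAS macro implements the real thing; the data layout here is a hypothetical 2-D table):

```python
def heuristic_suppression(table, primary):
    """Greedy complementary suppression: while any row or column has
    exactly one suppressed cell, also suppress its smallest unsuppressed
    cell. Fast and simple, but not necessarily optimal."""
    rows, cols = len(table), len(table[0])
    suppressed = set(primary)

    def lines():
        for r in range(rows):
            yield [(r, c) for c in range(cols)]
        for c in range(cols):
            yield [(r, c) for r in range(rows)]

    changed = True
    while changed:
        changed = False
        for line in lines():
            hit = [cell for cell in line if cell in suppressed]
            if len(hit) == 1:
                candidates = [cell for cell in line if cell not in suppressed]
                # Suppress the smallest value to minimize extra data loss
                victim = min(candidates, key=lambda rc: table[rc[0]][rc[1]])
                suppressed.add(victim)
                changed = True
    return suppressed
```

Because the suppressed set only ever grows, the loop is guaranteed to terminate; the trade-off is that each local “smallest cell” choice can cascade into more suppressions than an optimizer would need.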

XKCD Fermi Estimation

Heuristics are kind of like Fermi estimation. Or, more accurately, I needed an image for this post and this was the best thing I could come up with.
Image credit: XKCD

We’ll start with an explanation of the basic idea, then move on to implementing it in code.

Continue reading

Small Cell Suppression – Problem Overview

The “cell suppression problem” is one type of “statistical disclosure control” in which a researcher must hide certain values in tabular reports in order to protect sensitive personal (or otherwise protected) information. For instance, suppose Wayout County, Alaska has only one resident with a PhD – we’ll call her “Jane.” Some economist comes in to do a study of the value of higher education in rural areas, and publishes a list of average salaries disaggregated by county and level of education. Whoops! The average salary for people with PhDs in Wayout County is just Jane’s salary. That researcher has just disclosed Jane’s personal information to the world, and anybody that happens to know her now knows how much money she makes. “Suppressing” or hiding the value of that cell in the report table would have saved a lot of trouble!
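
Primary suppression – finding the “Jane” cells in the first place – is just a threshold on cell counts. A minimal sketch with hypothetical records (field names and the threshold are illustrative):

```python
from collections import Counter

def primary_suppressions(records, min_count=2):
    """Flag report cells whose underlying count falls below a threshold --
    the cells that would disclose an individual if published."""
    counts = Counter((r["county"], r["education"]) for r in records)
    return {cell for cell, n in counts.items() if n < min_count}

records = [
    {"county": "Wayout", "education": "PhD"},  # only one PhD: Jane
    {"county": "Wayout", "education": "BA"},
    {"county": "Wayout", "education": "BA"},
]
flagged = primary_suppressions(records)  # just the PhD cell
```

The hard part, as the coming posts show, is the *secondary* suppression: once Jane’s cell is hidden, the marginal totals can still give it away unless other cells are suppressed too.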

No, not that kind of suppression.

Over the next couple weeks, I’ll be blogging about some algorithms used to solve the cell suppression problem, and showing how to implement them in code. For now, we’re going to start with an introduction to the intricacies of the problem.

Continue reading