Since I'm out of town for a bit, I'm migrating over a few relevant posts from an old blog of mine that I'm planning to shut down. Enjoy!
Pass the Pigs is a simple yet addictive dice game that uses cute little plastic pigs as dice. If you've never played, the rules are very straightforward. On each turn, a player rolls two pigs. The pigs will land in different positions, which will determine how many points the player has in their hand for that turn. The player may then decide to "pass the pigs" to the next player. If they do this, all of the points in their hand will be added to their official score. They may also decide to roll the pigs again to try to add more points to their hand before passing the pigs. But they must be careful! If the pigs both land on their sides with one showing a dot and the other showing a blank side, they "pig out" and lose all of the points they've accumulated in their hand! It's risky business. The first player to accumulate a score of 100 or higher wins.
Credit: Larry Moore
Like anything with dice (even pig-shaped dice), Pass the Pigs is a game of chance. That means, with a little effort, we should be able to figure out the probabilities of certain things happening in the game, and develop some optimal strategies. So how do you win at Pass the Pigs? Read on to find out.
Pretty much any language commonly used for data analysis (R, SAS, Python) can calculate the distance between two geographic coordinates with relative ease. But having to pull data out of your data warehouse every time you want to do some basic geographic analysis can be frustrating - sometimes it's nice to keep simple queries all in one system. If you've got a spatially enabled version of Postgres or SQL Server, you're in business. But if not, you'll need to roll your own SQL solution.
Because the earth is (roughly) a sphere, the shortest route between two points is a "Great Circle," which may appear curved on flat maps...
In today's post, we're going to write our own code in vanilla SQL to calculate the distance between two latitude and longitude coordinates.
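To make the idea concrete before touching SQL, here is a minimal Python sketch of one common way to do this calculation, the haversine formula. The excerpt above doesn't say which formula the full post uses, so treat the choice of haversine, the function name haversine_km, and the 6371 km mean earth radius as assumptions for illustration only.

```python
import math

# Mean earth radius in kilometers (an assumed constant; the earth is not a perfect sphere).
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)

    # Haversine of the central angle between the two points.
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)

    # Clamp to 1.0 to guard against tiny floating-point overshoot before asin.
    central_angle = 2 * math.asin(math.sqrt(min(1.0, a)))
    return EARTH_RADIUS_KM * central_angle
```

Every piece here (radians, sin, cos, asin, sqrt) has a counterpart in most SQL dialects, which is what makes a vanilla SQL version feasible.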
If you want to take a bunch of GIS data and rasterize it as a tiled image map for public consumption, the folks at ESRI would be happy to sell you an expensive solution. Of course, as with oh-so-many projects, you can accomplish the same thing for free with open-source software. In this case, we'll use Python and a library called Mapnik to render beautiful map layers, then display them on Google Maps, just like this demo rendering of my home county!
Ready to get started? Dust off your Python skills, and let's go!
In a past post on analyzing churn in the subscription or Software as a Service business, I talked about two different ways to quantify the dollar cost of churn. You could use 1 / churn as an estimation of mean customer lifetime (though this simple method makes a lot of assumptions). Or, you could use “pseudo-observations” to calculate the dollar value of certain groups of customers during a particular time period (which doesn’t let you quantify the full lifetime value of a customer).
But what if there were another way? What if we took our Kaplan-Meier best estimate of our churn curve, fit a linear model to that model, and then projected it out?
A model within a model, if you will. Churnception.
Well, as it turns out, we’d get a reasonable estimation of our lifetime churn curve, which would let us estimate average customer lifetime, and customer lifetime value. Let’s get started.
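For reference, the simple 1 / churn estimate mentioned above can be sketched in a few lines of Python. This is only an illustration under the assumption of a constant churn rate per period; the function names and the revenue_per_period parameter are hypothetical, not taken from the original post.

```python
def mean_lifetime(churn_rate):
    """Expected customer lifetime in periods, assuming a constant churn rate per period."""
    if not 0 < churn_rate <= 1:
        raise ValueError("churn_rate must be in the interval (0, 1]")
    return 1.0 / churn_rate


def lifetime_value(churn_rate, revenue_per_period):
    """Naive lifetime value: revenue per period times expected lifetime."""
    return revenue_per_period * mean_lifetime(churn_rate)


def mean_lifetime_by_summation(churn_rate, periods=10_000):
    """Same quantity, computed by summing the survival curve period by period.

    With constant churn, the fraction of customers still subscribed at the
    start of period t is (1 - churn_rate) ** t, and the sum of that curve
    converges to 1 / churn_rate.
    """
    return sum((1 - churn_rate) ** t for t in range(periods))
```

With 5% monthly churn, both approaches give a mean lifetime of about 20 months. The summation version hints at why projecting a fitted churn curve is useful: once churn is no longer constant, you can still sum the curve even though the 1 / churn shortcut no longer applies.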
One of the best ways to learn how a statistical model really works is to code the underlying math for it yourself. Today, we’re going to do that with simple linear regression.
In the book Data Smart, John Foreman introduces a bunch of awesome methodologies by walking you through how to build them in Excel…
Of course, doing regression in SQL has (some) practical use as well! For example, suppose you wanted to identify which city in a database of temperature records had the biggest warming trend in the last month. This method would send you on your way without having to bring your data into an external tool. Nifty!
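As a rough preview of the math involved, here is a small Python sketch of simple linear regression using the standard least-squares formulas. It is not the SQL from the full post; the function name and sample numbers are made up for illustration. The point is that everything reduces to counts, sums, and averages, which are exactly the aggregates SQL is good at.

```python
def simple_linear_regression(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")

    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Sum of cross-deviations and sum of squared deviations of x.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("all x values are identical; slope is undefined")

    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical example: day number vs. temperature for one city.
days = [1, 2, 3, 4]
temps = [3, 5, 7, 9]
slope, intercept = simple_linear_regression(days, temps)
```

In the warming-trend example, you would compute one slope per city and sort by it; the city with the largest positive slope has the strongest trend.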
In last week’s post, we explored how to tag individual users and hits with unique identifiers in Google Analytics, so that an analyst could export raw data from the Google Analytics API for complex statistical analyses not possible in the GA interface. But there are undoubtedly some situations in which even that solution isn’t good enough – Google limits the number of metrics and dimensions you can download in a single query, for example. What do you do then?
Luckily, there’s a solution for this. We’ll just send Google Analytics data on a little detour from the user’s browser to our own web server, process it ourselves, and query to our heart’s content!
The 1945 movie, Detour, starring Tom Neal and Ann Savage.
In this how-to video, the author merges customer data with Google Analytics data via Google BigQuery. Luckily, you can unlock these kinds of features without having to take out a second mortgage.
Think that sounds like a cool idea? Let’s get started.
Recently, I found myself wanting to be able to make real-time, online predictions using a random forest classifier trained in R. Of course, there are many ways to make that happen – I could have used yhat’s ScienceOps product, for example. But, for project-specific reasons, I decided that the best route to go in this case was to get my hands dirty and build my own RESTful API for making predictions using my model.
Apparently, back in 2011, Disney debuted a show called “So Random.” Thankfully, it only ran for a single season…
In this post, we’ll walk through all of the code necessary to export a random forest classifier from R and use it to make real-time online predictions in a PHP script.
Have you ever taken a look at the “probability of outperforming” metric in Google Analytics’ Content Experiments and wondered how it was calculated? Have you ever scratched your head because the numbers didn’t make sense to you? I certainly have. It’s hard to see experiment results like the ones depicted below and not wonder what’s going on underneath the hood.
Real data from a GA content experiment, showing an under-performing variant with a >50% chance of outperforming the original. It’s like trash-talking when you’re down at the half.
In this post, we’ll highlight how Google’s Content Experiments work, why the approach is a really smart idea, and why you might still want to do a little bit of the heavy lifting yourself…
What if your business was paying a bunch of extra money to bring in sales that would have happened anyway?
In the e-commerce business, affiliate marketing promises to deliver increased sales by getting your name and products showing up on dozens of sites, blogs, and social media pages. Of course, this sounds like a great boon – more traffic, more sales, more profits. But, in many cases, the results aren’t nearly as good as you might expect. If you’re not careful, your affiliate program can cannibalize sales that were going to happen anyway…