My company has a subscription-based business model, which means we spend a lot of time analyzing customer churn. We wanted to include Kaplan-Meier survival curves in some of our executive dashboards, but neither our database (Redshift) nor any of our commonly used dashboarding tools (Tableau, Periscope, etc.) provided the necessary functionality. We could, of course, have pulled data out of the warehouse, analyzed it in R or Python, and pushed it back up, but that's pretty complicated. So we went looking for a better solution.
Based on all available evidence, the survival curve for aliens singing disco music hits 0 at about 43 seconds (N = 1).
As you likely guessed from the title of this post, that better solution involved writing our own code for calculating a Kaplan-Meier estimator in SQL. In this post, I'll be walking through our strategy step-by-step, including the SQL code for calculating the estimators and making that code reusable in Periscope. Let's do this!
In a past post on analyzing churn in the subscription or Software as a Service business, I talked about two different ways to quantify the dollar cost of churn. You could use 1 / churn as an estimation of mean customer lifetime (though this simple method makes a lot of assumptions). Or, you could use “pseudo-observations” to calculate the dollar value of certain groups of customers during a particular time period (which doesn’t let you quantify the full lifetime value of a customer).
But what if there was another way? What if we took our Kaplan-Meier best estimate of our churn curve, fit a linear model to that model, and then projected it out?
A model within a model, if you will. Churnception.
Well, as it turns out, we’d get a reasonable estimation of our lifetime churn curve, which would let us estimate average customer lifetime, and customer lifetime value. Let’s get started.