In a past post on analyzing churn in a subscription or Software-as-a-Service business, I talked about two different ways to quantify the dollar cost of churn. You could use 1 / churn as an estimate of mean customer lifetime (though this simple method makes a lot of assumptions). Or you could use “pseudo-observations” to calculate the dollar value of certain groups of customers during a particular time period (which doesn’t let you quantify the full lifetime value of a customer).
But what if there were another way? What if we took our Kaplan-Meier best estimate of our churn curve, fit a linear model to that curve, and then projected it out?
Well, as it turns out, we’d get a reasonable estimate of our lifetime churn curve, which would let us estimate average customer lifetime and customer lifetime value. Let’s get started.
Fitting the basic curve
I’ve blogged about creating Kaplan-Meier estimators of churn curves in the past, so I’m going to assume everybody is up on the details. Suffice it to say, you’re just creating a graph of the percentage of customers that are still subscribed to your service a given number of time periods after signing up. It looks something like this (with a 95% confidence interval included):
The code for fitting one of these curves (for a fictional guitar-tab subscription service called NetLixx) is shown below. (And you can download a CSV of the raw data here.) We just use R’s survival package to fit a curve, then extract the mean value. Like so:
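The post's own code uses R's survival package. As a separate, dependency-free illustration of what that fit is doing, here is a minimal Python sketch of a Kaplan-Meier curve plus a restricted mean (the area under the curve up to the last observed day). The toy data and the function names `kaplan_meier` and `restricted_mean` are made up for this example; this is not the NetLixx data set or the original code.

```python
from collections import Counter


def kaplan_meier(durations, churned):
    """Return [(day, share still subscribed)] for each day with a churn."""
    events = Counter(t for t, c in zip(durations, churned) if c)
    exits = Counter(durations)  # churned or censored, both leave the risk set
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = events.get(t, 0)
        if d:
            # Multiply by the share of at-risk customers who did NOT churn.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


def restricted_mean(curve, horizon):
    """Area under the step curve from day 0 to `horizon` (>= last churn day)."""
    area, prev_t, prev_s = 0.0, 0, 1.0
    for t, s in curve:
        area += prev_s * (t - prev_t)
        prev_t, prev_s = t, s
    return area + prev_s * (horizon - prev_t)


# Hypothetical toy data: days observed, and whether the customer churned
# (True) or was still subscribed when observation stopped (False).
durations = [30, 60, 60, 90, 120, 120, 200, 365, 365, 365]
churned = [True, True, False, True, False, True, False, False, False, False]

curve = kaplan_meier(durations, churned)
for day, share in curve:
    print(f"day {day:>3}: {share:.3f} still subscribed")
print(f"restricted mean: {restricted_mean(curve, max(durations)):.1f} days")
```

Notice that the restricted mean only counts the days we've actually observed. It is a floor on customer lifetime, not an estimate of the whole thing, which is exactly why a projection is needed.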
If you’re confused about how all this works, do be sure to read the earlier post. Otherwise, keep moving on for the meat of this post!
OK, so we have a survival curve that looks to be almost exactly a year long. But a good 80% of customers are still with the company. How can we know what mean customer lifetime is if most of the customers haven’t even churned yet? We’ll create a model of churn, then project it out into the future!
We’re just going to create a basic linear model, with one little complication: we’re going to model the logged value of our survival curve. There are a couple of good reasons for this:
- The value of our survival curve will never go below 0. You can’t have -1% of your customers remaining. If we can help it, we don’t want to project impossible values (even at the extremes).
- It makes sense that the distribution of survival times will have a positive skew. The shortest a customer can survive is 0 days. But, even if our service is bad, there’s gonna be somebody out there who won’t give up on us until somebody pries our service out of their cold, dead fingers.
This is actually a ridiculously simple process. We create an X variable (which is just a representation of time), fit a model, and then build an equation using the coefficients from our fitted linear model. We’ll also plot the results. Like this…
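For anyone following along without R, here is a rough Python sketch of the same idea: regress the log of the survival curve on time with ordinary least squares, then exponentiate to project forward. The five data points and the names `fit_log_linear` and `projected_survival` are hypothetical, chosen only to resemble a curve that sits near 80% after a year.

```python
import math


def fit_log_linear(days, survival):
    """Least-squares fit of log(survival) = intercept + slope * day."""
    logs = [math.log(s) for s in survival]
    n = len(days)
    mean_x = sum(days) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, logs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def projected_survival(day, intercept, slope):
    """Back-transform the fitted line; exp() keeps the result above zero."""
    return math.exp(intercept + slope * day)


# Hypothetical points read off a survival curve (not the NetLixx data).
days = [0, 90, 180, 270, 360]
survival = [1.00, 0.95, 0.90, 0.85, 0.80]

intercept, slope = fit_log_linear(days, survival)
for day in (365, 730, 1365):
    share = projected_survival(day, intercept, slope)
    print(f"day {day:>4}: projected {share:.1%} still subscribed")
```

Because we exponentiate the fitted line, the projection can get arbitrarily close to zero but never crosses it, which is the first of the two reasons given above for modeling the logged value.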
Our projection leads to a graph that looks something like this. (This projection is for an extra 1,000 days, but you could go as far as you wanted.) Not too shabby!
Of course, the real magic here is the last step. If we integrate our projected survival curve from day 0 to day infinity, we get our mean customer lifetime! In this case, the answer comes out to 1,391 days, or around 45 months. If we convert that lifetime to months and multiply it by average monthly revenue per customer, we get a projected customer lifetime value.
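The integration step can be sketched in a few lines of Python. For a log-linear model the curve is exp(intercept + slope * t), and when the slope is negative its integral from 0 to infinity has the closed form -exp(intercept) / slope. The sketch below computes that and cross-checks it with a simple numeric sum. The coefficients and the monthly revenue figure are hypothetical placeholders, not the values behind the 1,391-day result.

```python
import math

DAYS_PER_MONTH = 365.25 / 12


def mean_lifetime_days(intercept, slope):
    """Integral of exp(intercept + slope * t) for t from 0 to infinity."""
    if slope >= 0:
        raise ValueError("slope must be negative or the integral diverges")
    return -math.exp(intercept) / slope


def numeric_lifetime_days(intercept, slope, horizon_days=20000):
    """Trapezoid rule with one-day steps, as a sanity check on the formula."""
    values = [math.exp(intercept + slope * d) for d in range(horizon_days + 1)]
    return sum(values) - 0.5 * (values[0] + values[-1])


# Hypothetical fitted coefficients and revenue, for illustration only.
intercept, slope = 0.0, -0.0007
monthly_revenue = 10.0

lifetime_days = mean_lifetime_days(intercept, slope)
lifetime_months = lifetime_days / DAYS_PER_MONTH
lifetime_value = lifetime_months * monthly_revenue

print(f"closed form : {lifetime_days:.1f} days")
print(f"numeric     : {numeric_lifetime_days(intercept, slope):.1f} days")
print(f"lifetime    : {lifetime_months:.1f} months")
print(f"CLV         : ${lifetime_value:.2f}")
```

With an intercept of 0, the closed form reduces to -1 / slope. That's the same shape as the 1 / churn shortcut, except here the daily churn rate is estimated from the whole fitted curve.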
Of course, this methodology makes a lot of assumptions. (Like, really. A lot of assumptions. Projecting beyond existing data is always dangerous territory.) The biggest one is that a straight line on the log scale implies a churn rate that stays constant forever. But, in a situation where you know your average customer lifetime is longer than your oldest customer has been around, you’re not going to find a methodology that doesn’t make a lot of assumptions. Your best bet is to document your assumptions, give everything a gut-check, and go from there.
Let me know if you have any thoughts or suggested improvements!