Analyzing Customer Churn – Restricted Mean Survival Time

What if you could spend \$50 per customer to reduce churn in your business by 1 percentage point? Would you do it? Would it make financial sense? Or would you just be burning money?

This is what I think of any time somebody says “burning money.” Either this, or that Career Builder commercial with the monkeys.

Sometimes, taking action to reduce customer churn costs money. In those instances, it can be helpful to know how much revenue churn is costing you… and how much of it you could recapture. Lucky for us, there’s a stat for that! It’s called “Restricted Mean Survival Time,” and it allows us to easily quantify the monetary impact of changes in customer churn. Let’s think about putting it to use!

Restricted Mean Survival Time – The Basic Idea

As its name suggests, Restricted Mean Survival Time (RMST from here on out) is simply the average number of time periods a customer survives before churning… except that the highest values are “restricted” to some maximum. So, we might take an average survival time in days for a group of customers, but we restrict the highest values to 365 before we take the average. That’s the 365-day RMST for that group.

So what does it tell us, exactly? It tells us the average number of days of revenue we’ll get out of a group of customers during their first year. If the RMST comes out to, say, 335, we know that we’ll get 335 days (or about 11 months) of revenue out of the average customer. If our fee is \$5 per month, that’s \$55 of revenue per customer in their first year. Framed differently, we can say that churn is costing us \$5 per customer out of a possible \$60 in first-year revenue.
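To make the arithmetic concrete, here’s a minimal Python sketch of the “cap, then average” idea for customers whose full survival time is known. The function names and the sample survival times are made up for illustration, and a month is approximated as 365 / 12 days.

```python
def restricted_mean(survival_days, cap=365):
    """Average survival time after capping each value at `cap` days."""
    capped = [min(days, cap) for days in survival_days]
    return sum(capped) / len(capped)


def first_year_revenue(rmst_days, monthly_fee=5.0):
    """Convert a 365-day RMST into revenue, treating a month as 365 / 12 days."""
    return rmst_days / (365 / 12) * monthly_fee


# Hypothetical, fully observed survival times in days (no censoring).
survival_days = [400, 365, 200, 730, 90, 365, 500, 30]

rmst = restricted_mean(survival_days, cap=365)
revenue = first_year_revenue(rmst)
lost_to_churn = 12 * 5.0 - revenue

print(f"365-day RMST: {rmst:.1f} days")
print(f"First-year revenue per customer: ${revenue:.2f}")
print(f"Revenue lost to churn per customer: ${lost_to_churn:.2f}")
```

Plugging the 335-day figure from the example above into `first_year_revenue` gives roughly \$55, which matches the rounded number in the text.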

Of course, you could do this with other maximum time periods… a month, two years. Whatever makes sense for your analysis. But do make sure it makes sense. Calculating a 10-year RMST when you only have 1 year of customer data would be fruitless.

Restricted Mean Survival Time With Censored Data

Of course, as many of you have likely already realized, there’s a problem. If you’ve been following along with this series, you know that survival data is usually “right censored.” This simply means some of our customers haven’t been with us for as much time as we’re interested in analyzing. If Jim’s been with us for 38 days and hasn’t churned, what survival time value do we use for Jim when we take the average? There’s no way of knowing.

Well, as it happens, there’s another way to calculate RMST. As we discussed previously, we can estimate the percentage of customers surviving at any given time interval using “Kaplan-Meier estimators,” then plot these results as a survival curve. The Kaplan-Meier estimators take care of the right censoring problems for us. The resulting plot looks like this…

To calculate RMST on right-censored data, we can simply calculate the area under the estimated survival curve during the time period we’re interested in analyzing. For calculus-minded folks, this likely already makes perfect sense. But it’s not too hard to see, even for those of us who aren’t mathematically inclined.
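For anyone who wants to see the mechanics, here’s a rough Python sketch of a Kaplan-Meier curve and the area under it. It’s a bare-bones illustration, not a replacement for a real survival library. The function names and the five-customer data set are hypothetical, and time is treated as continuous, so a customer who churns at time t counts for exactly t days.

```python
def kaplan_meier(times, churned):
    """Return [(t, S(t))] at each distinct churn time.

    times   -- observed time for each customer (churn time or time so far)
    churned -- True if the customer churned, False if right censored
    """
    at_risk = len(times)
    surviving = 1.0
    curve = []
    for t in sorted(set(times)):
        churns = sum(1 for ti, c in zip(times, churned) if ti == t and c)
        leaving = sum(1 for ti in times if ti == t)  # churned or censored at t
        if churns:
            surviving *= 1 - churns / at_risk
            curve.append((t, surviving))
        at_risk -= leaving
    return curve


def rmst(times, churned, tau):
    """Area under the Kaplan-Meier step curve between 0 and tau."""
    area, prev_t, prev_s = 0.0, 0.0, 1.0
    for t, s in kaplan_meier(times, churned):
        if t >= tau:
            break
        area += prev_s * (t - prev_t)
        prev_t, prev_s = t, s
    return area + prev_s * (tau - prev_t)


# Hypothetical data: the last customer is a "Jim", 38 days in and not churned.
times = [5, 10, 10, 20, 38]
churned = [True, False, True, True, False]

print(kaplan_meier(times, churned))
print(rmst(times, churned, tau=25))
```

Censored customers never pull the curve down. They just drop out of the “at risk” count once we stop having information about them, which is how the estimator gets around the Jim problem.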

Let’s think about an imaginary survival curve involving 100 customers. Suppose 100% of customers survive days 1-12. That means we get 1 day of revenue out of the average customer for each of the first 12 days. 1 * 12 = 12.

On day 13, 2 customers churn, so we have 98% of customers surviving on day 13. (How unlucky.) We therefore get .98 days of revenue out of the average customer on day 13. So, we can say our 13-day RMST is 12 + .98 = 12.98. That’s 12.98 days of revenue out of the average customer in the first 13 days. Extend this logic for an entire year, and you get the area under a 365-day survival curve… which is also your 365-day RMST.
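The day-by-day logic above can be sketched in a few lines of Python. This is only an illustration of the counting argument: the function name is made up, and it assumes nobody is censored during the window.

```python
def daily_rmst(n_customers, churn_by_day, horizon):
    """Add up the fraction of customers still subscribed on each day.

    churn_by_day -- {day: number of customers who churn on that day}
    """
    remaining = n_customers
    total = 0.0
    for day in range(1, horizon + 1):
        remaining -= churn_by_day.get(day, 0)
        total += remaining / n_customers  # days of revenue per average customer
    return total


# 100 customers, nobody churns on days 1-12, 2 churn on day 13.
rmst_13 = daily_rmst(100, {13: 2}, horizon=13)
print(round(rmst_13, 2))  # 12.98
```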

Calculating and Using Restricted Mean Survival Time

Of course, as with any statistic, the real benefit lies in the application. So, let’s do some applying! As we’ve been doing for this whole series, we’ll use some dummy data from NetLixx, a fictional online guitar-tab subscription service.

The data we’ll use for this post involves a completely plausible business scenario. NetLixx has a long-standing technical glitch that forces users to re-install the app. Users that experience the glitch are leaving in droves, and the company has started offering \$5 service credits to those that hit the bug in an effort to keep their business. But is it working? Is the service credit program worth the money?

Using some dummy data and RMST, we’ll find out. The data is pretty simple… it has a traditional yes/no churn indicator, a survival time variable (churn date minus glitch date for those that churned, today’s date minus glitch date for those that didn’t), and a yes/no indicator showing whether or not the user was offered a \$5 bill credit. You can download the CSV, or take a look at the preview below.

Just like the other survival analysis tasks we’ve been doing, calculating the RMST for each group is pretty simple using R and the survival package. In this case, we’ll do a 180-day RMST to see if we make our money back in the first six months. Here’s the code to make it happen.

And here are the results.

As you can see, the RMST for the group that got the free credits is 163 days, 40 days greater than the group that didn’t. If we divide 40 by 30 to put it in terms of months, then multiply by \$5 of revenue, we know that we can expect to get \$6.67 in extra revenue out of a customer that got a credit vs. one that didn’t in the six months after their glitch. This implies that a \$5 credit is likely well worth the cost.
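Here’s that cost-benefit arithmetic as a small Python sketch. The variable names are made up, and it assumes the only cost of the program is the \$5 credit itself and that a month is 30 days, as in the calculation above.

```python
rmst_gain_days = 40    # difference in 180-day RMST between the two groups
days_per_month = 30    # same approximation used in the text
monthly_fee = 5.0      # revenue per customer per month
credit_cost = 5.0      # assumed total cost of the program per customer

extra_revenue = rmst_gain_days / days_per_month * monthly_fee
net_gain = extra_revenue - credit_cost

# Extra days of retention needed for the credit to pay for itself.
break_even_days = credit_cost / monthly_fee * days_per_month

print(f"Extra revenue per customer: ${extra_revenue:.2f}")
print(f"Net gain per customer: ${net_gain:.2f}")
print(f"Break-even RMST gain: {break_even_days:.0f} days")
```

Under these assumptions the credit pays for itself once it buys 30 extra days of retention, so a 40-day gain clears the bar with a little room to spare.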

Some Perspective

Of course, RMST can only give us the revenue differences for a certain period of time. In the case of our example above, we only know about changes in revenue over a 6-month period – but the differences in revenue would probably be even greater in, say, months 7 to 12. We’ve got useful information, but we can’t use RMST to project out past the end of our data. Ultimately, as with anything, RMST is not a perfect tool… but it’s still good to have in the toolbox.

If you’re in a situation where you do need to project revenue past the end of your data, you can always fall back on a back-of-the-envelope figure. The average revenue you’ll get out of a group of customers for their entire life is their revenue per time period divided by their churn rate for that same time period. If you get \$5 monthly revenue, and the monthly churn rate is 10%, average lifetime revenue per customer is \$5 / .1 = \$50. Of course, this assumes that the churn rate is stable over time, and it’s entirely possible it won’t be. The back-of-the-envelope method does better at projecting beyond the data, but RMST is better at dealing with churn rates that change over time. Again, tools in a toolbox. Use what makes sense.
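If the divide-by-churn shortcut feels like magic, here’s a quick Python sanity check. It’s a sketch under two assumptions that aren’t spelled out above: every customer pays for their first month, and a constant share of the remaining customers leaves at the end of each month. The function names are hypothetical.

```python
def lifetime_revenue(revenue_per_period, churn_rate):
    """Back-of-the-envelope lifetime revenue under a constant churn rate."""
    return revenue_per_period / churn_rate


def summed_lifetime_revenue(revenue_per_period, churn_rate, periods=1000):
    """Add up expected revenue one period at a time under the same assumptions."""
    surviving = 1.0  # everyone pays for the first period
    total = 0.0
    for _ in range(periods):
        total += surviving * revenue_per_period
        surviving *= 1 - churn_rate  # a constant share churns each period
    return total


shortcut = lifetime_revenue(5.0, 0.10)
long_way = summed_lifetime_revenue(5.0, 0.10)
print(round(shortcut, 2), round(long_way, 2))
```

Both routes land on about \$50. If churn isn’t constant, the period-by-period sum is the version you’d adapt, which is essentially what the area-under-the-curve approach does within the range of your data.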

Conclusion

In the end, RMST can be a great tool for understanding customer churn. It lets you quantify the financial impact of churn, and make smart business decisions about whether you should spend money to reduce your churn rate. Of course, as with anything, it’s not perfect.

Stick around ‘til next week for the last post in the series. We’ll be looking at a methodology that allows us to combine some of the benefits of Cox regression (multiple independent variables) with some of the benefits of RMST (revenue implications, and no proportional hazards assumption). It will be yet another extremely useful tool.