I just posted a simple implementation of WTTE-RNNs in Keras on GitHub: Keras Weibull Time-to-event Recurrent Neural Networks. I'll let you read up on the details in the linked information, but suffice it to say that this is a specific type of neural net that handles time-to-event prediction in a super intuitive way. If you're thinking of building a model to predict (rather than understand) churn, I'd definitely consider giving this a shot. And with Keras, implementing the model is pretty darn easy.
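For a rough sense of what the network is trained on, here is a minimal sketch in plain Python (not the code from the GitHub repo). It assumes the continuous-time Weibull parameterization, where the network outputs a scale `alpha` and a shape `beta` at each step, and it scores one record with a censoring-aware log-likelihood. The function name and arguments are hypothetical, chosen for illustration.

```python
import math


def weibull_log_likelihood(t, observed, alpha, beta):
    """Log-likelihood of one record under a Weibull(scale=alpha, shape=beta).

    observed=1: the event happened at time t, so use the log density.
    observed=0: still event-free at time t, so use the log survival function.
    Requires t > 0, alpha > 0 and beta > 0.
    """
    scaled = t / alpha
    log_survival = -(scaled ** beta)  # log S(t) = -(t / alpha)^beta
    if not observed:
        return log_survival
    # log f(t) = log h(t) + log S(t), with h(t) = (beta / alpha) * (t / alpha)^(beta - 1)
    log_hazard = math.log(beta / alpha) + (beta - 1.0) * math.log(scaled)
    return log_hazard + log_survival
```

Training would maximize the sum of these values over all records (or minimize its negative), which is how censored observations still contribute information.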
As proof of the model's effectiveness, here's my demo model (with absolutely no optimization) predicting the remaining useful life of jet engines. It's not perfect by any means, but it's definitely giving a pass to engines that are in the clear, and flagging ones that are more likely to fail (plus a few false positives). I have no idea how much better it could do with some tweaking:
Also, if anybody's curious as to why I've been in a bit of a post desert lately, my wife and I recently had a baby and I haven't been giving as much thought to the blog. However, I have some ideas brewing!
Every survival analysis method I've talked about so far in this series has had one thing in common: we've only looked at one event in a customer lifetime (churn). In many cases, that's a perfectly fine way to go about things... we want our customers to stick with us, so churn is the event of interest. So why would we ever need to think about competing risks?
You know, competing risks. Will you die by tornado, or by shark?
There's actually a critical assumption undergirding most survival analysis methods for right-censored data - that censored individuals have the same likelihood of experiencing the event of interest as individuals that never got censored. If this assumption ever gets violated, things like Kaplan-Meier estimators can become wildly inaccurate. (If you need a refresher on Kaplan-Meier curves and other concepts, take a look at my earlier post on basic survival analysis.)
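To see where that assumption enters, here is a minimal Kaplan-Meier sketch in plain Python. The function name and input layout are assumptions made for illustration. Note that censored customers simply leave the risk set without changing the curve, which is only fair if they would have gone on to churn at the same rate as everyone else.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival)] at each distinct event time.

    durations: observed time for each customer.
    events: 1 if the event (churn) was observed at that time, 0 if censored.
    """
    data = sorted(zip(durations, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        event_count = 0
        leaving = 0
        # Gather everyone whose observed time equals t.
        while i < len(data) and data[i][0] == t:
            event_count += data[i][1]
            leaving += 1
            i += 1
        if event_count > 0:
            survival *= 1.0 - event_count / n_at_risk
            curve.append((t, survival))
        # Events and censored customers both drop out of the risk set.
        n_at_risk -= leaving
    return curve
```

For example, `kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])` steps down at times 1, 3 and 4; the customer censored at time 2 only shrinks the denominator for the later steps.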
If you've ever done churn analysis using Cox regression with time-dependent covariates, you know that the hardest part of that type of research is building your base data set. You have to divide each customer's lifetime into "chunks" where the changing values of a host of different predictor variables apply. I've coded this in SQL before, and it gets ugly. Fast.
Merge, everybody, merge!
Thankfully, R has a solution for this that makes life significantly simpler. It's the "tmerge" function, and it's in the survival package. Today, we'll walk through how to use it.
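To make the "chunks" idea concrete, here is a hand-rolled Python sketch of the kind of table you end up with. This is not tmerge itself, just an illustration of the target layout for a single customer and a single covariate; the function name, argument names, and row layout are all assumptions.

```python
def split_lifetime(end_time, churned, changes, initial_value=0):
    """Split one customer's lifetime into (tstart, tstop, value, event) rows.

    end_time: when the customer churned or was last observed.
    churned: True if the lifetime ended in churn, False if censored.
    changes: list of (time, new_value) for one time-varying covariate.
    """
    rows = []
    tstart, value = 0, initial_value
    for time, new_value in sorted(changes):
        if not 0 < time < end_time:
            continue  # ignore changes outside the observed lifetime
        if time == tstart:
            value = new_value  # same-time change: no zero-length chunk
            continue
        rows.append((tstart, time, value, 0))  # no churn inside this chunk
        tstart, value = time, new_value
    # Only the final chunk can carry the churn event.
    rows.append((tstart, end_time, value, int(churned)))
    return rows
```

A customer who churned at month 10, with the covariate changing at months 3 and 7, yields three rows: `(0, 3, 0, 0)`, `(3, 7, 1, 0)` and `(7, 10, 2, 1)` for `changes=[(3, 1), (7, 2)]`.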
There are numerous strategies for dealing with non-proportional hazards in Cox regression analysis. You can stratify your data based on a problematic variable. You can chuck the Cox model and create "pseudo-observations" to analyze the gains (or losses) in lifetime within a certain period associated with changes in a variable. If age is a problem (unlikely for customer churn, but it happens a lot in medical contexts), you can use age rather than time in the cohort as your time scale. The list goes on.
But this is statistics! We're supposed to be modeling things!
Statistics. Almost as cool as Sparta.
Well, as it turns out, it's possible to directly model how the effects of a variable change with time, so that you can not only handle the proportional hazards problem, but also get a reliable estimate of how the hazard ratio for a given variable changes with time. The way to do this is incredibly simple... we introduce an interaction term between the variable of interest and time. Let's get started!
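As a small sketch of what the interaction term buys you, assume a model of the form h(t | x) = h0(t) * exp(beta * x + beta_time * x * t), with time entering linearly (other transforms of time, such as log(t), are also common). The hazard ratio for a one-unit increase in x is then no longer a single number but a function of t. The function and parameter names below are hypothetical.

```python
import math


def hazard_ratio_at(t, beta, beta_time):
    """Hazard ratio for a one-unit increase in x at time t.

    Assumes the linear predictor beta * x + beta_time * x * t, so the
    log hazard ratio is beta + beta_time * t.
    """
    return math.exp(beta + beta_time * t)
```

With `beta_time` equal to zero this collapses to the usual constant hazard ratio `exp(beta)`, which is the proportional hazards case.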
My previous series of guides on survival analysis and customer churn has become by far the most popular content on this blog, so I'm coming back around to introduce some more advanced techniques...
When you're using Cox regression to model customer churn, you're often interested in the effects of variables that change throughout a customer's lifetime. For instance, you might be interested in knowing how many times that customer has contacted support, how many times they've logged in during the last 30 days, or what web browser(s) they use. If you have, say, 3 years of historical customer data and you set up a Cox regression on that data using covariate values that are applicable to customers right now, you'll essentially be regressing customers' churn hazards from months or years ago on their current characteristics. Your model will be allowing the future to predict the past. Not terribly defensible.
In the classic double-slit experiment, past events are seemingly affected by current conditions. But unless you're a quantum physicist or Marty McFly, you're probably not going to see causality working this way.
In this post, we'll walk through how to set up a Cox regression using "time-dependent covariates," which will allow us to model historical hazard rates on variables whose values were applicable at the time.
In a past post on analyzing churn in the subscription or Software as a Service business, I talked about two different ways to quantify the dollar cost of churn. You could use 1 / churn rate as an estimation of mean customer lifetime (though this simple method makes a lot of assumptions). Or, you could use “pseudo-observations” to calculate the dollar value of certain groups of customers during a particular time period (which doesn’t let you quantify the full lifetime value of a customer).
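The first of those methods fits in a few lines. This sketch assumes a constant churn rate per period (one of the big assumptions in question) and a flat revenue per customer per period; the function names are made up for illustration.

```python
def mean_lifetime(churn_rate):
    """Expected customer lifetime in periods, assuming constant churn."""
    if not 0 < churn_rate <= 1:
        raise ValueError("churn_rate must be in (0, 1]")
    return 1.0 / churn_rate


def simple_lifetime_value(revenue_per_period, churn_rate):
    """Revenue per period times the expected number of periods."""
    return revenue_per_period * mean_lifetime(churn_rate)
```

At 5% monthly churn, this gives a mean lifetime of about 20 months, so a customer paying 30 per month would be worth roughly 600 under these assumptions.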
But what if there was another way? What if we took our Kaplan-Meier best estimate of our churn curve, fit a linear model to that model, and then projected it out?
A model within a model, if you will. Churnception.
Well, as it turns out, we’d get a reasonable estimation of our lifetime churn curve, which would let us estimate average customer lifetime, and customer lifetime value. Let’s get started.
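Here is one simple way the idea could be sketched in plain Python; it is not necessarily the exact procedure from the full post. It fits an ordinary least-squares line to points on the survival curve, projects the line until survival hits zero, and takes the area under the projected curve (capped at 1) as the mean lifetime. All names are hypothetical.

```python
def fit_line(times, survival):
    """Ordinary least-squares fit; returns (intercept, slope)."""
    n = len(times)
    mean_t = sum(times) / n
    mean_s = sum(survival) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (s - mean_s) for t, s in zip(times, survival))
    slope = sxy / sxx
    return mean_s - slope * mean_t, slope


def projected_mean_lifetime(times, survival):
    """Area under the fitted line from time 0 until it reaches zero."""
    intercept, slope = fit_line(times, survival)
    if slope >= 0 or intercept <= 0:
        raise ValueError("need a declining curve that starts above zero")
    t_zero = -intercept / slope  # where projected survival hits 0
    # If the line starts above 1, treat survival as 1 until it drops below.
    t_one = max(0.0, (1.0 - intercept) / slope)
    return t_one + 0.5 * (t_zero - t_one) * min(1.0, intercept)
```

The area under a survival curve is the mean lifetime, so multiplying the result by revenue per period gives a rough customer lifetime value. A straight line is a strong assumption; if the curve visibly flattens, the projection will understate lifetime.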
Sometimes, taking action to reduce customer churn costs money. In those instances, it can be helpful to know how much revenue churn is costing you… and how much of it you could recapture. Lucky for us, there’s a stat for that! It’s called “Restricted Mean Survival Time,” and it allows us to easily quantify the monetary impact of changes in customer churn. Let’s think about putting it to use!
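As a minimal sketch, restricted mean survival time is the area under the survival curve from time 0 up to a chosen cutoff `tau`. The function below assumes the curve is given as Kaplan-Meier style steps, a list of `(time, survival)` pairs sorted by time, with survival equal to 1 before the first step. The name and input layout are assumptions for illustration.

```python
def restricted_mean_survival_time(curve, tau):
    """Area under a step survival curve between 0 and tau.

    curve: [(time, survival)] sorted by time; survival is 1.0 before
    the first step and stays at each value until the next step.
    """
    area = 0.0
    prev_t, prev_s = 0.0, 1.0
    for t, s in curve:
        if t >= tau:
            break
        area += prev_s * (t - prev_t)  # rectangle up to this step
        prev_t, prev_s = t, s
    area += prev_s * (tau - prev_t)  # final rectangle out to tau
    return area
```

To put a dollar figure on it, compare the value for two groups over the same `tau` and multiply the difference by revenue per customer per period; that is the revenue gained or lost per customer within the window.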
Last week, we discussed using Kaplan-Meier estimators, survival curves, and the log-rank test to start analyzing customer churn data. We plotted survival curves for a customer base, then bifurcated them by gender, and confirmed that the difference between the gender curves was statistically significant. Of course, these methods can only get us so far… What if you want to use multiple variables to predict churn? Will you create 5, 12, 80 survival curves and try to spot differences between them? Or, what if you want to predict churn based on a continuous variable like age? Surely you wouldn’t want to create a curve for each individual age. That’s preposterous!
Don’t do this. This isn’t useful analysis. This is an etch-a-sketch gone horribly wrong.
Luckily, statisticians (once again, primarily in the medical and engineering fields) are way ahead of us here. A technique called Cox regression lets us do everything we just mentioned in a statistically accurate and user-friendly fashion. In fact, because the technique is powerful, rigorous, and easy to interpret, Cox regression has largely become the “gold standard” for statistical survival analysis. Sound cool? Let’s get started.
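For a peek under the hood, here is a minimal sketch of the quantity a Cox model maximizes: the partial log-likelihood, shown for a single covariate. In practice you would let a library do the fitting (for example `coxph` in R's survival package); this plain Python version, with hypothetical names, is only meant to show that each observed churn is compared against everyone still at risk at that moment.

```python
import math


def cox_partial_log_likelihood(beta, durations, events, x):
    """Partial log-likelihood for one covariate (Breslow form for ties).

    durations: observed time per customer.
    events: 1 if churn was observed, 0 if censored.
    x: covariate value per customer.
    """
    total = 0.0
    for i, (t_i, event) in enumerate(zip(durations, events)):
        if not event:
            continue  # censored customers only appear in risk sets
        # Risk set: everyone whose observed time is at least t_i.
        risk = sum(
            math.exp(beta * x_j)
            for t_j, x_j in zip(durations, x)
            if t_j >= t_i
        )
        total += beta * x[i] - math.log(risk)
    return total
```

The `beta` that maximizes this value is the fitted coefficient, and `exp(beta)` is the hazard ratio for a one-unit increase in the covariate.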
If your company operates on any type of Software as a Service or subscription model, you understand the importance of customer churn to your bottom line. When a customer leaves, you lose not only a recurring source of revenue, but also the marketing dollars you paid out to bring them in. As such, small changes in customer churn can easily bankrupt a profitable business, or turn a slow-mover into a powerhouse.
If you’re ready to get a handle on customer churn in your business, you’re ready to start doing some survival analysis. These statistical methods, which have been applied for decades in medicine and engineering, come in handy any time you’re interested in understanding how long something (customers, patients, car parts) survives and what actions can help it survive longer.
And the best part? The methods involved are mathematically simple, easy to understand and interpret, and widely available in free tools. You don’t need a PhD in stats to do this!