Last week, we discussed using Kaplan-Meier estimators, survival curves, and the log-rank test to start analyzing customer churn data. We plotted survival curves for a customer base, then split them by gender, and confirmed that the difference between the two gender curves was statistically significant. Of course, these methods can only get us so far… What if you want to use multiple variables to predict churn? Will you create 5, 12, or 80 survival curves and try to spot differences between them? Or what if you want to predict churn based on a continuous variable like age? Surely you wouldn’t want to create a curve for each individual age. That’s preposterous!

Don’t do this. This isn’t useful analysis. This is an Etch A Sketch gone horribly wrong.

Luckily, statisticians (once again, primarily in the medical and engineering fields) are way ahead of us here. A technique called Cox regression (also known as the Cox proportional hazards model) lets us do everything we just mentioned in a statistically sound and user-friendly way. In fact, because it is powerful, rigorous, and easy to interpret, Cox regression has largely become the “gold standard” for statistical survival analysis. Sound cool? Let’s get started.
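To give a rough feel for what Cox regression actually estimates, here is a minimal from-scratch sketch in plain Python, assuming the proportional hazards model and Breslow’s handling of tied churn times. Instead of one curve per group, it fits a single coefficient per variable by maximizing the partial likelihood with Newton-Raphson; exp(coefficient) is the hazard ratio, so a continuous variable like age gets one “per extra year” effect. The function names (`fit_cox`, `partial_likelihood`) and the ten-customer dataset are hypothetical, made up purely for illustration. In practice you would reach for a survival library (for example, lifelines’ `CoxPHFitter`) rather than hand-rolling this.

```python
import math

def _solve(A, b):
    """Solve A x = b for a small dense system (Gaussian elimination, partial pivoting)."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def partial_likelihood(beta, times, events, X):
    """Breslow partial log-likelihood, its gradient (score) and information matrix."""
    n, p = len(times), len(beta)
    loglik, grad = 0.0, [0.0] * p
    info = [[0.0] * p for _ in range(p)]
    for i in range(n):
        if not events[i]:
            continue  # censored customers only appear inside other risk sets
        # Risk set: customers still active just before times[i].
        risk = [j for j in range(n) if times[j] >= times[i]]
        w = [math.exp(sum(b * x for b, x in zip(beta, X[j]))) for j in risk]
        s0 = sum(w)
        xbar = [sum(wj * X[j][k] for wj, j in zip(w, risk)) / s0 for k in range(p)]
        loglik += sum(b * x for b, x in zip(beta, X[i])) - math.log(s0)
        for k in range(p):
            grad[k] += X[i][k] - xbar[k]
            for m in range(p):
                s2 = sum(wj * X[j][k] * X[j][m] for wj, j in zip(w, risk)) / s0
                info[k][m] += s2 - xbar[k] * xbar[m]
    return loglik, grad, info

def fit_cox(times, events, X, max_iter=50, tol=1e-10):
    """Maximize the partial likelihood by Newton-Raphson with step halving."""
    p = len(X[0])
    means = [sum(row[k] for row in X) / len(X) for k in range(p)]
    # Centering covariates improves numerics and leaves the coefficients unchanged.
    Xc = [[row[k] - means[k] for k in range(p)] for row in X]
    beta = [0.0] * p
    loglik, grad, info = partial_likelihood(beta, times, events, Xc)
    for _ in range(max_iter):
        step = _solve(info, grad)  # Newton step: info^-1 * score
        scale = 1.0
        while True:
            cand = [b + scale * s for b, s in zip(beta, step)]
            new = partial_likelihood(cand, times, events, Xc)
            if new[0] >= loglik - 1e-12 or scale < 1e-8:
                break
            scale /= 2  # halve the step if it lowered the likelihood
        beta, (loglik, grad, info) = cand, new
        if max(abs(scale * s) for s in step) < tol:
            break
    return beta

# Hypothetical toy data: tenure in months, churned (1) or still active/censored (0),
# and two covariates per customer: age in years and a 0/1 gender flag.
times = [2, 3, 5, 6, 8, 9, 11, 12, 14, 16]
churned = [1, 1, 0, 1, 1, 0, 1, 0, 1, 0]
X = [[25, 1], [40, 0], [33, 1], [22, 0], [51, 1],
     [29, 1], [35, 0], [60, 0], [45, 1], [38, 0]]

beta = fit_cox(times, churned, X)
for name, b in zip(["age", "male"], beta):
    print(f"{name}: coef={b:+.3f}  hazard ratio={math.exp(b):.3f}")
```

Both variables, one of them continuous, are handled in a single model with no survival curves to eyeball. A hazard ratio above 1 means that variable is associated with churning sooner, holding the other variable fixed.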
