Analyzing Customer Churn – Using “tmerge” to Build Data Sets

If you've ever done churn analysis using cox regression with time-dependent covariates, you know that the hardest part of doing that type of research is building your base data set. You have to divide each customer's lifetime into "chunks" where the changing values of a host of different predictor variables apply. I've coded this in SQL before, and it gets ugly. Fast.

Merge, everybody, merge!

Thankfully, R has a solution for this that makes life significantly more simple. It's the "tmerge" function, and it's in the survival package. Today, we'll walk through how to use it.

Continue reading

Analyzing Customer Churn – Time-Dependent Coefficients

There are numerous strategies for dealing with non-proportional hazards in cox regression analysis. You can stratify your data based on a problematic variable. You can chuck the cox model and create "pseudo-observations" to analyze the gains (or losses) in lifetime within a certain period associated with changes in a variable. If age is a problem (unlikely for customer churn, but it happens a lot in medical contexts), you can use age rather than time in the cohort as your time scale. The list goes on.

But this is statistics! We're supposed to be modeling things!

this is sparta

Statistics. Almost as cool as Sparta.

Well, as it turns out, it's actually possible to directly model how the effects of a variable change with time, so that you can not only handle the proportional hazards problem, but also get a reliable estimate of how hazard ratios for a given variable change with time. The way to do this is actually incredibly simple... we introduce an interaction term between the variable of interest and time. Let's get started!

Continue reading