My previous series of guides on survival analysis and customer churn has become by far the most popular content on this blog, so I'm coming back around to introduce some more advanced techniques...
When you're using cox regression to model customer churn, you're often interested in the effects of variables that change throughout a customer's lifetime. For instance, you might be interested in knowing how many times that customer has contacted support, how many times they've logged in during the last 30 days, or what web browser(s) they use. If you have, say, 3 years of historical customer data and you set up a cox regression on that data using covariate values that are applicable to customers right now, you'll essentially be regressing customer's churn hazards from months or years ago on their current characteristics. Your model will be allowing the future to predict the past. Not terribly defensible.
In the classic double-slit experiment, past events are seemingly affected by current conditions. But unless you're a quantum physicist or Marty McFly, you're probably not going to see causality working this way.
In this post, we'll walk through how to set up a cox regression using "time-dependent covariates," which will allow us to model historical hazard rates on variables whose values were applicable at the time.
I recently downloaded the entire history of the Billboard Hot 100 chart, which has been tracking the most popular 100 US music tracks since 1958. So, I decided to see what I could learn!
Primary colors. Except not quite.
It only took me a couple of minutes to generate some interesting factoids. For instance, did you know…
That Rockin’ Around the Christmas Tree and Nat King Cole’s version of The Christmas Song both made it onto the charts for the first time in 1960… and both reappeared as recently as 2014?
Or that Imagine Dragon’s Radioactive spent more consecutive weeks on the chart than any song ever… at 85 weeks?
That the Beatles managed to have 14 songs on the Billboard Hot 100 at the same time in April of 1964? And that 5 of those songs managed to make the top 10 list simultaneously? Absolutely dominant.
How about that very few artists have managed to produce a #1 single, with no other charted songs… though the feat has been managed by Soulja Boy (with Crank That), Baauer (with Harlem Shake), and Daniel Powter (with Bad Day)? That’s the pinnacle of “one hit wonder” status right there.