If you want to make more complicated predictions involving lots of independent variables, you probably want to use Cox regression and predict.coxph from R's survival package.

These resources should help:

http://daynebatten.com/2015/02/customer-churn-cox-regression/

https://cran.r-project.org/web/packages/survival/survival.pdf

You can choose to ignore the problem, use the model anyway, and just treat people who never got cancer as censored, but do so at your own risk.

There are also models specifically designed for handling data where not everybody experiences the event of interest. Google “cure models” if you’re interested. Incorporating the math behind those models into this project would be a pretty big undertaking, though.

]]>Thanks for commenting. Every observation is marked as uncensored because every engine in this data set did fail eventually. Not every data set will be that way, obviously.

Thanks!

]]>As discussed above, the third and fourth values in the output for each observation are the alpha and beta parameters of a Weibull distribution describing the likelihood of engine failure over future time. You can use these parameters to calculate a probability density function or cumulative distribution function (CDF) of likely failures for each engine. You’d then have to choose some statistic to help you decide which engines to prioritize. Maybe it’s the time until the CDF goes past 50%, for example (in other words, the time at which the engine is predicted to have a >50% chance of having failed). Ultimately, this is up to the discretion of the researcher, and you’ll have to choose thresholds that give you the kind of precision/recall performance you want.
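
Here’s a rough sketch of that idea in plain Python, in case it helps. It assumes alpha is the Weibull scale and beta is the shape, so the CDF is 1 - exp(-(t / alpha)^beta); check that against however your model parameterizes its output. The function names and the (alpha, beta) values are made up for illustration and are not real model output.

```python
import math


def weibull_cdf(t, alpha, beta):
    """Probability of failure by time t, assuming scale alpha and shape beta."""
    if t <= 0:
        return 0.0
    return 1.0 - math.exp(-((t / alpha) ** beta))


def time_to_failure_probability(alpha, beta, p=0.5):
    """Time at which the CDF reaches probability p (the Weibull quantile)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return alpha * (-math.log(1.0 - p)) ** (1.0 / beta)


# Hypothetical (alpha, beta) predictions keyed by engine id.
predictions = {
    "engine_a": (120.0, 2.0),
    "engine_b": (45.0, 1.5),
    "engine_c": (80.0, 3.0),
}

# Rank engines by the time at which they cross a 50% chance of having failed,
# soonest first.
ranked = sorted(
    predictions,
    key=lambda engine: time_to_failure_probability(*predictions[engine]),
)

for engine in ranked:
    alpha, beta = predictions[engine]
    print(engine, round(time_to_failure_probability(alpha, beta), 1))
```

Swapping p=0.5 for some other threshold is the knob you’d tune to trade precision against recall.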

Thanks!

]]>