all the customers of one of my clients.
The churn prediction model that I have built so far is giving good results.
The client and I now want to include Customer Satisfaction survey results in the model to improve its predictive power.
The issue with using the survey data is that responses are only available for the customers who responded to the survey. Right now, I have this for 5000 out of the total 23,000 customers.
I definitely can't impute the data in this case because the fill rate is only around 5/23. Those features would be mostly NA when I try to predict churn for the whole customer base.
How can I use the survey results effectively?
Bottom line: how can I use a feature that is only available for 22% of the dataset?
I think what you should be most concerned about is class imbalance, which you can cope with using balanced sampling. Otherwise, you can either collect more data or just train on the balanced sample.
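For what it's worth, here is a minimal sketch of what I mean by balanced sampling: downsample every class to the size of the smallest one before training. The `churned` column name, the `balanced_sample` helper, and the toy customer list are all made up for illustration; your real data and label will look different.

```python
import random
from collections import defaultdict


def balanced_sample(rows, label_key="churned", seed=0):
    """Downsample every class to the size of the smallest class."""
    rng = random.Random(seed)

    # Group the rows by their class label.
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[label_key]].append(row)

    # Draw the same number of rows from each class, without replacement.
    n = min(len(group) for group in by_class.values())
    sample = []
    for group in by_class.values():
        sample.extend(rng.sample(group, n))

    rng.shuffle(sample)
    return sample


# Toy data (hypothetical): 10 churners and 90 non-churners.
customers = [{"id": i, "churned": int(i < 10)} for i in range(100)]

balanced = balanced_sample(customers)
counts = {label: sum(r["churned"] == label for r in balanced) for label in (0, 1)}
print(counts)  # {0: 10, 1: 10}
```

Downsampling throws away majority-class rows, so it is only one option; collecting more data avoids that loss.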