Akankhya-Mohapatra / Statistical-Learning-with-R

Strategizing to maximize Customer Retention in Telecom Company

Goal

The objective of our simulation project is to compare a random forest model with a logistic regression model on a binary classification problem. The simulation is modelled after a similar study by Kirasich et al. (2018).
We will differentiate our study by adding scenarios not examined in theirs: the impact of missing values, and a comparison of two imputation methods for those values, random forest imputation and mode imputation.
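As a rough illustration of the simpler of the two methods, mode imputation replaces each missing entry in a column with that column's most frequent observed value. The sketch below is written in plain Python for illustration only and is not the project's own code; the `mode_impute` helper and the `contract` column are hypothetical names, and missing values are assumed to be encoded as `None`. Random forest imputation is not covered here.

```python
from collections import Counter


def mode_impute(column, missing=None):
    """Return a copy of `column` with missing entries replaced by the mode.

    Assumption: missing values are marked with the `missing` sentinel
    (None by default). Ties are broken by first appearance in the column.
    """
    observed = [value for value in column if value is not missing]
    if not observed:
        # No observed values, so there is no mode to impute with.
        return list(column)
    mode_value = Counter(observed).most_common(1)[0][0]
    return [mode_value if value is missing else value for value in column]


# Hypothetical categorical column with two missing entries.
contract = ["monthly", None, "yearly", "monthly", None]
imputed = mode_impute(contract)
print(imputed)
```

Mode imputation ignores the other columns entirely, which is one reason to compare it against a model-based method such as random forest imputation.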
To measure model performance and support our research objectives, we will use the misclassification rate (one minus accuracy), together with ROC curves, AUC, and cumulative lift curves to visualize the simulation results. We will also use the ROC curve to compare the sensitivity (true positive rate) of the competing models and see whether either performs better on this specific metric.
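The metrics named above can be sketched as follows. This is a minimal plain-Python illustration under stated assumptions, not the project's implementation: labels are assumed to be coded 0/1 with 1 as the positive class, the function names are hypothetical, AUC is computed through its pairwise-ranking interpretation rather than by integrating a curve, and the toy labels and scores are made up purely to exercise the functions.

```python
def misclassification_rate(y_true, y_pred):
    """Share of observations whose predicted label differs from the truth."""
    wrong = sum(t != p for t, p in zip(y_true, y_pred))
    return wrong / len(y_true)


def sensitivity(y_true, y_pred):
    """True positive rate: TP / (TP + FN)."""
    true_pos = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    false_neg = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return true_pos / (true_pos + false_neg)


def auc(y_true, scores):
    """Probability that a random positive outscores a random negative.

    Tied scores count as half a win. This is O(n_pos * n_neg), which is
    fine for a small illustration.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def cumulative_lift(y_true, scores, fraction):
    """Positive rate in the top-scored `fraction` divided by the base rate."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    k = max(1, round(fraction * len(ranked)))
    top_rate = sum(label for _, label in ranked[:k]) / k
    base_rate = sum(y_true) / len(y_true)
    return top_rate / base_rate


# Toy data (made up): true labels and predicted probabilities of class 1.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.8, 0.4, 0.5, 0.1, 0.7, 0.3]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 cut-off is an assumption

print(misclassification_rate(y_true, y_pred))  # 0.25
print(sensitivity(y_true, y_pred))             # 0.75
print(auc(y_true, scores))                     # 0.9375
print(cumulative_lift(y_true, scores, 0.25))   # 2.0
```

Misclassification rate and sensitivity depend on the chosen cut-off, whereas AUC and cumulative lift depend only on how the scores rank the observations, which is why the two groups of metrics can disagree about which model is better.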

References

Kirasich, Kaitlin; Smith, Trace; and Sadler, Bivin (2018) "Random Forest vs Logistic Regression: Binary Classification for Heterogeneous Datasets," SMU Data Science Review: Vol. 1, No. 3, Article 9. Available at: https://scholar.smu.edu/datasciencereview/vol1/iss3/9

Languages

HTML 95.9%, R 4.1%