Here, the HR Analytics dataset by Giri Pujar is used to build a classifier that predicts whether an employee will stay with or leave the company.
The dataset of the company's employees is imbalanced, so SMOTE is used to deal with the imbalance. SMOTE (Synthetic Minority Oversampling Technique) is one of the most commonly used oversampling methods for addressing class imbalance.
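The idea behind SMOTE can be illustrated with a minimal sketch. This is not the notebook's code and not a library implementation: the function `smote_sketch`, its parameters, and the toy data are all hypothetical, and only the Python standard library is assumed. Each synthetic row is placed at a random point on the segment between a minority sample and one of its nearest minority-class neighbours.

```python
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Create n_new synthetic rows by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority rows by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data: 2 features, 8 "stayed" rows vs 3 "left" rows.
majority = [[float(i), float(i % 3)] for i in range(8)]
minority = [[10.0, 10.0], [11.0, 10.5], [10.5, 11.5]]

# Oversample the minority class until both classes have the same size.
new_rows = smote_sketch(minority, n_new=len(majority) - len(minority))
balanced_minority = minority + new_rows
```

In practice, oversampling is usually applied to the training split only, so that synthetic rows do not leak into the evaluation data.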
Recursive feature elimination and feature elimination techniques are also used for feature engineering.
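As a rough illustration of recursive feature elimination, the sketch below uses scikit-learn's `RFE` class. The synthetic data from `make_classification`, the logistic regression estimator, and the choice of three features to keep are assumptions made for the example, not details taken from the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the HR data: 8 features, 3 of them informative.
X, y = make_classification(
    n_samples=200,
    n_features=8,
    n_informative=3,
    n_redundant=0,
    random_state=0,
)

# RFE repeatedly fits the estimator and drops the weakest feature
# until only n_features_to_select remain.
selector = RFE(LogisticRegression(max_iter=1000), n_features_to_select=3)
selector.fit(X, y)

# Column indices of the features that survived elimination.
kept = [i for i, keep in enumerate(selector.support_) if keep]
```

`selector.ranking_` assigns rank 1 to every kept feature; higher ranks mark features that were eliminated earlier.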
The notebook is available on Kaggle, so you can work in the same environment in which it was created, i.e. with the same package versions.
Count plot to visualize how imbalanced the data is
Correlation matrix
Learning curve
Confusion matrix
AUC - ROC curve
Precision-Recall vs Threshold Chart
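The numbers behind the confusion matrix and the precision-recall vs threshold chart can be sketched in plain Python. The helper names `confusion_counts` and `precision_recall`, the labels, and the scores below are hypothetical and chosen only for illustration; label 1 is assumed to mean "left" and a score at or above the threshold is treated as a positive prediction.

```python
def confusion_counts(y_true, scores, threshold):
    """Return (tn, fp, fn, tp), predicting 1 when score >= threshold."""
    tn = fp = fn = tp = 0
    for truth, score in zip(y_true, scores):
        pred = 1 if score >= threshold else 0
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 1 and pred == 0:
            fn += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def precision_recall(y_true, scores, threshold):
    """Precision and recall of the positive class at one threshold."""
    tn, fp, fn, tp = confusion_counts(y_true, scores, threshold)
    # Convention assumed here: precision is 1.0 when nothing is predicted positive.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical labels (1 = left) and predicted probabilities.
y_true = [0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.1, 0.2, 0.3, 0.4, 0.45, 0.6, 0.7, 0.9]

# One (threshold, precision, recall) point per threshold; plotting these
# against the threshold gives the precision-recall vs threshold chart.
curve = [(t, *precision_recall(y_true, scores, t)) for t in (0.3, 0.5, 0.8)]
```

On this toy data, raising the threshold increases precision and lowers recall, which is the trade-off the chart is meant to show.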