Employee_promotion_Prediction_using_CatBoost
1. Business Problem
1.1. Description
Your client is a large MNC with 9 broad verticals across the organisation. One of the problems the client faces is identifying the right people for promotion (only for manager positions and below) and preparing them in time.
The current process works as follows:
- A set of employees is first identified based on recommendations and past performance.
- The selected employees go through a separate training and evaluation program for each vertical. Each program is based on the skills that vertical requires.
- At the end of the program, employees are promoted based on various factors such as training performance and KPI completion (only employees who have completed more than 60% of their KPIs are considered).

Because the final promotions are announced only after the evaluation, employees are delayed in transitioning to their new roles. The company therefore needs help identifying eligible candidates at a particular checkpoint so that it can expedite the entire promotion cycle.
The client has provided multiple attributes covering employees' past and current performance along with demographics. The task is to predict whether a potential promotee at the checkpoint in the test set will be promoted after the evaluation process.
1.2. Source/Useful Links
https://datahack.analyticsvidhya.com/contest/wns-analytics-hackathon-2018/
2. Machine Learning Problem Formulation
2.1. Data
2.1.1. Data Overview
The data comes from the WNS Analytics Wizard 2018 data hackathon on analyticsvidhya.com; this project uses the same dataset.
Training data: 54808 records and 14 columns
Test data: 23490 records and 13 columns (the promotion label, which must be predicted, is absent)
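A quick way to confirm the shapes above before modelling is sketched below, using only Python's standard `csv` module. The file names `train.csv` and `test.csv` and the helper names `csv_shape` and `missing_columns` are hypothetical; `missing_columns` lists the training columns absent from the test file, which should be just the target label.

```python
import csv


def csv_shape(path):
    """Return (number of data rows, number of columns) for a CSV file with a header row."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        # Skip blank lines, which csv.reader yields as empty lists.
        n_rows = sum(1 for row in reader if row)
    return n_rows, len(header)


def missing_columns(train_path, test_path):
    """Return the training columns (in order) that do not appear in the test file."""
    with open(train_path, newline="") as f:
        train_cols = next(csv.reader(f))
    with open(test_path, newline="") as f:
        test_cols = set(next(csv.reader(f)))
    return [c for c in train_cols if c not in test_cols]
```

If the files match the description above, `csv_shape("train.csv")` should return `(54808, 14)`, `csv_shape("test.csv")` should return `(23490, 13)`, and `missing_columns` should return a single column name.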
2.2. Mapping the real-world problem to a Machine Learning problem
2.2.1. Type of Machine Learning Problem
Binary classification:
- Based on employees' past and current performance along with demographics, predict whether a potential promotee at the checkpoint in the test set will be promoted after the evaluation process.
2.2.2. Performance Metric
Metric: F1 score
The dataset is imbalanced (roughly 9:1 not promoted to promoted), so the F1 score is a better choice than accuracy.
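To make the imbalance argument concrete, the sketch below computes accuracy and F1 (the harmonic mean of precision P and recall R, F1 = 2PR / (P + R)) on a made-up sample of 100 employees with a 9:1 class ratio. A model that never predicts a promotion already reaches 90% accuracy but scores an F1 of 0, while a model that finds half the promotees with as many false alarms has the same accuracy but an F1 of 0.5. The labels and predictions are hypothetical, for illustration only.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1(y_true, y_pred, positive=1):
    """F1 score for the positive class; defined as 0.0 when there are no true positives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# 90 employees not promoted (0), 10 promoted (1).
y_true = [0] * 90 + [1] * 10

# Model A: always predicts "not promoted".
always_negative = [0] * 100

# Model B: 5 false positives, 5 true positives, 5 false negatives.
half_found = [0] * 85 + [1] * 5 + [1] * 5 + [0] * 5

print(accuracy(y_true, always_negative))  # 0.9
print(f1(y_true, always_negative))        # 0.0
print(accuracy(y_true, half_found))       # 0.9
print(f1(y_true, half_found))             # 0.5
```

Both models have identical accuracy, so accuracy cannot separate a useless model from a partly useful one on this data, whereas F1 can.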
2.3. Train, CV and Test Datasets
The training dataset is split into two parts, train and cross-validation, with 70% and 30% of the data respectively.
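The README does not say whether the 70/30 split is stratified. The sketch below assumes it should be, so that both parts keep the roughly 9:1 class ratio; `stratified_split` is a hypothetical pure-Python helper and the fixed seed is an arbitrary choice. In practice, scikit-learn's `train_test_split(X, y, test_size=0.3, stratify=y)` does the same job.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_size=0.3, seed=42):
    """Split row indices into (train_idx, cv_idx), keeping each class's share in both parts."""
    # Group row indices by class label.
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    rng = random.Random(seed)  # local RNG so the split is reproducible
    train_idx, cv_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_cv = round(len(idx) * test_size)  # this class's share of the CV set
        cv_idx.extend(idx[:n_cv])
        train_idx.extend(idx[n_cv:])
    return sorted(train_idx), sorted(cv_idx)


labels = [0] * 90 + [1] * 10
train_idx, cv_idx = stratified_split(labels)
print(len(train_idx), len(cv_idx))         # 70 30
print(sum(labels[i] for i in cv_idx))      # 3 promoted employees in the CV part
```

Stratifying matters here because with only about 1 positive in 10, an unlucky random split could leave the cross-validation part with noticeably fewer promotees, making its F1 estimate noisier.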
3. Code
The main code file is:
wns_predicitng_potential_employee.ipynb
4. Final submission file
The final submission file is generated by wns_predicitng_potential_employee.ipynb and achieved a final score of 0.5066991474.
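A minimal sketch of writing a submission file with the standard `csv` module follows. The column names `employee_id` and `is_promoted` are assumptions about the competition's expected format, not taken from this README, and `write_submission` is a hypothetical helper that expects predictions already converted to 0/1 labels.

```python
import csv


def write_submission(path, employee_ids, predictions):
    """Write one row per employee with a 0/1 promotion prediction.

    Column names are assumed, not confirmed by the README.
    """
    if len(employee_ids) != len(predictions):
        raise ValueError("employee_ids and predictions must have the same length")
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["employee_id", "is_promoted"])
        for emp_id, pred in zip(employee_ids, predictions):
            # Reject probabilities or other values so the file holds only class labels.
            if pred not in (0, 1):
                raise ValueError(f"prediction for {emp_id} is not 0 or 1: {pred!r}")
            writer.writerow([emp_id, int(pred)])
```

For example, `write_submission("submission.csv", [101, 102], [0, 1])` writes a header row followed by `101,0` and `102,1`.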