hariprasath-v / Machinehack-analytics-olympiad-2022

Create a machine learning model to help an insurance company understand which claims should be rejected and which should be accepted for reimbursement.

Machinehack-analytics-olympiad-2022

Competition hosted on MachineHack.

About

Create a machine learning model to help an insurance company understand which claims should be rejected and which should be accepted for reimbursement.

The final competition score is 0.68081.

The leaderboard rank is 24.

The evaluation metric is log loss.
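For reference, here is a minimal sketch of the metric, assuming the binary form of log loss (the README mentions only class 0 and class 1). The function name `log_loss` and the clipping constant `eps` are illustrative choices, not taken from the competition or the notebooks. A model that always predicts 0.5 scores ln 2, about 0.693, which gives a rough baseline for reading the score above.

```python
import math


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary cross-entropy; lower is better.

    y_true: iterable of 0/1 labels.
    y_prob: iterable of predicted probabilities for class 1.
    eps:    clipping constant (an assumption) that keeps log() finite.
    """
    if len(y_true) != len(y_prob):
        raise ValueError("y_true and y_prob must have the same length")
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip away exact 0 and 1
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)


# A constant 0.5 prediction scores ln(2) regardless of the labels.
baseline = log_loss([0, 1, 1, 0], [0.5, 0.5, 0.5, 0.5])
print(round(baseline, 5))  # 0.69315
```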

File information

  • machinehack-analytics-olympiad-2022-eda.ipynb

    Basic exploratory data analysis.

    Packages used:

     * seaborn
     * pandas
     * NumPy
     * Matplotlib
    
  • machinehack-analytics-olympiad-2022-model.ipynb

    Data pre-processing and model.

    Packages used:

      * scikit-learn
      * pandas
      * NumPy
      * Matplotlib
      * catboost
      * optuna
      * shap
    

    Created a CatBoost classifier and tuned its hyperparameters with the Optuna framework. The model was evaluated with log loss.
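The tuning loop described above might look roughly like the sketch below. This is not the notebook's code: the search space in `suggest_params`, the helper names `make_objective` and `tune`, the early-stopping setting, and the trial count are all hypothetical. It assumes a train/validation split is already available as array-like objects.

```python
def suggest_params(trial):
    """Hypothetical search space; the ranges are illustrative only."""
    return {
        "iterations": trial.suggest_int("iterations", 200, 1000),
        "learning_rate": trial.suggest_float("learning_rate", 1e-3, 0.3, log=True),
        "depth": trial.suggest_int("depth", 4, 10),
        "l2_leaf_reg": trial.suggest_float("l2_leaf_reg", 1.0, 10.0),
    }


def make_objective(X_train, y_train, X_valid, y_valid):
    """Build an Optuna objective that returns validation log loss."""
    # Imported lazily so the search space above can be inspected
    # without the heavier dependencies installed.
    from catboost import CatBoostClassifier
    from sklearn.metrics import log_loss

    def objective(trial):
        model = CatBoostClassifier(
            loss_function="Logloss",
            verbose=0,
            **suggest_params(trial),
        )
        model.fit(
            X_train,
            y_train,
            eval_set=(X_valid, y_valid),
            early_stopping_rounds=50,  # assumed value
        )
        proba = model.predict_proba(X_valid)[:, 1]  # probability of class 1
        return log_loss(y_valid, proba)

    return objective


def tune(X_train, y_train, X_valid, y_valid, n_trials=50):
    """Run the study; log loss is minimized, so lower is better."""
    import optuna

    study = optuna.create_study(direction="minimize")
    study.optimize(
        make_objective(X_train, y_train, X_valid, y_valid),
        n_trials=n_trials,
    )
    return study
```

After `study = tune(...)`, the best configuration is available as `study.best_params` and its score as `study.best_value`. A plot of the best score per trial can be produced with `optuna.visualization.plot_optimization_history(study)`.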

CatBoost model optimization history: shows the best score reached at each trial.

Catboost – SHAP feature importance

Catboost – SHAP top feature impact

Top feature influences for class 1

Top feature influences for class 0

Overall train and validation log loss

License: Apache License 2.0


Languages

HTML 54.1%, Jupyter Notebook 45.9%