davidlkl / Insurance-Pricing-Game

1st place solution

AICrowd Insurance Pricing Game - 1st Place Solution

https://www.aicrowd.com/challenges/insurance-pricing-game

Folder structure:

Preprocessing.py

Feature engineering

  1. Binning
    Split continuous variables into segments (this also clips extreme values implicitly)
    Used in the GLM to help capture non-linear relationships
  2. Interactions
    a) Population density
    b) Driver gender combinations
    c) Vehicle feature interactions
      vh_value * vh_weight
      present_vh_value (exponential decay by vh_age)
      and more...
  3. Grouping
    Grouped Med1 and Med2 together in policy type
  4. Transformation
    Log and power transforms of some continuous variables
  5. History variables
    Historical claim amount, historical claim count, years since last claim, change in NCD
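
A few of the feature engineering steps above can be sketched as follows. This is a minimal illustration, not the code in Preprocessing.py: the bin edges, the decay rate, the `drv_age1` column and the output feature names are assumptions made for the example; only `vh_value`, `vh_weight`, `vh_age` and `present_vh_value` are named in this README.

```python
import bisect
import math

# Hypothetical bin edges; the actual cut points are not given in this README.
DRV_AGE_EDGES = [25, 35, 50, 65]


def bin_index(value, edges):
    """Return the index of the bin that contains value.

    Values below the first edge fall into bin 0 and values above the last
    edge fall into the final bin, so extremes are clipped implicitly.
    """
    return bisect.bisect_right(edges, value)


def present_vh_value(vh_value, vh_age, decay_rate=0.15):
    """Vehicle value decayed exponentially with age (decay_rate is assumed)."""
    return vh_value * math.exp(-decay_rate * vh_age)


def engineer(row):
    """Add a binned, an interaction, a decayed and a log-transformed feature."""
    out = dict(row)
    out["drv_age1_bin"] = bin_index(row["drv_age1"], DRV_AGE_EDGES)
    out["vh_value_x_weight"] = row["vh_value"] * row["vh_weight"]
    out["present_vh_value"] = present_vh_value(row["vh_value"], row["vh_age"])
    out["log_vh_value"] = math.log1p(row["vh_value"])
    return out


example = engineer(
    {"drv_age1": 92, "vh_value": 20000.0, "vh_weight": 1200.0, "vh_age": 5}
)
```

Because `bin_index` maps any age above 65 to the last bin, an outlier such as 92 is treated like every other policyholder in that segment, which is the implicit clipping mentioned in step 1.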

Training.py

Large Claim detection model:

  • An XGBoost model and a logistic regression model to predict whether a claim would exceed 3k.
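
The large claim target can be illustrated with a small sketch. It covers only the logistic regression part, fitted here with plain NumPy gradient descent as a stand-in (the README does not say which library was used, and the XGBoost model is omitted). The data is synthetic, and the threshold value of 3000, the learning rate and the iteration count are assumptions.

```python
import numpy as np

LARGE_CLAIM_THRESHOLD = 3000.0  # "3k" from the text; the exact value is assumed


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic(X, y, lr=0.1, n_iter=2000):
    """Fit logistic regression by batch gradient descent; returns (weights, bias)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_iter):
        grad = sigmoid(X @ w + b) - y  # gradient of the log-loss w.r.t. the logits
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b


# Synthetic data: claim size grows with the first feature only.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
claim_amount = np.exp(7.0 + X[:, 0] + rng.normal(scale=0.5, size=500))

# Binary target: 1 if the claim exceeds the threshold, else 0.
is_large = (claim_amount > LARGE_CLAIM_THRESHOLD).astype(float)

w, b = fit_logistic(X, is_large)
large_claim_prob = sigmoid(X @ w + b)
```

The predicted probability can then be used as a feature or as a risk flag downstream; how it is actually used in this solution is not described here.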

Claim estimation model:

  • I stacked 7 base models using a Tweedie GLM as the meta-learner under 5-fold CV.
    • Tweedie GLM
    • LightGBM
    • DeepForest
    • XGBoost
    • CatBoost
    • Neural network with Tweedie deviance as the loss function
    • Neural network with a log-normal likelihood as the loss function (learning the mu and sigma of the loss)
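
The stacking setup and the two neural network losses can be sketched roughly as below. This is an illustration under stated assumptions, not the code in Training.py: the two toy base models stand in for the seven listed above, an ordinary least-squares fit stands in for the Tweedie GLM meta-learner, the Tweedie power `p=1.5` is assumed, and all function names are hypothetical.

```python
import numpy as np


def tweedie_deviance(y, mu, p=1.5):
    """Mean Tweedie deviance for 1 < p < 2, with y >= 0 and mu > 0."""
    y, mu = np.asarray(y, float), np.asarray(mu, float)
    dev = 2.0 * (
        y ** (2.0 - p) / ((1.0 - p) * (2.0 - p))
        - y * mu ** (1.0 - p) / (1.0 - p)
        + mu ** (2.0 - p) / (2.0 - p)
    )
    return float(dev.mean())


def lognormal_nll(y, mu, sigma):
    """Mean negative log-likelihood of y > 0 under LogNormal(mu, sigma)."""
    y, mu, sigma = (np.asarray(a, float) for a in (y, mu, sigma))
    z = (np.log(y) - mu) / sigma
    return float(np.mean(np.log(y) + np.log(sigma) + 0.5 * np.log(2 * np.pi) + 0.5 * z**2))


def mean_model(X_tr, y_tr, X_va):
    """Toy base model: predict the training mean."""
    return np.full(len(X_va), y_tr.mean())


def linear_model(X_tr, y_tr, X_va):
    """Toy base model: least-squares linear regression with an intercept."""
    A = np.column_stack([np.ones(len(X_tr)), X_tr])
    coef, *_ = np.linalg.lstsq(A, y_tr, rcond=None)
    return np.column_stack([np.ones(len(X_va)), X_va]) @ coef


def out_of_fold_predictions(base_models, X, y, n_folds=5, seed=0):
    """Return an (n_samples, n_models) matrix of out-of-fold predictions."""
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, n_folds)
    oof = np.zeros((len(y), len(base_models)))
    for k, valid in enumerate(folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != k])
        for m, fit_predict in enumerate(base_models):
            # Each row is predicted by models that never saw it during fitting.
            oof[valid, m] = fit_predict(X[train], y[train], X[valid])
    return oof


# Synthetic claims: mostly zero, occasionally a positive gamma-distributed amount.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = np.where(rng.random(200) < 0.8, 0.0, rng.gamma(2.0, 500.0, size=200))

oof = out_of_fold_predictions([mean_model, linear_model], X, y)

# Meta-learner stand-in: least squares on the out-of-fold predictions.
meta_weights, *_ = np.linalg.lstsq(oof, y, rcond=None)
stacked = np.clip(oof @ meta_weights, 1e-6, None)  # Tweedie deviance needs mu > 0
score = tweedie_deviance(y, stacked)
```

Fitting the meta-learner on out-of-fold predictions rather than in-sample predictions keeps it from rewarding base models that merely overfit the training rows.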

Model.py

  • The script used to produce predictions inside the AICrowd environment
  • Pricing strategy is incorporated in the predict_premium function

The final presentation is also uploaded to this repository.
