AnilSener / Axa-Insurance-Telematics-Kaggle

I developed this case study in only 7 days with PySpark (Spark 1.6.0) SQL and MLlib, running on a Databricks cluster on AWS. Random Forest achieved 90% AUC without using the Trip Matching (Repeated Trips) feature. Ensembles of RF, GBT and Logistic Regression, together with outlier elimination, could be used to improve this result. There are two versions of my code: a test execution and a full-data execution. Because AWS costs exceeded my budget, I stopped training my model(s) on the full dataset. There is also a PPT that presents the outputs of the test execution. The full-data execution code is a slightly different, more production-ready version; in it I had to apply Databricks table caching to the TRAIN and TEST data tables to obtain acceptable performance.
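To make the AUC figure and the ensemble idea above concrete, here is a minimal pure-Python sketch. It is not the repository's code: the function names `auc_score` and `average_ensemble` and the sample scores are hypothetical and chosen only for illustration. It computes AUC with the rank-sum formulation and averages the scores of several models, which is one simple way the RF, GBT and Logistic Regression outputs could be combined.

```python
def auc_score(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) formulation.

    labels: iterable of 0/1 ground-truth values.
    scores: iterable of predicted scores (higher means more likely positive).
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n

    # Assign 1-based ranks, giving tied scores their average rank.
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1

    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative label")

    pos_rank_sum = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


def average_ensemble(*model_scores):
    """Unweighted average of per-trip scores from several models."""
    return [sum(col) / len(col) for col in zip(*model_scores)]


# Hypothetical scores from two models for four trips (illustrative only).
labels = [0, 0, 1, 1]
rf_scores = [0.1, 0.4, 0.35, 0.8]
gbt_scores = [0.2, 0.3, 0.6, 0.7]

blended = average_ensemble(rf_scores, gbt_scores)
print(auc_score(labels, rf_scores))
print(auc_score(labels, blended))
```

Averaging is only one possible combination rule; weighted blends or stacking would follow the same pattern with a different `average_ensemble`.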

AnilSener/Axa-Insurance-Telematics-Kaggle Stargazers

Jo
aydv
BozkurtTuran
Bturan19
caofengnian
Hwanpyo Kim
legendarykim
LorenzoBottaccioli
marek-rodny
mibesr
Ozlem Yildirim
ozlemmye
ユリゴコロ
Randallzoeng
Sandy4321
Shubham Pachori
shubhampachori12110095
Xiaojuan
tian43
Trương Khánh Duy
truongkhanhduy95
xiuxianxi