ravising-h / The-Great-Data-Science-Challenge

A text analysis challenge on HackerEarth by Infosys where the data was highly imbalanced.

https://www.hackerearth.com/submission/27500271/

text-classification xgboost-algorithm random-forest undersampling oversampling machine-learning

The Great Data Science Challenge

It was a hackathon hosted on HackerEarth for Infosys. Our task was to categorize invoices according to their item description.
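This README does not describe how the item descriptions were turned into features. As an assumption, a common first step for this kind of text classification is TF-IDF weighting. The minimal sketch below uses only the standard library; the function names and sample descriptions are hypothetical, and it uses the plain formula idf(t) = log(N / df(t)) rather than any particular library's smoothed variant.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def fit_idf(docs):
    """Compute inverse document frequency: idf(t) = log(N / df(t))."""
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        # Count each term at most once per document.
        df.update(set(tokenize(doc)))
    return {term: math.log(n_docs / count) for term, count in df.items()}


def tfidf(doc, idf):
    """Return a sparse {term: weight} vector for one description.

    Term frequency is relative to all tokens in the description; terms
    never seen during fit_idf are dropped.
    """
    counts = Counter(tokenize(doc))
    total = sum(counts.values())
    return {t: (c / total) * idf[t] for t, c in counts.items() if t in idf}


# Hypothetical item descriptions, for illustration only.
descriptions = ["Office chair black", "Laptop repair service", "Office desk lamp"]
idf = fit_idf(descriptions)
print(tfidf("office chair", idf))
```

Sparse vectors like these could then be fed to classifiers such as Random Forest or XGBoost.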

Challenges

The data was highly imbalanced: some categories had only one known data point.
Preparing a validation set was difficult because of this imbalance.
Text analysis of the item descriptions.
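The README does not say how the validation set was built. One possible approach, sketched here as an assumption, is a per-class (stratified) split that keeps single-example classes entirely in training, since a class that appears only in validation can never be predicted correctly. For reference, scikit-learn's `train_test_split` with `stratify=y` raises an error when a class has only one member, which is why a manual split like this hypothetical `stratified_split` may be needed.

```python
import random
from collections import defaultdict


def stratified_split(labels, val_fraction=0.2, seed=0):
    """Return (train_idx, val_idx) with per-class stratification.

    Classes with fewer than 2 examples stay entirely in training.
    Every other class sends at least one example to validation and
    keeps at least one in training.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    train, val = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        if len(idx) < 2:
            train.extend(idx)
            continue
        n_val = min(len(idx) - 1, max(1, round(len(idx) * val_fraction)))
        val.extend(idx[:n_val])
        train.extend(idx[n_val:])
    return sorted(train), sorted(val)


# Hypothetical labels: one large class, one small class, one singleton.
labels = ["a"] * 10 + ["b"] * 3 + ["c"]
train_idx, val_idx = stratified_split(labels)
print(len(train_idx), len(val_idx))
```

With very rare classes, validation scores for those classes rest on one or two examples, so they are noisy whichever split is used.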

Solutions

Implemented oversampling and undersampling of the data.
The model combined the results of Random Forest and XGBoost.
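The README does not specify the sampling ratios or how the two models' outputs were combined. The sketch below assumes plain random resampling to a fixed per-class target and soft voting (averaging class probabilities). All function names are hypothetical. In practice the probability rows would come from something like `RandomForestClassifier.predict_proba` in scikit-learn and `XGBClassifier.predict_proba` in xgboost; here they are hard-coded lists.

```python
import random
from collections import defaultdict


def resample_to_target(X, y, target, seed=0):
    """Random over/undersampling so every class ends up with `target` rows.

    Larger classes are undersampled without replacement; smaller classes
    keep all their rows and are padded with random duplicates.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for xi, yi in zip(X, y):
        by_class[yi].append(xi)

    X_out, y_out = [], []
    for label, rows in by_class.items():
        if len(rows) >= target:
            chosen = rng.sample(rows, target)
        else:
            chosen = rows + rng.choices(rows, k=target - len(rows))
        X_out.extend(chosen)
        y_out.extend([label] * target)
    return X_out, y_out


def average_probabilities(prob_a, prob_b, weight_a=0.5):
    """Soft voting: weighted mean of two models' class-probability rows."""
    return [
        [weight_a * pa + (1 - weight_a) * pb for pa, pb in zip(row_a, row_b)]
        for row_a, row_b in zip(prob_a, prob_b)
    ]


def argmax_rows(probs):
    """Index of the most probable class in each row."""
    return [max(range(len(row)), key=row.__getitem__) for row in probs]


# Hypothetical outputs of two models for two samples and two classes.
rf_probs = [[0.9, 0.1], [0.2, 0.8]]
xgb_probs = [[0.5, 0.5], [0.6, 0.4]]
print(argmax_rows(average_probabilities(rf_probs, xgb_probs)))
```

Resampling should be applied only to the training split; the validation set keeps its original distribution so that scores reflect real data. Oversampling a singleton class only duplicates that one row, so it adds weight but no new information.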

First time using Git.

My Result

Without sampling: https://www.hackerearth.com/submission/27500271/


Languages

Jupyter Notebook: 100.0%