The great Data Science Challenge.
It was a hackathon hosted by HackerEarth for Infosys.
Our task was to categorize each invoice according to its item description.
Challenges
- The data was highly imbalanced; some categories had only one known data point.
- Preparing a validation set was difficult because of this imbalance.
- Text Analysis.
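
One way to approach the validation-set problem above is a per-class split that never moves a category's only example out of the training data. The sketch below is an illustration under that assumption, not the code used in the hackathon; the function name `split_with_rare_classes` and the toy labels are hypothetical.

```python
import random
from collections import defaultdict


def split_with_rare_classes(labels, val_fraction=0.2, seed=0):
    """Return (train_idx, val_idx); every class keeps at least one training row."""
    rng = random.Random(seed)

    # Group row indices by their category label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, val_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Cap the validation share so that one row always stays in training.
        # A class with a single row therefore contributes nothing to validation.
        n_val = min(int(round(len(indices) * val_fraction)), len(indices) - 1)
        val_idx.extend(indices[:n_val])
        train_idx.extend(indices[n_val:])
    return sorted(train_idx), sorted(val_idx)


# Toy labels: two reasonably sized categories and one singleton category "C".
labels = ["A"] * 10 + ["B"] * 5 + ["C"]
train_idx, val_idx = split_with_rare_classes(labels)
```

With this split, validation scores say nothing about the singleton categories, so they mainly reflect performance on the better-represented ones.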
Solutions
- Implemented oversampling and undersampling of the data.
- The final model combined the results of Random Forest and XGBoost.
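
The two steps above can be sketched in plain Python. This is a minimal illustration, not the original implementation: the README does not say how the resampling was done or how the two models were combined, so random duplication, random dropping, and averaging of class probabilities are assumptions here. The function names and all sample values are hypothetical.

```python
import random
from collections import defaultdict


def resample_to_target(rows, labels, target, seed=0):
    """Bring every class to exactly `target` rows by random over/undersampling."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)

    out_rows, out_labels = [], []
    for label, items in by_class.items():
        if len(items) >= target:
            # Undersample: keep a random subset, without replacement.
            chosen = rng.sample(items, target)
        else:
            # Oversample: keep all rows and add random duplicates.
            chosen = items + rng.choices(items, k=target - len(items))
        out_rows.extend(chosen)
        out_labels.extend([label] * target)
    return out_rows, out_labels


def average_probabilities(probs_a, probs_b):
    """Average two models' per-class probabilities row by row (soft voting)."""
    return [
        [(a + b) / 2 for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(probs_a, probs_b)
    ]


def predict_labels(probs, classes):
    """Pick the class with the highest averaged probability for each row."""
    return [classes[max(range(len(row)), key=row.__getitem__)] for row in probs]


# Hypothetical item descriptions: six rows of class "A", one row of class "B".
rows = ["steel pipe %d" % i for i in range(6)] + ["office chair"]
labels = ["A"] * 6 + ["B"]
balanced_rows, balanced_labels = resample_to_target(rows, labels, target=3)

# Hypothetical class probabilities from two already-trained models.
rf_probs = [[0.6, 0.4], [0.9, 0.1]]
xgb_probs = [[0.2, 0.8], [0.6, 0.4]]
combined = average_probabilities(rf_probs, xgb_probs)
predictions = predict_labels(combined, classes=["A", "B"])
```

Resampling is normally applied to the training split only, so that duplicated rows do not leak into the validation set.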
This was also my first time using Git.
My Result
Without sampling: https://www.hackerearth.com/submission/27500271/