ravising-h / The-Great-Data-Science-Challenge

A text analysis challenge on HackerEarth by Infosys where the data was highly imbalanced.

Home Page: https://www.hackerearth.com/submission/27500271/

The Great Data Science Challenge

It was a hackathon hosted by HackerEarth for Infosys. Our task was to categorize invoices according to their item description.
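The README does not say how the item descriptions were turned into model features. TF-IDF is a common starting point for short free-text fields, so what follows is a minimal, dependency-free sketch of it, assuming each invoice has a single text description. `tokenize` and `tfidf` are hypothetical helpers; the smoothed IDF formula mirrors the default in scikit-learn's `TfidfVectorizer`, although the tokenization here is simpler.

```python
import math
import re
from collections import Counter

# Hypothetical helpers: the original notebook's feature extraction is not
# described, so this is only an illustrative TF-IDF sketch.


def tokenize(text: str) -> list[str]:
    """Lowercase an item description and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf(docs: list[str]) -> tuple[list[str], list[dict[str, float]]]:
    """Return the sorted vocabulary and one L2-normalised sparse vector per doc.

    TF is the raw term count; IDF is the smoothed form
    ln((1 + n) / (1 + df)) + 1, where n is the number of documents and
    df is the number of documents containing the term.
    """
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}

    vectors = []
    for toks in tokenized:
        weights = {t: c * idf[t] for t, c in Counter(toks).items()}
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        vectors.append({t: w / norm for t, w in weights.items()})
    return sorted(df), vectors


# Hypothetical item descriptions, not taken from the challenge data.
vocab, vectors = tfidf(["laptop battery replacement", "office chair", "laptop charger"])
print(vocab)  # ['battery', 'chair', 'charger', 'laptop', 'office', 'replacement']
```

Terms shared by many descriptions, such as "laptop" here, receive lower weights than rarer, more specific terms, which is what helps a classifier separate categories.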

Challenges

  • The data was highly imbalanced: some categories had only one known data point.
  • Preparing a validation set was difficult because of this imbalance.
  • Text analysis.
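One way to build a validation set under this kind of imbalance is a stratified split that sends every class with a single example to the training set, since such a class cannot appear in both sets. The README does not describe the actual split used; `stratified_split` and its `val_fraction` and `seed` parameters are hypothetical, and the sketch uses only the standard library.

```python
import random
from collections import defaultdict

# Hypothetical helper: illustrates one stratified-split strategy, not the
# method used in the original notebook.


def stratified_split(labels: list, val_fraction: float = 0.2,
                     seed: int = 0) -> tuple[list[int], list[int]]:
    """Split sample indices into train/validation sets, stratified by label.

    Classes with fewer than two samples go entirely to training. For larger
    classes, round(n * val_fraction) samples (at least 1, at most n - 1) go
    to validation, so every class keeps at least one training example.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, val = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        if len(idxs) < 2:
            train.extend(idxs)  # singleton class: training only
            continue
        n_val = min(max(1, round(len(idxs) * val_fraction)), len(idxs) - 1)
        val.extend(idxs[:n_val])
        train.extend(idxs[n_val:])
    return sorted(train), sorted(val)


labels = ["a"] * 10 + ["b"] * 3 + ["c"]  # toy labels; "c" has one sample
train_idx, val_idx = stratified_split(labels)
print(len(train_idx), len(val_idx))  # 11 3
```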

Solutions

  • Implemented oversampling and undersampling of the data.
  • The model combined the results of Random Forest and XGBoost.
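Both solutions can be sketched together. `resample_to` shows plain random oversampling and undersampling to a common class size, and `soft_vote` shows one common way to combine two classifiers: averaging their predicted class probabilities. Both helpers and their parameters are hypothetical. The README does not say how the sampling was done or how the Random Forest and XGBoost outputs were combined, so probability averaging is an assumption here. In practice the probability rows could come from `predict_proba` on scikit-learn's `RandomForestClassifier` and XGBoost's `XGBClassifier`, with both models using the same class order.

```python
import random
from collections import defaultdict

# Hypothetical helpers: the README does not say how the sampling or the
# Random Forest / XGBoost combination was implemented.


def resample_to(labels: list, target: int, seed: int = 0) -> list[int]:
    """Return row indices with every class resized to exactly `target` rows.

    Classes larger than `target` are randomly undersampled (without
    replacement). Smaller classes keep all their rows and are randomly
    oversampled by duplicating rows until they reach `target`.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    chosen = []
    for idxs in by_class.values():
        if len(idxs) >= target:
            chosen.extend(rng.sample(idxs, target))  # undersample
        else:
            chosen.extend(idxs)  # keep every original row
            chosen.extend(rng.choices(idxs, k=target - len(idxs)))  # duplicates
    return chosen


def soft_vote(probs_a: list[list[float]], probs_b: list[list[float]],
              weight_a: float = 0.5) -> list[int]:
    """Blend two models' per-class probabilities and return argmax class indices."""
    preds = []
    for row_a, row_b in zip(probs_a, probs_b):
        blended = [weight_a * pa + (1 - weight_a) * pb for pa, pb in zip(row_a, row_b)]
        preds.append(max(range(len(blended)), key=blended.__getitem__))
    return preds


labels = ["a"] * 10 + ["b"] * 3 + ["c"]  # toy labels
balanced = resample_to(labels, target=5)
print(len(balanced))  # 15

# Toy probability rows standing in for two fitted models' predict_proba output.
rf_probs = [[0.9, 0.1], [0.2, 0.8]]
xgb_probs = [[0.4, 0.6], [0.4, 0.6]]
print(soft_vote(rf_probs, xgb_probs))  # [0, 1]
```

Resampling is usually applied only to the training split, after the validation set has been carved out; otherwise duplicated rows can leak into validation and inflate the score.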

First time using Git.

My Result

Without sampling: https://www.hackerearth.com/submission/27500271/

Languages

Jupyter Notebook: 100.0%