Machine Learning Project
Data: Yelp Challenge dataset, which comes in JSON format. The project focused on the review dataset, which contains 5,261,669 examples with the following features: business_id, cool, date, funny, review_id, stars, text, useful, and user_id.
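
As a rough illustration of reading the review data, the sketch below assumes the review file stores one JSON object per line (JSON Lines). The helper name iter_reviews and the two sample records are made up for this example and are not real Yelp data; only the field names come from the list above.

```python
import json
from collections import Counter


def iter_reviews(lines):
    """Yield one review dict per non-empty line of JSON Lines input."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


# Two made-up records using the field names listed above (not real Yelp data).
sample_lines = [
    '{"review_id": "r1", "user_id": "u1", "business_id": "b1", "stars": 5, '
    '"date": "2017-01-01", "text": "Great food", "useful": 1, "funny": 0, "cool": 0}',
    '{"review_id": "r2", "user_id": "u2", "business_id": "b1", "stars": 1, '
    '"date": "2017-01-02", "text": "Slow service", "useful": 0, "funny": 0, "cool": 0}',
]

reviews = list(iter_reviews(sample_lines))

# Count reviews per star rating to get a first look at class balance.
star_counts = Counter(r["stars"] for r in reviews)
```

Because iter_reviews is a generator, an open file handle can be passed in place of sample_lines, so the full set of reviews never has to be held in memory at once.
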
Data Preparation: 1. Non-English reviews 2. Unbalanced dataset 3. Feature extraction
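
The sketch below illustrates two of these steps under stated assumptions. Random undersampling is one common way to balance classes and bag-of-words counts are one simple feature representation; neither is necessarily what this project used. The function names and the toy examples are hypothetical. Language filtering is left out because it usually relies on a separate language-identification library.

```python
import random
import re
from collections import Counter, defaultdict


def undersample(examples, label_of, seed=0):
    """Randomly keep the same number of examples per label (the minority count)."""
    by_label = defaultdict(list)
    for ex in examples:
        by_label[label_of(ex)].append(ex)
    n = min(len(group) for group in by_label.values())
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    balanced = []
    for label in sorted(by_label):
        balanced.extend(rng.sample(by_label[label], n))
    return balanced


def bag_of_words(text):
    """Lowercase the text and count tokens made of letters and apostrophes."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


# Toy (text, stars) pairs: three 5-star examples and one 1-star example.
toy = [("great food", 5), ("loved it", 5), ("amazing place", 5), ("never again", 1)]
balanced = undersample(toy, label_of=lambda ex: ex[1])

features = bag_of_words("Great food, great service!")
```

Undersampling discards data from the larger classes, which is cheap with millions of reviews; oversampling or class weights are alternatives when data is scarce.
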
Models: Multinomial Naive Bayes, FastText, and Bidirectional LSTM with word embeddings or sentence embeddings (Skip-Thought)
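
To make the simplest of these models concrete, here is a minimal standard-library sketch of Multinomial Naive Bayes with Laplace (add-alpha) smoothing. It is an illustration only: the function names, the toy training sentences, and the positive/negative labels are assumptions for this example, and the project itself may have used a library implementation.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, alpha=1.0):
    """docs: list of (tokens, label). Returns log priors, smoothed log likelihoods, vocab."""
    label_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    for tokens, label in docs:
        word_counts[label].update(tokens)
    vocab = {w for counts in word_counts.values() for w in counts}
    log_prior = {c: math.log(n / len(docs)) for c, n in label_counts.items()}
    log_like = {}
    for c in label_counts:
        # Add-alpha smoothing: every vocabulary word gets alpha pseudo-counts.
        total = sum(word_counts[c].values()) + alpha * len(vocab)
        log_like[c] = {w: math.log((word_counts[c][w] + alpha) / total) for w in vocab}
    return log_prior, log_like, vocab


def predict_nb(model, tokens):
    """Return the label with the highest log score; words outside the vocab are skipped."""
    log_prior, log_like, vocab = model
    scores = {
        c: log_prior[c] + sum(log_like[c][w] for w in tokens if w in vocab)
        for c in log_prior
    }
    return max(scores, key=scores.get)


# Toy training set with made-up sentences and labels.
train = [
    ("great food great service".split(), "positive"),
    ("loved the friendly staff".split(), "positive"),
    ("terrible food rude staff".split(), "negative"),
    ("slow service never again".split(), "negative"),
]
model = train_nb(train)
prediction = predict_nb(model, "great friendly service".split())
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied, which matters for long review texts.
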