This repository was submitted as the final project for a Data Science course at General Assembly. The project plan and overview below were written at the start of the project; a more detailed write-up can be found in the "Final Poject Paper" doc.
The data comes from the Yelp Dataset Challenge: http://www.yelp.com/dataset_challenge. To keep the repo small, the raw data is not included; download it from Yelp and convert it to CSV using `json_to_csv_converter.py`. Sources for the other data in the `\data` folder are given in comments in the code.
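As a rough illustration of what the JSON-to-CSV step involves, here is a minimal stdlib sketch. It is not the repo's `json_to_csv_converter.py`; it assumes the Yelp files hold one JSON object per line, and the function name `json_lines_to_csv` and the field names `review_id`, `stars`, and `text` are assumptions used only for illustration.

```python
import csv
import io
import json


def json_lines_to_csv(json_lines, out_file, columns):
    """Write selected fields from JSON-lines records to a CSV file object.

    Hypothetical helper: assumes one JSON object per line and keeps only
    the fields listed in `columns` (missing fields become empty strings).
    Returns the number of records written.
    """
    writer = csv.DictWriter(out_file, fieldnames=columns)
    writer.writeheader()
    count = 0
    for line in json_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        writer.writerow({col: record.get(col, "") for col in columns})
        count += 1
    return count


# Two made-up review records in JSON-lines form (field names are assumptions).
sample = [
    '{"review_id": "a1", "stars": 5, "text": "Great tacos!"}',
    '{"review_id": "b2", "stars": 2, "text": "Cold, bland food."}',
]
buf = io.StringIO()
n = json_lines_to_csv(sample, buf, ["review_id", "stars", "text"])
print(buf.getvalue())
```

With a real download, `json_lines` would be an open file handle (opened with `encoding="utf-8"`) and `out_file` a file opened with `newline=""`, as the `csv` module recommends. Streaming line by line avoids loading the whole dataset into memory.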
## 1. Specific Aim
- Goal: Create a model that predicts a restaurant review's star rating (1-5) from the review text. If necessary, group ratings into "bad" (1-3 stars) and "good" (4-5 stars) and treat the task as binary classification.
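The optional grouping of ratings into "bad" and "good" can be sketched as a small mapping function. The name `star_label` and the `threshold` parameter are hypothetical; the default threshold of 4 simply encodes the 1-3 / 4-5 split described above.

```python
def star_label(stars, threshold=4):
    """Map a 1-5 star rating to 'good' (>= threshold) or 'bad' (< threshold).

    Hypothetical helper; threshold=4 gives the 1-3 "bad" / 4-5 "good" split.
    """
    if not 1 <= stars <= 5:
        raise ValueError(f"stars must be between 1 and 5, got {stars}")
    return "good" if stars >= threshold else "bad"


labels = [star_label(s) for s in [1, 2, 3, 4, 5]]
print(labels)  # ['bad', 'bad', 'bad', 'good', 'good']
```

Collapsing five classes into two trades granularity for more examples per class, which can make the model easier to train if the middle ratings are hard to tell apart from text alone.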
## 2. Methods
- Word2Vec, Doc2Vec, NLTK Library
- Decision Tree
- Random Forests
- Logistic Regression
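To show the mechanics behind one of the listed methods, here is a minimal pure-Python sketch of logistic regression trained by batch gradient descent on bag-of-words counts. It swaps in simple word counts in place of Word2Vec/Doc2Vec features and a regex tokenizer in place of NLTK, and it uses a few made-up reviews; all function names are hypothetical. A real project would more likely use a library such as scikit-learn, so treat this only as an illustration of the technique.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep alphabetic tokens (a simple stand-in for NLTK tokenization).
    return re.findall(r"[a-z']+", text.lower())


def build_vocab(texts):
    # Map each distinct token in the training texts to a feature index.
    vocab = sorted({tok for t in texts for tok in tokenize(t)})
    return {tok: i for i, tok in enumerate(vocab)}


def featurize(text, vocab):
    # Sparse bag-of-words: {feature index: count}; unknown words are dropped.
    counts = Counter(tokenize(text))
    return {vocab[w]: c for w, c in counts.items() if w in vocab}


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X, y, n_features, lr=0.5, epochs=200):
    # Batch gradient descent on the mean log loss.
    w = [0.0] * n_features
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, target in zip(X, y):
            err = sigmoid(b + sum(w[i] * v for i, v in x.items())) - target
            for i, v in x.items():
                grad_w[i] += err * v
            grad_b += err
        for i in range(n_features):
            w[i] -= lr * grad_w[i] / n
        b -= lr * grad_b / n
    return w, b


def predict_proba(x, w, b):
    # Estimated probability that the review is "good".
    return sigmoid(b + sum(w[i] * v for i, v in x.items()))


# Toy, made-up reviews: 1 = "good" (4-5 stars), 0 = "bad" (1-3 stars).
train_texts = [
    "great food and friendly staff",
    "amazing tacos, great service",
    "cold food and rude staff",
    "terrible service, bland food",
]
train_y = [1, 1, 0, 0]

vocab = build_vocab(train_texts)
X = [featurize(t, vocab) for t in train_texts]
w, b = train_logreg(X, train_y, len(vocab))

train_acc = sum((predict_proba(x, w, b) > 0.5) == bool(t) for x, t in zip(X, train_y)) / len(X)
p_good = predict_proba(featurize("friendly staff, amazing tacos", vocab), w, b)
p_bad = predict_proba(featurize("cold, rude and terrible", vocab), w, b)
print(train_acc, round(p_good, 3), round(p_bad, 3))
```

The learned weight for each word plays the role of a per-word sentiment score, which is part of why logistic regression is a common, interpretable baseline before trying tree-based models.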
## 3. Results
## 4. Limitations / assumptions of the data
- Reviewers are assumed to be thoughtful and rational when writing reviews and rating restaurants, so that the sentiment of a review's text correlates with its star rating.
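One way to sanity-check this assumption is to measure the correlation between a sentiment score for each review and its star rating. The sketch below computes the Pearson correlation coefficient; the sentiment scores are invented placeholders (in practice they might come from a sentiment analyzer such as NLTK's VADER), and the helper name `pearson` is hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


# Made-up sentiment scores in [-1, 1] paired with made-up star ratings.
sentiment = [-0.8, -0.3, 0.1, 0.6, 0.9]
stars = [1, 2, 3, 4, 5]
r = pearson(sentiment, stars)
print(round(r, 3))
```

A coefficient close to 1 on real data would support the assumption; a weak correlation would suggest that text alone may not predict stars well.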
## 5. Expected hurdles
## 6. Where you need help
## 7. Repeat for secondary hypothesis