A collection of practice projects in Python and R.
- LSA word cloud
  - Practice building a web scraper for the LSA 2015 abstract titles
  - Use the nltk tokenizer and stemmer to process the titles and compute stem frequencies
  - Handle Unicode encoding issues
  - Use the wordcloud package in R to plot the results
- Data Incubator proposal project
  - Use 9 years of ACS/PUMS data to analyse languages spoken in Manhattan
  - Identify and plot trends
- Yelp reviews 1
  - Predict a business's rating based on category, attributes, and location
  - Ensemble of KNN and linear regression models
- Yelp reviews 2
  - Predict a restaurant's rating based on unstructured review text
  - Tested bag-of-words, bigram, and TF-IDF models
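
For readers unfamiliar with the text features named under Yelp reviews 2, here is a minimal sketch of bag-of-words, bigram, and TF-IDF representations using only the Python standard library. It is not the code from the project itself: the function names (`tokenize`, `bag_of_words`, `bigrams`, `tfidf`) and the sample reviews are hypothetical, and the weighting is the plain `tf * log(N / df)` variant, which differs from the smoothed formula some libraries use by default.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (word.strip(".,!?;:\"'()") for word in text.lower().split())
    return [t for t in tokens if t]


def bag_of_words(tokens):
    """Count how often each token occurs (word order is discarded)."""
    return Counter(tokens)


def bigrams(tokens):
    """Count adjacent token pairs, which keeps some local word order."""
    return Counter(zip(tokens, tokens[1:]))


def tfidf(docs):
    """Weight each term by its in-document frequency times log(N / df).

    `docs` is a list of token lists; the result is one {term: weight}
    dict per document. Unsmoothed variant, chosen for illustration.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Made-up sample reviews, for illustration only.
reviews = [
    "Great food, great service!",
    "Terrible service.",
    "The food was fine.",
]

tokenized = [tokenize(r) for r in reviews]
unigram_counts = [bag_of_words(t) for t in tokenized]
bigram_counts = [bigrams(t) for t in tokenized]
tfidf_weights = tfidf(tokenized)
```

In this toy corpus, "great" appears in only one review, so it receives a higher TF-IDF weight in that review than "food", which appears in two. A rating model would typically consume these counts or weights as feature vectors.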