hilaryp / DS_Projects

A collection of data science practice projects.

DS_Projects

A collection of practice projects in Python and R.

  1. LSA word cloud
  • Practice building a web scraper to collect the LSA 2015 abstract titles.
  • Tokenize and stem the titles with nltk, then compute stem frequencies.
  • Handle Unicode encoding issues.
  • Plot the results with the wordcloud package in R.
  2. Data Incubator proposal project
  • Use 9 years of ACS/PUMS data to analyse languages spoken in Manhattan.
  • Identify and plot trends.
  3. Yelp reviews 1
  • Predict a business's rating from its category, attributes, and location.
  • Ensemble of KNN and linear regression models.
  4. Yelp reviews 2
  • Predict a restaurant's rating from unstructured review text.
  • Test bag-of-words, bigram, and TF-IDF models.
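
The text features named in the last project can be illustrated with a minimal, standard-library-only sketch. This is not the code from the repository, which is not shown here; the function names (`tokenize`, `ngrams`, `bag_of_words`, `tfidf`) and the sample reviews are hypothetical, and the TF-IDF weighting used is one common unsmoothed form (term frequency times `log(N / df)`), which may differ from what the project or any particular library uses.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def ngrams(tokens, n):
    """Return the contiguous n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_words(text, n=1):
    """Count n-grams in one document (n=1: bag-of-words, n=2: bigrams)."""
    return Counter(ngrams(tokenize(text), n))


def tfidf(docs):
    """Weight each unigram by tf * log(N / df), an unsmoothed TF-IDF variant."""
    counts = [bag_of_words(doc) for doc in docs]
    # Document frequency: number of documents containing each term.
    df = Counter(term for c in counts for term in c)
    n_docs = len(docs)
    weighted = []
    for c in counts:
        total = sum(c.values())
        weighted.append(
            {t: (k / total) * math.log(n_docs / df[t]) for t, k in c.items()}
        )
    return weighted


# Hypothetical sample reviews, for illustration only.
reviews = [
    "Great food and great service",
    "Terrible service",
    "The food was great",
]

unigrams = bag_of_words(reviews[0])
bigrams = bag_of_words(reviews[0], n=2)
weights = tfidf(reviews)
```

In a sketch like this, the per-document dictionaries would still need to be turned into fixed-length vectors before being passed to a regression model; terms that appear in fewer reviews receive larger TF-IDF weights than terms shared across many reviews.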

Languages

Python 80.0%, R 20.0%