This is a project for the Big Data Analysis course at National Taipei University of Technology (NTUT). The program takes a query from the user and reports the current general sentiment on that topic on Twitter.
- Part 1 - NLP & Bag of Words
- Part 2 - Distributed Word Vectors
- Part 3 - Twitter Sentiment Analysis
- Part 4 - Parallel Processing for Large Datasets
- Project Demonstration
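Part 1 turns raw tweet text into a bag-of-words representation. As a minimal stdlib sketch of the idea (not the notebook's actual code, which uses nltk for preprocessing):

```python
from collections import Counter

def bag_of_words(text):
    # Lowercase and split on whitespace; a real pipeline would also
    # strip punctuation and stop words (e.g. with nltk) before counting.
    return Counter(text.lower().split())

bag_of_words("Good movie good plot")
# Counter({'good': 2, 'movie': 1, 'plot': 1})
```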
Google Slides deck for the term presentation
You must have Python 3.x and Jupyter installed. Install the required modules with pip (`re` and `pickle` ship with the standard library, so they need no install; on PyPI, scikit-learn is the package name for `sklearn`):

$ pip install tweepy nltk pandas numpy gensim scikit-learn
-
Due to the size of the dataset and the trained classifiers, you will have to download the following files from this Google Drive link and place them in their respective folders.
-
Run
$ jupyter notebook
in the folder, and open "Demo" for a demonstration.
-
If you wish to run parallel processing, type
$ ipcluster start --n=4 --profile='movie-view'
in your terminal to start the parallel workers.
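IPython Parallel distributes work across the four workers started above in a map style. Purely as an illustration of that pattern (a stdlib sketch using `concurrent.futures`, not the project's actual IPython Parallel code, and `word_count` is a hypothetical stand-in task):

```python
from concurrent.futures import ProcessPoolExecutor

def word_count(tweet):
    # Stand-in for the per-tweet work (e.g. feature extraction or scoring).
    return len(tweet.split())

tweets = ["great talk today", "worst commute ever", "so so"]

if __name__ == "__main__":
    # Four workers, mirroring `ipcluster start --n=4`.
    with ProcessPoolExecutor(max_workers=4) as pool:
        counts = list(pool.map(word_count, tweets))
    print(counts)  # [3, 3, 2]
```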
- Python
- Jupyter Notebook
- IPython Parallel
- Michael Fu - System Design and Code Implementation - michaelandhsm2
- Leon Shang - Code Implementation and Testing - leon20121005
- Sherry Wang - Algorithm Research and Implementation Consulting - asweetapple
- Dataset obtained from Stanford Twitter Sentiment Dataset
- Inspiration - Kaggle's Bag of Words Meets Bags of Popcorn Tutorial
- And the Big Data Analysis course, Spring 2017, taught by Jenq-Haur Wang.