imink / UCL_COMPIG15_Project

Group Project for COMPIG15 Information Retrieval and Data Mining

Group Project for COMPIG15@UCL Information Retrieval and Data Mining

Project Info

Sentiment Analysis with Twitter Data

Sentiment analysis aims to judge whether the emotional tendency expressed in a text is positive or negative. More generally, it tries to determine the attitude of a speaker or writer toward a topic, or the overall contextual polarity of a document. The basic task is to classify the polarity of a given text. This project focuses on Twitter, a popular microblogging service, and builds several models that classify tweets as having positive or negative sentiment. Before the texts are used, the data can be pre-processed in several ways. Feature extraction then converts each tweet's text into a vector, using one of three methods: TF-IDF, TF, and FP. Five supervised learning classifiers are implemented: Naïve Bayes, logistic regression, SVM, decision trees, and KNN. Finally, the five models are evaluated, and the SVM achieves the best performance.
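
As a rough illustration of the feature extraction and classification steps described above, here is a minimal scikit-learn sketch. It is not taken from the project's code: the toy tweets, the `EXTRACTORS` and `CLASSIFIERS` tables, and the `build_model` helper are all hypothetical, and the hyperparameters are library defaults rather than the project's settings. FP is left out because the description does not spell it out.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier

# Made-up toy tweets (1 = positive, 0 = negative); not the project's dataset.
tweets = [
    "I love this phone, it is great",
    "what a wonderful sunny day",
    "best concert ever, so happy",
    "really love my new job",
    "I hate waiting in traffic",
    "this is the worst service ever",
    "awful movie, so boring",
    "feeling sad and tired today",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Hypothetical lookup tables: each entry builds a fresh, unfitted estimator.
EXTRACTORS = {
    "TF": CountVectorizer,        # raw term counts
    "TF-IDF": TfidfVectorizer,    # term counts reweighted by inverse document frequency
}
CLASSIFIERS = {
    "Naive Bayes": MultinomialNB,
    "Logistic regression": LogisticRegression,
    "SVM": LinearSVC,
    "Decision tree": DecisionTreeClassifier,
    "KNN": lambda: KNeighborsClassifier(n_neighbors=3),
}


def build_model(extractor_name, classifier_name):
    """Chain a text vectorizer and a classifier into one pipeline."""
    return make_pipeline(EXTRACTORS[extractor_name](), CLASSIFIERS[classifier_name]())


# Train one combination and classify two unseen texts.
model = build_model("TF-IDF", "SVM")
model.fit(tweets, labels)
predictions = model.predict(["love this sunny day", "worst awful service"])
print(predictions)
```

With a real dataset, the same loop over `EXTRACTORS` and `CLASSIFIERS` could be combined with a held-out test split to compare the models.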

Team Members

  • Shuo Wang, MSc Web Science and Big Data
  • Yue Wang, MSc NCS
  • Xizhe Jiang, MSc NCS

How to use the code

  • Clone the repo with git; the program is in the implementation folder
  • Make sure scikit-learn, NLTK, pandas, and matplotlib are installed in advance
  • Run the command python sentiment_analysis.py in the implementation folder to start the main program
  • Choose the feature extraction method and classifier you want by commenting and uncommenting the relevant lines in the code
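
Before running the main program, a quick check along the following lines can confirm that the required packages are importable. This is only a sketch and not part of the repo: the `missing_modules` helper and the `REQUIRED_MODULES` list are hypothetical, and the import names are assumed to be the usual ones for these packages.

```python
from importlib.util import find_spec

# Import names assumed for the packages listed above
# (scikit-learn is imported as "sklearn").
REQUIRED_MODULES = ["sklearn", "nltk", "pandas", "matplotlib"]


def missing_modules(module_names):
    """Return the names in module_names that cannot be found for import."""
    return [name for name in module_names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```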

Dataset

Note:

About

License:MIT License


Languages

Language:Python 100.0%