NirmalKanagasabai / mcgill-tsa

McGill Twitter Sentiment Analysis dataset (MTSA)

This repository stores all code and files related to the MTSA dataset. The corresponding paper, "Sentiment Analysis: It's Complicated", accepted at NAACL 2018, can be found online here.

You may only use this code and dataset for non-proprietary research purposes, as per Twitter's terms of service. Additionally, note that this project uses the GNU GPLv3 license, meaning that you may not incorporate this program "into proprietary programs".

Overview of provided code and data

The provided Python code is written and designed for Python 3.6.x.

Package dependencies (mainly for preprocessing):

Data files

In the directory data:

data/annotated_tweets.csv            ==> the annotated tweets as given by CrowdFlower
data/unannotated_tweets.csv          ==> the unannotated tweets before being sent to CrowdFlower
data/processed_annotated_tweets.npy  ==> NumPy pickle file produced after the data is preprocessed
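
As a rough illustration of how these files might be read, here is a minimal sketch. The helper names `read_annotated_rows` and `load_processed` are hypothetical and are not part of this repository; the CSV column names are not assumed, so rows are keyed by whatever header the file itself provides.

```python
import csv


def read_annotated_rows(path="data/annotated_tweets.csv"):
    """Return the CSV rows as a list of dicts keyed by the file's own header."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def load_processed(path="data/processed_annotated_tweets.npy"):
    """Load the pickled NumPy file.

    allow_pickle=True is required for arrays that hold Python objects.
    Only use it on files you trust, since unpickling can run arbitrary code.
    """
    import numpy as np  # imported here so the CSV helper works without NumPy

    return np.load(path, allow_pickle=True)
```

If the `.npy` file holds pickled objects defined in `src`, unpickling will probably only succeed when those modules are importable, for example when running from inside the `src` directory.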

Code files

In the directory src:

src/load_tweets.py    ==> main file, loads all tweets from the CrowdFlower CSV
src/preprocessing.py  ==> code for preprocessing tweets
src/tweet.py          ==> handy tweet objects, contains original and preprocessed text,
                          amenable to adding specific feature sets
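
To make the idea of a tweet object concrete, the sketch below shows one way such a container could look. It is an assumption for illustration only: the class name `TweetSketch`, its attributes, and `add_feature_set` are hypothetical and do not reflect the actual contents of `src/tweet.py`.

```python
class TweetSketch:
    """Hypothetical container holding the original and preprocessed text."""

    def __init__(self, original_text, preprocessed_text=None):
        self.original_text = original_text
        self.preprocessed_text = preprocessed_text
        # Named feature sets can be attached later without changing the class.
        self.features = {}

    def add_feature_set(self, name, values):
        """Store a feature set under a name, replacing any earlier one."""
        self.features[name] = values


tweet = TweetSketch("I LOVE this!!", preprocessed_text="i love this")
tweet.add_feature_set("token_count", 3)
```

Keeping features in a dictionary keyed by name is one simple way to let different experiments attach their own feature sets to the same object.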

Code used for feature extraction and experimental design in the paper is available on request.

Contact

Contact Kian Kenyon-Dean at kian.kenyon-dean@mail.mcgill.ca (or on GitHub) for questions about this repository.

About

License:GNU General Public License v3.0

