Diachronic Twitter Sentiment Analysis

Natasha Kamtekar | nak142 | 4/24/2020

A link to the project guestbook can be found here

Project Description

This project compares Twitter data from the Internet Archive at two points in time: 2011 and 2019. It contrasts tweeting habits across the two samples, including content, lexical complexity, tweet length, and tweet sentiment.

How is popular content of either era perceived or talked about?
Have tweets become a medium for displaying more complex sentiment than they used to be?
How does this complexity relate to overall sentiment?
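
As a rough illustration of the surface measures mentioned above, the sketch below computes tweet length and a simple lexical complexity score. The type-token ratio used here is an assumption chosen for illustration, and the function names are hypothetical; the notebooks may measure complexity differently.

```python
import re

# Simple word pattern: runs of letters, digits, and apostrophes (an assumption,
# not necessarily the tokenization used in the project notebooks).
TOKEN_RE = re.compile(r"[a-z0-9']+")


def tokenize(text):
    """Lowercase a tweet and split it into simple word tokens."""
    return TOKEN_RE.findall(text.lower())


def tweet_metrics(text):
    """Return character length, token count, and type-token ratio for one tweet."""
    tokens = tokenize(text)
    # Type-token ratio: distinct words divided by total words (0.0 for empty tweets).
    ttr = len(set(tokens)) / len(tokens) if tokens else 0.0
    return {"chars": len(text), "tokens": len(tokens), "ttr": ttr}


def mean_metrics(tweets):
    """Average each metric over a list of tweets, e.g. one year's sample."""
    rows = [tweet_metrics(t) for t in tweets]
    if not rows:
        return {}
    return {key: sum(r[key] for r in rows) / len(rows) for key in rows[0]}
```

Calling `mean_metrics` once on the 2011 sample and once on the 2019 sample would give two sets of averages to compare side by side.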

Project Data

The main data used in this project come from a sample of the Internet Archive's 2011 and 2019 JSON files: the top 1% of tweets were scraped from October 2011 and September 2019. The sentiment analysis portion also uses a classifier built on the open-source data from Sentiment140, an existing sentiment analysis project. The data used in the project can be found here; the classifier data can be found on the Sentiment140 website linked above.
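
To make the classifier step concrete, here is a minimal sketch of a word-count Naive Bayes sentiment classifier. The choice of Naive Bayes is an assumption made for illustration, since this README does not say which model build_classifier uses. The training rows are toy examples, not real data, labeled with Sentiment140's polarity coding of 0 (negative) and 4 (positive).

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(examples):
    """Count labels and per-label word frequencies from (label, text) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for label, text in examples:
        tokens = text.lower().split()
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return label_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log-probability for the given text."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in label_counts.items():
        score = math.log(n / total)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in text.lower().split():
            if tok in vocab:  # words never seen in training are skipped
                # Add-one (Laplace) smoothing avoids zero probabilities.
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training rows, made up for illustration (0 = negative, 4 = positive).
toy_rows = [
    (4, "i love this"),
    (4, "great day"),
    (0, "i hate this"),
    (0, "awful day"),
]
model = train_naive_bayes(toy_rows)
print(predict(model, "love great"))  # prints 4
```

In practice the `(label, text)` pairs would be read from the Sentiment140 training file rather than written by hand, and a held-out portion would be kept aside to check accuracy before classifying the 2011 and 2019 samples.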

Important documents
Folders
- Data sample
  - 2011 dataset
  - 2019 dataset
- Notebooks
  - build_classifier: builds the classifier for sentiment analysis
  - data_parsing: strips the tweet data down to the necessary columns
  - data_analysis and classifyanalysis: different stages of the analysis process
  - finalnb: the final Jupyter notebook
- Images
  - Contains images of all plots
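
The column-stripping step described for data_parsing can be sketched as below. This assumes the archive files hold one JSON object per line and that non-tweet records (such as delete notices) lack a `text` field. The set of fields in `KEEP` is a hypothetical choice for illustration, not necessarily the columns the notebook retains.

```python
import json

# Hypothetical set of columns to keep; the actual notebook may keep others.
KEEP = ("id_str", "created_at", "text", "lang")


def strip_tweet(line):
    """Parse one JSON line and keep only the KEEP fields.

    Returns None for records without a "text" field, which are assumed
    here to be non-tweet records such as delete notices.
    """
    record = json.loads(line)
    if "text" not in record:
        return None
    return {key: record.get(key) for key in KEEP}


def parse_lines(lines):
    """Strip every tweet in an iterable of JSON lines, skipping blanks and non-tweets."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        row = strip_tweet(line)
        if row is not None:
            rows.append(row)
    return rows
```

Passing an open file handle to `parse_lines` yields a list of small dictionaries that can be loaded into a table for the later analysis notebooks.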

About

Analyzing how Twitter's user base has changed the way it discusses things over the years.

GNU General Public License v3.0
