Tiffany0410 / NLP-with-Twitter-Dataset

Assignment for JSC270 Data Science

JSC270_A4

Part 1 Sentiment Analysis with a Common Twitter Dataset

  • Uses a Twitter dataset of just under 45,000 tweets related to COVID-19, taken from a fairly recent Kaggle competition.
  • Trains classifiers to predict whether a tweet is positive, negative, or neutral, based only on the tweet text.
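
The README does not say which classifiers were trained, so the following is only a minimal sketch of one common baseline: a multinomial Naive Bayes model with add-one smoothing, written in plain Python. The function names (`train_naive_bayes`, `predict`) and the toy tweets are made up for illustration and are not taken from the notebook or the Kaggle data.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(texts, labels):
    """Count documents per class and words per class (whitespace tokens)."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in zip(texts, labels):
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the class with the highest log-posterior for the given text."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for token in text.lower().split():
            if token not in vocab:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            count = word_counts[label][token] + 1
            score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training data (hypothetical, for illustration only).
train_texts = [
    "stay safe everyone great news",
    "terrible panic empty shelves",
    "store opens at nine today",
]
train_labels = ["positive", "negative", "neutral"]
model = train_naive_bayes(train_texts, train_labels)
print(predict(model, "great news"))
```

On a real dataset the same idea would usually be paired with a held-out test split and a proper tokenizer; the sketch only shows how a label is predicted from the tweet text alone.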

Part 2 NLP with the Twitter API: Next Word Prediction

  • Data extraction: tweets containing the word "lockdown"; a total of 100,000 tweets is used for analysis.
  • Data pre-processing: tokenization, removal of special characters (URLs, usernames), lowercasing, lemmatization.
  • Machine learning model: an LSTM implemented using Keras.
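
The pre-processing steps listed above, and the way text is typically turned into training examples for next-word prediction, can be sketched as follows. This is an illustration under stated assumptions, not the notebook's actual code: the regular expressions, the helper names (`preprocess`, `next_word_pairs`), the `context_size` parameter, and the sample tweet are all hypothetical. Lemmatization is left out because it needs an external library, and the LSTM itself is not shown.

```python
import re

# Assumed patterns for the "special characters" named in the README.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
USER_RE = re.compile(r"@\w+")
NON_ALPHA_RE = re.compile(r"[^a-z\s]")


def preprocess(tweet):
    """Lowercase, strip URLs and usernames, drop non-letters, tokenize."""
    text = tweet.lower()
    text = URL_RE.sub(" ", text)
    text = USER_RE.sub(" ", text)
    text = NON_ALPHA_RE.sub(" ", text)
    return text.split()


def next_word_pairs(tokens, context_size=3):
    """Build (context, next_word) pairs with up to context_size prior tokens."""
    pairs = []
    for i in range(1, len(tokens)):
        context = tokens[max(0, i - context_size):i]
        pairs.append((context, tokens[i]))
    return pairs


# Made-up example tweet.
tokens = preprocess("@user Lockdown extended again! https://t.co/abc #StayHome")
pairs = next_word_pairs(tokens, context_size=2)
for context, target in pairs:
    print(context, "->", target)
```

Before such pairs are fed to a sequence model, the tokens would normally be mapped to integer IDs and the contexts padded to a fixed length.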

Languages

Language:Jupyter Notebook 100.0%