Twitter has become an important communication channel in times of emergency. The ubiquity of smartphones enables people to announce an emergency they’re observing in real time. Because of this, more agencies (e.g. disaster relief organizations and news agencies) are interested in programmatically monitoring Twitter. In this task, I predict whether a given tweet is about a real disaster: 1 if it is, 0 if it is not.
- Clone this repository to your computer
- Navigate to the project directory from your terminal: `cd twitter-sentiment-analysis`
- Run `mkdir inputs`
- Use `cd inputs` to go into the directory where the data should be stored
- Download the data files from Kaggle
  - Data can be found here
  - If you don't have a Kaggle account, you will have to create one
- Install the requirements using `pip install -r requirements`
  - The Python version is Python 3.8
  - You're better off using a virtual environment
- Navigate to the `src` directory in the project folder using `cd src`
  - Then run `python train.py`
  - This will train an LSTM and create a directory called `PRETRAIN_WORD2VEC_LSTM` inside the `models` directory, with the serialized LSTM and tokenizer inside it.
- Once you've trained the model, you can try your own examples by running the `user_interface.py` script in the top-level directory. This will provide you with a private link. Once you open it, input some text to determine whether it describes a disaster or not.
- View all explorations in the `notebook` directory
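
If the steps above do not behave as expected, a quick layout check can help narrow down what is missing. The sketch below is not part of this repository: `check_project` is a hypothetical helper, and it assumes that `inputs` and `models` both sit directly under the project root.

```python
import sys
from pathlib import Path


def check_project(root):
    """Return a dict of simple checks for the project layout.

    Assumption: `inputs` and `models` live directly under `root`.
    """
    root = Path(root)
    inputs = root / "inputs"
    model_dir = root / "models" / "PRETRAIN_WORD2VEC_LSTM"
    return {
        # This project states Python 3.8; other versions may still work.
        "python_3_8": sys.version_info[:2] == (3, 8),
        # The Kaggle data files are expected inside inputs/.
        "inputs_dir_exists": inputs.is_dir(),
        "inputs_has_files": inputs.is_dir() and any(inputs.iterdir()),
        # This directory should appear once train.py has finished.
        "trained_model_dir_exists": model_dir.is_dir(),
    }
```

Calling `check_project(".")` from the project directory returns one boolean per check, so a `False` value points at the step that still needs to be done.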
Some ideas to extend this work:
- Explore methods to reduce inference time
- Use different word embeddings
- Try an LSTM with attention (see Attention in Long Short-Term Memory Recurrent Neural Networks)
- Use a transformer model
- Correct misspelled words
- Deal with overfitting
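
As one illustration of the "correct misspelled words" idea, here is a minimal sketch that maps out-of-vocabulary tokens to their closest known word using the standard library's `difflib`. It is not what this project currently does: `correct_tokens`, the `cutoff` value, and the toy vocabulary in the usage note are all assumptions chosen for illustration.

```python
import difflib


def correct_tokens(tokens, vocabulary, cutoff=0.8):
    """Replace tokens missing from the vocabulary with their closest match.

    Tokens with no sufficiently similar vocabulary entry are left unchanged.
    """
    vocab = set(vocabulary)
    vocab_list = sorted(vocab)  # stable candidate order for difflib
    corrected = []
    for token in tokens:
        if token in vocab:
            corrected.append(token)
            continue
        # get_close_matches returns at most n candidates scoring >= cutoff.
        matches = difflib.get_close_matches(token, vocab_list, n=1, cutoff=cutoff)
        corrected.append(matches[0] if matches else token)
    return corrected
```

For example, `correct_tokens(["earthqake", "flod"], ["earthquake", "flood", "fire"])` maps both tokens to their intended spellings. A step like this would run before tokenization; with a large embedding vocabulary the linear scan per token may be slow, so a dedicated spell checker could be a better fit.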