rnn-tensorflow disaster-tweet glove rnn-lstm tensorflow glove-embeddings exploratory-data-analysis rnn-language-model

Real or Not? NLP with Disaster Tweets.

Classifying whether a tweet is about a real disaster using an RNN with LSTM and GloVe word embeddings. The model reached an accuracy of 80% on both the training and validation sets with a learning rate of 5e-5. A tweet about a real disaster is predicted as 1; any other tweet is predicted as 0. The data sets are taken from Kaggle.

The Kaggle notebook for running the file can be viewed here.

Each sample in the train and test sets contains the text of a tweet, a keyword from that tweet (which may be blank), and the location the tweet was sent from (which may also be blank).

The CSV data sets have the following columns:

id - a unique identifier for each tweet

text - the text of the tweet

location - the location the tweet was sent from (may be blank)

keyword - a particular keyword from the tweet (may be blank)

target - in train.csv only, this denotes whether a tweet is about a real disaster (1) or not (0)
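As a minimal sketch of reading this layout, the snippet below parses a CSV with these columns and maps blank keyword and location values to None. The two sample rows are made up for illustration, the column order is an assumption, and `read_tweets` is a hypothetical helper, not a function from the notebook.

```python
import csv
import io

# Made-up rows in the train.csv layout; not taken from the Kaggle data.
# The column order here is an assumption.
SAMPLE_CSV = """id,keyword,location,text,target
1,,,Forest fire near the ridge road,1
2,ablaze,London,This new album is ablaze,0
"""

def read_tweets(handle):
    """Return rows as dicts, mapping blank keyword/location to None."""
    rows = []
    for row in csv.DictReader(handle):
        for column in ("keyword", "location"):
            row[column] = row[column] or None
        # test.csv has no target column, so only convert it when present.
        if row.get("target") is not None:
            row["target"] = int(row["target"])
        rows.append(row)
    return rows

tweets = read_tweets(io.StringIO(SAMPLE_CSV))
```

For the real files, the same function can be given an open file handle, for example `open("train.csv", newline="", encoding="utf-8")`.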

The EDA and modelling steps performed on the data sets are:

1. Data processing

1.1 Handling Misspelled data

1.2 Handling Contractions

1.3 Replacing Abbreviations

1.4 Visualizing length of tweets

1.5 Visualizing word count in each tweet

1.6 Collecting all words
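Steps 1.2 and 1.3 can be sketched as dictionary lookups applied with a regular expression. The two lookup tables below are small hypothetical examples; the tables used in the notebook are not listed in this README, and `expand_tokens` and `preprocess` are illustrative names.

```python
import re

# Hypothetical lookup tables, for illustration only.
CONTRACTIONS = {"can't": "cannot", "won't": "will not", "it's": "it is", "i'm": "i am"}
ABBREVIATIONS = {"omg": "oh my god", "pls": "please", "ppl": "people"}

def expand_tokens(text, table):
    """Replace whole words found in `table`, matching case-insensitively."""
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(key) for key in table) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(lambda match: table[match.group(0).lower()], text)

def preprocess(text):
    """Expand contractions first, then abbreviations."""
    text = expand_tokens(text, CONTRACTIONS)
    return expand_tokens(text, ABBREVIATIONS)

print(preprocess("OMG it's flooding, pls help"))
```

Note that this sketch only matches the straight apostrophe ('); tweets that use a curly apostrophe would need normalizing first.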

2. Visualizing data attributes

2.1 Viewing most common stop words used in tweets

2.2 Viewing punctuation in tweets

2.3 Viewing Common words in tweets

2.4 N-gram analysis
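The N-gram analysis in step 2.4 amounts to counting adjacent word sequences across all tweets. A minimal sketch, assuming simple whitespace tokenization (the notebook may tokenize differently) and using a hypothetical `top_ngrams` helper with made-up sample tweets:

```python
from collections import Counter

def top_ngrams(texts, n=2, k=3):
    """Count word n-grams over all texts and return the k most frequent."""
    counts = Counter()
    for text in texts:
        tokens = text.lower().split()
        # zip over n shifted copies of the token list yields the n-grams.
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return counts.most_common(k)

# Made-up sample tweets, for illustration only.
samples = ["forest fire near town", "forest fire spreads fast"]
print(top_ngrams(samples, n=2, k=1))
```

Setting `n=1` gives the plain word counts used for the common-word views in 2.1 and 2.3.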

3. Data cleaning

3.1 Cleaning URLs and HTML tags

3.2 Cleaning punctuation and emojis

3.3 Cleaning stop words

3.4 Using GloVe for word embeddings

3.5 Train-Test split
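Cleaning steps 3.1 to 3.3 can be sketched as a single function built on the standard library. The regular expressions, the emoji code-point ranges, and the tiny stop-word set are assumptions chosen for illustration; the notebook may use different patterns or a full stop-word list.

```python
import re
import string

URL_RE = re.compile(r"https?://\S+|www\.\S+")
HTML_RE = re.compile(r"<.*?>")
# Covers common emoji and symbol blocks only; not an exhaustive emoji pattern.
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\U00002600-\U000027BF]+")
# Tiny illustrative stop-word set; a real run would use a full list.
STOP_WORDS = {"the", "a", "is", "in", "at", "of"}

def clean_tweet(text):
    """Strip URLs, HTML tags, emojis, punctuation, and stop words."""
    text = URL_RE.sub("", text)
    text = HTML_RE.sub("", text)
    text = EMOJI_RE.sub("", text)
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(word for word in text.lower().split() if word not in STOP_WORDS)

print(clean_tweet("Fire at the <b>mall</b>! \U0001F525 http://t.co/abc"))
```

URLs are removed before punctuation so that the `://` and dots inside a link do not leave fragments behind.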

4. Creating Model

4.1 LSTM Model with GloVe Embeddings

4.2 Plotting accuracy and loss curves
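Before an LSTM can use GloVe, the pretrained vectors have to be arranged into a matrix whose rows line up with the tokenizer's word indices. The sketch below shows that step in plain Python. It assumes the standard GloVe text format (a word followed by its space-separated vector components on each line) and word indices that start at 1, with row 0 reserved for padding. The three-dimensional vectors, `load_glove`, and `build_embedding_matrix` are hypothetical and for illustration only.

```python
import io

def load_glove(lines):
    """Parse GloVe text lines of the form 'word v1 v2 ... vd' into a dict."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(value) for value in parts[1:]]
    return vectors

def build_embedding_matrix(word_index, vectors, dim):
    """Return one row per word index; unknown words keep an all-zero row."""
    matrix = [[0.0] * dim for _ in range(len(word_index) + 1)]  # row 0 = padding
    for word, index in word_index.items():
        vector = vectors.get(word)
        if vector is not None:
            matrix[index] = vector
    return matrix

# Toy 3-dimensional vectors, made up for illustration.
toy_glove = io.StringIO("fire 0.1 0.2 0.3\nflood 0.4 0.5 0.6\n")
word_index = {"fire": 1, "flood": 2, "zzz": 3}
embedding_matrix = build_embedding_matrix(word_index, load_glove(toy_glove), dim=3)
```

A matrix like this is typically converted to an array and used as the initial weights of the model's embedding layer, often kept frozen during training so the pretrained vectors are not overwritten.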


MIT License
