Ldaxar / DL2020


Deep Learning assignment

Deep learning assignment using text data.
Kaggle source: https://www.kaggle.com/kazanova/sentiment140

Structure

  • data: this folder stores all the external data used by the notebooks
  • models: contains the latest versions of our trained neural network models
  • pre_processing.ipynb: contains the code for pre-processing the raw Twitter data
  • BaseModels.ipynb: contains the code for creating the baseline models
  • NN.ipynb: contains the code for creating, training, and testing the neural network models

Usage

Before running any of the code, download the data files from the following links:

  1. Vectorized tweets
  2. Processed tweets but not vectorized
  3. GloVe pre-trained word embeddings (see the loading sketch below)
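
The GloVe file is plain text with one token per line followed by its vector, so it can be loaded with a few lines of Python. A minimal sketch (the file name glove.6B.100d.txt is an assumption; use whichever GloVe file you downloaded into data/):

```python
import numpy as np

def load_glove(path):
    """Read a GloVe text file into a {word: vector} dictionary."""
    embeddings = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            embeddings[parts[0]] = np.asarray(parts[1:], dtype="float32")
    return embeddings

# Assumed file name: substitute whichever GloVe file you downloaded into data/.
glove = load_glove("data/glove.6B.100d.txt")
print(len(glove), "words of dimension", len(next(iter(glove.values()))))
```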

Data setup

Before running experiments, place the base dataset of tweets at data/training.1600000.processed.noemoticon.csv. Once the data is in place, run all steps in the pre_processing notebook. Be careful with the last two cells of the notebook: they are extremely memory-intensive, so it is advised to run only one of the two vectorization techniques.
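
The exact cleaning steps live in pre_processing.ipynb; the sketch below only illustrates the general shape of such a pipeline (the cleaning rules shown are assumptions, not the notebook's exact code; the column names follow the usual Sentiment140 layout):

```python
import re
import pandas as pd

# Sentiment140 ships without a header row; these are its conventional column names.
cols = ["target", "id", "date", "flag", "user", "text"]
df = pd.read_csv("data/training.1600000.processed.noemoticon.csv",
                 encoding="latin-1", names=cols)

def clean(text):
    """Illustrative cleaning only: lowercase, drop URLs/mentions, keep letters."""
    text = text.lower()
    text = re.sub(r"https?://\S+|@\w+", " ", text)
    text = re.sub(r"[^a-z\s]", " ", text)
    return " ".join(text.split())

df["clean_text"] = df["text"].apply(clean)
```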

BaseModels.ipynb

Before this notebook can be run, "vec.csv" must exist in the data folder. "vec.csv" can be downloaded from the link above or generated with pre_processing.ipynb; both hashed and non-hashed data can be obtained by following the instructions in pre_processing.ipynb. Cells should be run sequentially and will generate the result graphs shown in Fig. 1 of the report.
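
For illustration, a baseline cell might look roughly like the sketch below, assuming vec.csv holds one feature vector per tweet plus a label column named target (the column layout and the logistic-regression choice are assumptions; the notebook defines the actual baselines):

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

vec = pd.read_csv("data/vec.csv")
X = vec.drop(columns=["target"])   # "target" as the label column is an assumption
y = vec["target"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)

baseline = LogisticRegression(max_iter=1000)
baseline.fit(X_train, y_train)
print("baseline accuracy:", accuracy_score(y_test, baseline.predict(X_test)))
```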

NN.ipynb

Before you can start creating, training, or evaluating a model, run the first two cells in the notebook. This runs all the imports and loads the Twitter data.

Creating and training a model

  • Run all function definitions in the cells under parts 1 and 2 (these cells contain the function definitions for splitting the data and creating models).
  • Choose the dictionary and padding size and run the preprocessing functions (these cells run the functions that split the data; for training we need the training data and targets).
  • Point 4 is divided into four parts, one per model, which can be run separately; each part contains four code cells covering the steps below (a sketch of these steps follows the list):
    1. creating the model
    2. training the model
    3. testing the model accuracy
    4. saving the model in the "models" folder
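
As an illustration only, the four steps might look roughly like the sketch below, assuming a Keras embedding + LSTM model (the framework, layer sizes, dictionary and padding sizes, and file name are all assumptions, not the notebook's exact configuration):

```python
import numpy as np
from tensorflow import keras

vocab_size, pad_len = 20000, 40   # assumed dictionary and padding sizes

# In the notebook these come from the part 1-2 preprocessing cells;
# random stand-ins are used here so the sketch runs on its own.
X_train = np.random.randint(1, vocab_size, size=(10000, pad_len))
y_train = np.random.randint(0, 2, size=(10000,))
X_test = np.random.randint(1, vocab_size, size=(2000, pad_len))
y_test = np.random.randint(0, 2, size=(2000,))

# 1. creating the model
model = keras.Sequential([
    keras.layers.Embedding(vocab_size, 100),
    keras.layers.LSTM(64),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# 2. training the model
model.fit(X_train, y_train, epochs=3, batch_size=256, validation_split=0.1)

# 3. testing the model accuracy
loss, acc = model.evaluate(X_test, y_test)
print("test accuracy:", acc)

# 4. saving the model in the "models" folder (assumed file name)
model.save("models/lstm_model.h5")
```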

Testing a model

  • Run all function definitions in the cells under parts 1 and 2 (these cells contain the function definitions for splitting the data and creating models).
  • Choose the dictionary and padding size and run the preprocessing functions (these cells run the functions that split the data; for testing we need the test data and targets).
  • Load the models from the files in the "models" folder.
  • Run the get_report function to get a summary of the accuracy, precision, recall, f1-score, and support of the model (see the sketch below).
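
A minimal sketch of that evaluation flow, assuming a Keras model saved as models/lstm_model.h5 and a get_report helper that wraps scikit-learn's classification_report (both names are assumptions):

```python
import numpy as np
from tensorflow import keras
from sklearn.metrics import classification_report

def get_report(model, X_test, y_test, threshold=0.5):
    """Print precision, recall, f1-score and support per class, plus accuracy."""
    preds = (model.predict(X_test) > threshold).astype(int).ravel()
    print(classification_report(y_test, preds))

# Assumed file name; point this at whichever model you saved or downloaded.
model = keras.models.load_model("models/lstm_model.h5")

# In the notebook X_test / y_test come from the part 1-2 preprocessing cells;
# random stand-ins keep the sketch self-contained.
X_test = np.random.randint(1, 20000, size=(2000, 40))
y_test = np.random.randint(0, 2, size=(2000,))

get_report(model, X_test, y_test)
```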
