Modojojo / imdb-sentiment-analysis

Sentiment analysis on IMDB dataset using DistilBERT. Fine-tuned a pre-trained DistilBERT model on the IMDB dataset.

IMDB Sentiment Analysis

Classify a movie review as positive or negative using a DistilBERT transformer.
A pre-trained DistilBERT model was fine-tuned on the IMDB reviews dataset for this task.
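The classification step can be sketched roughly as follows. This is a minimal illustration, not the code in src/models_utils.py: it assumes the model and tokenizer were saved as PyTorch checkpoints with save_pretrained() into models/model and models/tokenizer, and that label index 0 means negative and 1 means positive (the convention of the Hugging Face "imdb" dataset). The function names are hypothetical.

```python
import math

# Assumed label order: index 0 = negative, index 1 = positive.
LABELS = ("Negative", "Positive")


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_from_logits(logits):
    """Return (label, confidence) for a list of class logits."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


def predict(review, model_dir="models/model", tokenizer_dir="models/tokenizer"):
    """Classify one review with a fine-tuned DistilBERT checkpoint.

    Assumes PyTorch weights; the heavy imports are deferred so the
    helpers above can be used without torch or transformers installed.
    """
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(tokenizer_dir)
    model = AutoModelForSequenceClassification.from_pretrained(model_dir)
    model.eval()

    # DistilBERT accepts at most 512 tokens, so longer reviews are truncated.
    inputs = tokenizer(review, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    return label_from_logits(logits)
```

Calling predict("A moving, beautifully shot film.") would return a label and a confidence between 0 and 1; the actual values depend on the trained weights.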

Webapp

Home page

[Screenshot: home page]

Positive review

[Screenshot: positive review input]

Prediction for positive review

[Screenshot: prediction for positive review]

Negative review

[Screenshot: negative review input]

Prediction for negative review

[Screenshot: prediction for negative review]

Repository Structure

main
|── models
|    |── model                   # Holds model files
|    └── tokenizer               # Holds tokenizer files 
|
|── screenshots                  # Contains webapp screenshots
|
|── src
|    |── __init__.py     
|    └── models_utils.py         # Model loading and prediction utility
|
|── templates
|    |── index.html 
|    └── error.html
| 
|── training
|    └── imdb_sentiment_analysis_training.ipynb    # Training notebook
|
|── .gitignore
|── README.md
|── app.py
|── params.yaml
|── requirements.txt
└── setup.py



Setup

  1. Create a new conda environment and activate it
    conda create -n imdb-sentiment-analysis python==3.7
    conda activate imdb-sentiment-analysis
  2. Run the setup file
    Note: if a package fails to install, comment out that package in requirements.txt and install it separately.
    pip install .
  3. Use the notebook in the training folder to fine-tune a DistilBERT model on the IMDB reviews dataset.
  4. Download or save the trained model and tokenizer files and place them in the corresponding folders (model and tokenizer) inside the models directory.
  5. Run the app
    python app.py
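
Before starting the app, it can help to confirm that the trained files were actually copied into place. The script below is a small sketch of such a check, not part of the repository: the helper name missing_artifacts is hypothetical, and it only verifies that models/model and models/tokenizer exist and contain at least one file, without inspecting the files themselves.

```python
from pathlib import Path

# Folder names taken from the repository structure above.
REQUIRED_DIRS = ("models/model", "models/tokenizer")


def missing_artifacts(repo_root="."):
    """Return the required folders that are absent or contain no files."""
    root = Path(repo_root)
    missing = []
    for rel in REQUIRED_DIRS:
        folder = root / rel
        has_files = folder.is_dir() and any(p.is_file() for p in folder.iterdir())
        if not has_files:
            missing.append(rel)
    return missing


if __name__ == "__main__":
    problems = missing_artifacts()
    if problems:
        print("Missing or empty:", ", ".join(problems))
    else:
        print("Model and tokenizer folders look populated.")
```

Run it from the repository root; an empty result means both folders hold at least one file.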

Languages

Jupyter Notebook 97.3%, HTML 1.9%, Python 0.8%