purvasingh96 / Covhindia

🦠 A framework that leverages machine translation and the BERT model to perform multi-lingual sentiment polarity detection of COVID-19 tweets posted in Hindi on Twitter.


COVHINDIA: Deep Learning Model for Sentiment Classification of COVID-19 Tweets from India

Overview

This repository describes a framework for sentiment analysis of COVID-19 tweets posted in Hindi on Twitter. The framework uses open-source machine translation tools to translate each Hindi tweet into English, preprocesses the translated text, and passes it as input to a BERT-based model that performs multi-lingual sentiment polarity detection.
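The translate, preprocess, classify flow can be sketched as a small pipeline. This is a minimal sketch, not the repository's code: `translate` and `classify` are hypothetical function parameters standing in for the machine translation tool and the BERT model, and the cleanup rules in `preprocess` are an assumption, since the README does not say which preprocessing the notebooks apply.

```python
import re
from typing import Callable, List

# Hypothetical stage signatures: the real translator and BERT classifier are
# not reproduced here, so both are passed in as plain functions.
Translator = Callable[[str], str]
Classifier = Callable[[str], str]

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
NON_ALNUM_RE = re.compile(r"[^a-z0-9\s]")
SPACES_RE = re.compile(r"\s+")


def preprocess(text: str) -> str:
    """Assumed cleanup: lowercase, drop URLs and @mentions, strip punctuation and '#'."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = NON_ALNUM_RE.sub(" ", text)
    return SPACES_RE.sub(" ", text).strip()


def classify_hindi_tweets(
    tweets: List[str], translate: Translator, classify: Classifier
) -> List[str]:
    """Translate each Hindi tweet to English, clean it, then predict its polarity."""
    return [classify(preprocess(translate(tweet))) for tweet in tweets]


# Toy stand-ins for illustration only (not a real MT system or BERT model).
def toy_translate(hindi: str) -> str:
    return {"मुझे डर लग रहा है": "I am scared! #COVID19"}.get(hindi, hindi)


def toy_classify(text: str) -> str:
    return "negative" if "scared" in text.split() else "positive"


print(classify_hindi_tweets(["मुझे डर लग रहा है"], toy_translate, toy_classify))
```

Passing the two model stages in as functions keeps the sketch independent of any particular translation service or BERT checkpoint; a real setup would swap in actual implementations behind the same signatures.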

Datasets

  1. Tweet dataset: Kaggle dataset
  2. GloVe embeddings: GloVe
  3. FastText embeddings: FastText
  4. GloVe Twitter: GloveTwitter
  5. Crisis Embeddings: Crisis Embeddings
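One common way to feed pre-trained GloVe or FastText vectors into an LSTM is to build an embedding matrix indexed by the tokenizer's vocabulary. The following is a minimal sketch assuming the plain-text vector format (a token followed by its floats on each line, with an optional fastText `count dim` header) and a 1-based `word_index` like the one Keras' `Tokenizer` produces. The function names are illustrative, and giving out-of-vocabulary words small random vectors is an assumed choice, not necessarily what the notebooks do.

```python
import random
from typing import Dict, Iterable, List


def load_vectors(lines: Iterable[str], dim: int) -> Dict[str, List[float]]:
    """Parse text-format vectors: one token followed by `dim` floats per line."""
    vectors: Dict[str, List[float]] = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            # Skips the "count dim" header of fastText .vec files and malformed lines.
            continue
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def build_embedding_matrix(
    word_index: Dict[str, int],
    vectors: Dict[str, List[float]],
    dim: int,
    seed: int = 0,
) -> List[List[float]]:
    """Row i holds the vector for the token with index i; row 0 stays zero for padding."""
    rng = random.Random(seed)
    matrix = [[0.0] * dim for _ in range(len(word_index) + 1)]
    for word, idx in word_index.items():
        if word in vectors:
            matrix[idx] = list(vectors[word])
        else:
            # Assumed choice: out-of-vocabulary tokens get small random vectors.
            matrix[idx] = [rng.uniform(-0.05, 0.05) for _ in range(dim)]
    return matrix


lines = ["2 3", "covid 0.1 0.2 0.3", "safe 0.4 0.5 0.6"]
vectors = load_vectors(lines, dim=3)
word_index = {"covid": 1, "safe": 2, "lockdown": 3}  # 1-based, hypothetical vocabulary
matrix = build_embedding_matrix(word_index, vectors, dim=3)
print(matrix[1], matrix[3])
```

The resulting matrix can then be used as the initial weights of an embedding layer, with `lockdown` here standing in for any tweet token that the pre-trained vocabulary does not cover.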
| Model Description | Training Accuracy | Validation Accuracy | Notebook Link |
|---|---|---|---|
| Basic LSTM | 85.1% | 86.7% | Notebook |
| LSTM + GloVe Embeddings | 88.9% | 90.9% | Notebook |
| LSTM + FastText Embeddings | 92.5% | 88.9% | Notebook |
| LSTM + Crisis Embeddings | 83.4% | 84.7% | Notebook |
| Basic Bi-directional LSTM | 87.3% | 86.0% | Notebook |
| Bi-directional LSTM + GloVe Embeddings | 91.2% | 90.6% | Notebook |
| Bi-directional LSTM + FastText Embeddings | 88.3% | 88.6% | Notebook |
| Bi-directional LSTM + Crisis Embeddings | 86.0% | 85.1% | Notebook |
| BERT | 99.7% | 93.8% | Notebook |
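To compare the models at a glance, the reported accuracies can be ranked by validation accuracy, with the train/validation gap computed in percentage points. This sketch only reuses the figures from the results table; the `generalization_gap` helper and the ranking are illustrative, not part of the repository.

```python
from typing import Dict, Tuple

# (training accuracy %, validation accuracy %) copied from the results table.
RESULTS: Dict[str, Tuple[float, float]] = {
    "Basic LSTM": (85.1, 86.7),
    "LSTM + GloVe Embeddings": (88.9, 90.9),
    "LSTM + FastText Embeddings": (92.5, 88.9),
    "LSTM + Crisis Embeddings": (83.4, 84.7),
    "Basic Bi-directional LSTM": (87.3, 86.0),
    "Bi-directional LSTM + GloVe Embeddings": (91.2, 90.6),
    "Bi-directional LSTM + FastText Embeddings": (88.3, 88.6),
    "Bi-directional LSTM + Crisis Embeddings": (86.0, 85.1),
    "BERT": (99.7, 93.8),
}


def generalization_gap(train: float, val: float) -> float:
    """Training minus validation accuracy, in percentage points (rounded to 0.1)."""
    return round(train - val, 1)


# Sort models from highest to lowest validation accuracy.
ranked = sorted(RESULTS.items(), key=lambda item: item[1][1], reverse=True)
for name, (train, val) in ranked:
    print(f"{name:42s} val={val:4.1f}%  gap={generalization_gap(train, val):+.1f} pts")

best_model = ranked[0][0]
```

With the reported numbers, BERT ranks first on validation accuracy (93.8%) and also has the largest positive gap (99.7% - 93.8% = 5.9 points), while several LSTM variants score slightly higher on validation than on training.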

System Architecture

Below is the system architecture for sentiment polarity detection of COVID-19 tweets in Hindi using machine translation and BERT.

Research Paper

I have published these findings in the International Journal on Natural Language Computing as the research paper 'Covhindia: Deep Learning Framework for Sentiment Polarity Detection of Covid-19 Tweets in Hindi'.

