decission-tree-classifier feedforward-neural-network hindi-sentiment-analysis iiith logistic-regression lstm-neural-networks machine-learning multi-class-classification naive-bayes-classifier svm tf-idf amazon-hindi-reviews niave

A Hindi Sentiment Analysis Corpus from Amazon Reviews

Data

This folder contains the training and test data files. Together they form a dataset for sentiment analysis in Hindi, extracted from Amazon reviews.

Data Description

Training Data: The training data folder contains 3527 reviews, each labeled as positive, negative, or neutral.
Test Data: The test data folder contains 884 reviews for evaluating sentiment analysis models. As in the training data, each review is labeled as positive, negative, or neutral.

Label Distribution

Positive: 36.2% of the training data
Negative: 44.6% of the training data
Neutral: 19.2% of the training data
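
The distribution above implies a simple reference point: a model that always predicts the most frequent class (negative) would score about 44.6% accuracy on the training data. The sketch below shows one way to compute such a distribution and baseline from a list of labels. The function names and the toy labels are illustrative assumptions, not part of the repository.

```python
from collections import Counter


def label_distribution(labels):
    """Return {label: percentage of all labels}, rounded to one decimal."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: round(100.0 * n / total, 1) for label, n in counts.items()}


def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent label."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


# Hypothetical toy labels for illustration only, not the real corpus.
toy_labels = ["negative"] * 5 + ["positive"] * 3 + ["neutral"] * 2

dist = label_distribution(toy_labels)
baseline = majority_baseline_accuracy(toy_labels)
print(dist)
print(baseline)
```

Any trained model should be compared against this kind of baseline, since the classes are not evenly balanced.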

How to Use

Training Data: Use the training data to train sentiment analysis models for Hindi.
Test Data: Evaluate the trained models on the provided test data.
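
As a rough sketch of the evaluation step, the snippet below computes accuracy and macro-averaged F1 for three-class predictions. The choice of metrics is an assumption made for illustration; the report may use different ones. The function names and example labels are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred, labels=("positive", "negative", "neutral")):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical gold labels and predictions, for illustration only.
y_true = ["positive", "negative", "neutral", "negative"]
y_pred = ["positive", "negative", "negative", "negative"]

print(accuracy(y_true, y_pred))
print(macro_f1(y_true, y_pred))
```

Macro-F1 weights the three classes equally, which can be informative here because neutral reviews are the smallest class.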

Models

The code folder contains several sentiment analysis models: Support Vector Machine (SVM), Decision Tree, Naive Bayes, Feedforward Neural Network (FFNN), and Long Short-Term Memory (LSTM). The code files contain the implementation of each model, and the file paths follow our Kaggle notebook setup. The LSTM uses pre-trained Hindi embeddings from IndicNLP. In addition, ELMo is used to create embeddings specific to our dataset, which are then fed to the LSTM.
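
To give a feel for the simplest of these models, here is a minimal, self-contained sketch of a multinomial Naive Bayes classifier over bag-of-words counts with add-one smoothing. It is not the implementation from the code folder. The class name, the whitespace tokenization, and the toy Hindi sentences are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


class BagOfWordsNaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = text.split()  # assumption: whitespace tokenization
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict_one(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            for token in text.split():
                if token not in self.vocab:
                    continue  # ignore tokens never seen in training
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy reviews, written for this example, not taken from the corpus.
train_texts = [
    "बहुत अच्छा प्रोडक्ट",
    "अच्छा है पसंद आया",
    "बहुत खराब प्रोडक्ट",
    "खराब क्वालिटी पैसे बर्बाद",
    "ठीक ठाक है",
]
train_labels = ["positive", "positive", "negative", "negative", "neutral"]

model = BagOfWordsNaiveBayes().fit(train_texts, train_labels)
print(model.predict_one("अच्छा प्रोडक्ट"))
print(model.predict_one("खराब क्वालिटी"))
```

On real data, a proper Hindi tokenizer and an established library implementation would normally replace the hand-written pieces shown here.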

Additional Information

For more detailed insights into the dataset construction, data preprocessing, experimental setup, and model implementation, please refer to the report. If you have any further questions or need assistance, feel free to reach out.

Citation

If you use this dataset in your research or project, please consider citing it as follows:
Dataset Name: HSAC
Repository Name: A-Hindi-Sentiment-Analysis-Corpus-from-Amazon-Reviews
URL: https://github.com/Udrasht/Hindi-Sentiment-Analysis-Corpus-from-Amazon-Reviews

Contributors

Udrasht Pal (udrashtpal@gmail.com or udrasht.pal@students.iiit.ac.in)
Nikhil Khemchandani (nikhilkhemchandani5@gmail.com or nikhil.khemchandani@students.iiit.ac.in)
Instructor: Radhika Mamidi (radhika.mamidi@iiit.ac.in)

About

This project presents a dataset of Amazon reviews in Hindi, which we created ourselves. We applied several machine learning methods, including Naive Bayes, SVM, and Decision Tree, using both Bag-of-Words and TF-IDF features. We also experimented with deep learning techniques such as Feedforward Neural Networks and LSTM with ELMo embeddings.
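
For readers unfamiliar with TF-IDF, the sketch below computes one common textbook variant: term frequency (count divided by document length) multiplied by the inverse document frequency log(N / df). Libraries often use smoothed or normalized variants, and this README does not say which one the notebooks use, so treat the formula, the function name, and the toy sentences as assumptions.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document (whitespace tokens)."""
    tokenized = [doc.split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical toy reviews, for illustration only.
toy_docs = ["बहुत अच्छा प्रोडक्ट", "बहुत खराब प्रोडक्ट", "ठीक है"]
weights = tf_idf(toy_docs)
print(weights[0])
```

Unlike raw Bag-of-Words counts, this weighting gives terms that appear in many reviews less influence than terms that distinguish one review from another.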
