Udrasht / Hindi-Sentiment-Analysis-Corpus-from-Amazon-Reviews

This project presents a dataset of Hindi Amazon reviews that we created ourselves. We applied several machine learning methods, including Naive Bayes, SVM, and Decision Tree, using both Bag-of-Words and TF-IDF features. We also experimented with deep learning techniques such as Feedforward Neural Networks and LSTM with ELMo embeddings.

A Hindi Sentiment Analysis Corpus from Amazon Reviews

Data

This folder contains the training and test data files for sentiment analysis in Hindi, extracted from Amazon reviews.

Data Description

  • Training Data: The training data folder contains 3527 reviews labeled for sentiment analysis. Each review is labeled as positive, negative, or neutral.

  • Test Data: The test data folder contains 884 reviews for evaluating the performance of sentiment analysis models. As in the training data, each review is labeled as positive, negative, or neutral.

Label Distribution

  • Positive: 36.2% of the training data
  • Negative: 44.6% of the training data
  • Neutral: 19.2% of the training data
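
As a rough illustration of what these percentages imply, the sketch below converts them into approximate review counts for the 3527 training reviews and derives inverse-frequency class weights. The weighting formula is a common heuristic chosen here for illustration, not something this README states the authors used, and the counts are rounded estimates rather than figures taken from the data files.

```python
# Figures stated in this README (training split only).
TRAIN_SIZE = 3527
LABEL_SHARE = {"positive": 0.362, "negative": 0.446, "neutral": 0.192}


def approx_counts(total, shares):
    """Convert fractional shares into rounded review counts."""
    return {label: round(total * share) for label, share in shares.items()}


def balanced_weights(counts):
    """Inverse-frequency weights: total / (num_classes * class_count).

    Assumption: this heuristic is illustrative and is not taken from
    the repository's notebooks.
    """
    total = sum(counts.values())
    num_classes = len(counts)
    return {label: total / (num_classes * n) for label, n in counts.items()}


counts = approx_counts(TRAIN_SIZE, LABEL_SHARE)
weights = balanced_weights(counts)
majority_label = max(counts, key=counts.get)

print("approximate counts:", counts)
print("majority class:", majority_label)
print("class weights:", {k: round(v, 2) for k, v in weights.items()})
```

A classifier that always predicts the majority class would be right on about 44.6% of the training reviews, which is a useful floor when reading model accuracies. The README gives the distribution for the training data only, so the same floor may not hold for the test split.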

How to Use

  1. Training Data: Use the training data to train Hindi sentiment analysis models.
  2. Test Data: Evaluate the trained models on the provided test data.
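
Because the three labels are not evenly distributed, accuracy alone can hide weak performance on the smaller classes. The following is a minimal sketch of how the evaluation step could report accuracy together with macro-averaged F1. The function names and the toy label lists are hypothetical and are not taken from the repository's notebooks or its results.

```python
LABELS = ("positive", "negative", "neutral")


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_f1(y_true, y_pred, labels=LABELS):
    """Compute the F1 score for each label separately."""
    scores = {}
    pairs = list(zip(y_true, y_pred))
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores[label] = 2 * precision * recall / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred, labels=LABELS):
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_f1(y_true, y_pred, labels)
    return sum(scores.values()) / len(labels)


# Toy labels for illustration only; these are not results from this dataset.
y_true = ["positive", "negative", "neutral", "negative", "positive", "neutral"]
y_pred = ["positive", "negative", "negative", "negative", "positive", "positive"]

print("accuracy:", round(accuracy(y_true, y_pred), 3))
print("per-class F1:", per_class_f1(y_true, y_pred))
print("macro F1:", round(macro_f1(y_true, y_pred), 3))
```

In the toy example the model never predicts neutral, so the neutral F1 is zero and macro F1 falls well below accuracy. That gap is the kind of signal worth checking on a three-class corpus like this one.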

Models

Implementations of several sentiment analysis models are available in the code folder: Support Vector Machine (SVM), Decision Tree, Naive Bayes, Feedforward Neural Network (FFNN), and Long Short-Term Memory (LSTM). File paths in the code follow our Kaggle notebook setup, so they may need adjusting when run elsewhere. The LSTM uses pre-trained Hindi embeddings from IndicNLP. In addition, ELMo is used to create embeddings specific to our dataset, which are then fed to the LSTM.
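
To make the simplest of these models concrete, here is a minimal, dependency-free sketch of a multinomial Naive Bayes classifier over Bag-of-Words counts with add-one smoothing. It is not the implementation in the code folder. The class name, the whitespace tokenizer, and the toy Hindi sentences are all assumptions made for illustration, and the sentences are invented rather than drawn from the dataset.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Whitespace tokenization (an assumption; real preprocessing may differ)."""
    return text.split()


class BagOfWordsNaiveBayes:
    """Multinomial Naive Bayes over raw word counts with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        self.total_words = {
            label: sum(c.values()) for label, c in self.word_counts.items()
        }
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n / n_docs)
            for word in tokenize(text):
                if word not in self.vocab:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][word]
                score += math.log(
                    (count + self.alpha)
                    / (self.total_words[label] + self.alpha * vocab_size)
                )
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy reviews, for illustration only.
train_texts = [
    "बहुत अच्छा प्रोडक्ट", "अच्छा है पसंद आया",
    "बहुत खराब प्रोडक्ट", "खराब है पैसे बर्बाद",
    "ठीक है", "ठीक ठाक प्रोडक्ट",
]
train_labels = ["positive", "positive", "negative", "negative", "neutral", "neutral"]

model = BagOfWordsNaiveBayes().fit(train_texts, train_labels)
for review in ["अच्छा प्रोडक्ट पसंद आया", "खराब है", "ठीक ठाक"]:
    print(review, "->", model.predict(review))
```

Swapping the raw counts for TF-IDF weights, or the classifier for an SVM or Decision Tree, changes only the feature and model steps. The fit and predict flow stays the same.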

Additional Information

For more detailed insights into the dataset construction, data preprocessing, experimental setup, and model implementation, please refer to the report. If you have any further questions or need assistance, feel free to reach out.

Citation

If you use this dataset in your research or project, please consider citing it as follows:
Dataset Name: HSAC
Repository Name: A-Hindi-Sentiment-Analysis-Corpus-from-Amazon-Reviews
URL: https://github.com/Udrasht/Hindi-Sentiment-Analysis-Corpus-from-Amazon-Reviews

