r-lomba / everglade

A dataset for NLP - Sentiment Analysis in Italian language

everglade

Finding NLP resources and datasets for the Italian language can be challenging. In this repository you'll find a ready-to-use CSV dataset containing hundreds of thousands of customer reviews of a wide variety of products bought online. A few key facts about the dataset:

  • The schema is as follows, with a pipe (|) separating the two fields: <RATING_1_TO_5>|<FULL_REVIEW_ITALIAN_TEXT>
  • The dataset is unbalanced because random web scraping yields a distribution of review scores that mirrors the real-world distribution of customer review scores. In practice, you'll find more positive reviews than negative ones
  • The dataset will keep growing over time, but it is already a useful resource for NLP today (more than 380,000 samples)
  • Because the whole set is quite big, the dataset is split into multiple parts, e.g. "everglade_01.csv", "everglade_02.csv", etc. Just download all the files from the link above
  • Have fun!
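As a rough sketch of how the parts could be loaded, the Python below reads every `everglade_*.csv` file in a local folder, parses each `<RATING_1_TO_5>|<FULL_REVIEW_ITALIAN_TEXT>` line, and counts how many reviews fall under each rating, which makes the imbalance easy to inspect. It assumes the files are UTF-8 encoded and that the first `|` on each line is the field separator (so a pipe inside the review text is kept as text); lines whose first field is not a rating from 1 to 5, such as a possible header row, are skipped. The folder location and all function names are hypothetical, and the 3-way mapping in `to_sentiment` is just one possible labeling choice, not part of the dataset.

```python
from collections import Counter
from pathlib import Path


def parse_line(line):
    """Parse one '<RATING_1_TO_5>|<FULL_REVIEW_ITALIAN_TEXT>' line.

    Splits only on the first '|', so pipes inside the review text survive.
    Returns (rating, text), or None for blank or malformed lines.
    """
    line = line.rstrip("\r\n")
    if not line.strip():
        return None
    rating_str, sep, text = line.partition("|")
    if not sep:
        return None  # no separator at all
    try:
        rating = int(rating_str.strip())
    except ValueError:
        return None  # non-numeric first field, e.g. a header row
    if not 1 <= rating <= 5:
        return None
    return rating, text.strip()


def load_parts(directory, pattern="everglade_*.csv"):
    """Read every part file matching `pattern`, in sorted file-name order."""
    samples = []
    for path in sorted(Path(directory).glob(pattern)):
        # Assumption: the part files are UTF-8 encoded.
        with path.open(encoding="utf-8") as f:
            for line in f:
                parsed = parse_line(line)
                if parsed is not None:
                    samples.append(parsed)
    return samples


def rating_distribution(samples):
    """Count reviews per rating (1..5) to inspect the class imbalance."""
    counts = Counter(rating for rating, _ in samples)
    return {r: counts.get(r, 0) for r in range(1, 6)}


def to_sentiment(rating):
    """Hypothetical 3-way mapping: 1-2 negative, 3 neutral, 4-5 positive."""
    if rating <= 2:
        return "negative"
    if rating == 3:
        return "neutral"
    return "positive"
```

For example, `rating_distribution(load_parts("data"))` would return the per-rating counts for parts saved in a hypothetical `data` folder. Splitting with `str.partition` rather than the `csv` module avoids guessing how, or whether, quotes in the review text are escaped.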

Contacts:

You can contact me here: r.lombardelli@digitalgarage.it

Terms of use:

Please see the Apache 2.0 License included in this repository for details.

Disclaimer:

This GitHub repo and its contents herein, including all data, mapping, and analysis, are provided to the public strictly for educational and academic research purposes. Any and all representations and warranties with respect to the Website, including accuracy, fitness for use, and merchantability, are hereby disclaimed.

About

License: Apache License 2.0