lury / awesome-sentiment-analysis

Repository with everything necessary for sentiment analysis and related areas

Awesome Sentiment Analysis

A curated list of awesome sentiment analysis frameworks, libraries, software (by language), and of course academic papers and methods. It also covers NLP libraries that are useful in sentiment analysis. Inspired by awesome-machine-learning.

If you want to contribute to this list (please do), send me a pull request or contact me @luk_augustyniak.

Table of Contents

  • Python, Textlytics - set of sentiment analysis examples based on Amazon Data, SemEval, IMDB etc.

  • Java, Polish Sentiment Model - Sentiment analysis for the Polish language using SVM and BoW, packaged with Docker.

  • Python, spaCy - Industrial-Strength Natural Language Processing in Python, one of the best and fastest libraries for NLP. spaCy excels at large-scale information extraction tasks. It's written from the ground up in carefully memory-managed Cython. Independent research has confirmed that spaCy is the fastest in the world. If your application needs to process entire web dumps, spaCy is the library you want to be using.

  • Python, TextBlob - TextBlob allows you to specify which algorithms you want to use under the hood of its simple API.

  • Python, pattern - The pattern.en module contains a fast part-of-speech tagger for English (identifies nouns, adjectives, verbs, etc. in a sentence), sentiment analysis, tools for English verb conjugation and noun singularization & pluralization, and a WordNet interface.

  • Java, CoreNLP by Stanford - NLP toolkit with Deeply Moving: Deep Learning for Sentiment Analysis.

  • R, TM - R text mining module including tm.plugin.sentiment.

  • Software, GATE - GATE is open source software capable of solving almost any text processing problem.

  • Java, LingPipe - LingPipe is a tool kit for processing text using computational linguistics.

  • Python, NLTK - Natural Language Toolkit.

  • C++, MITIE - MIT Information Extraction.

  • Software, KNIME - KNIME® Analytics Platform is the leading open solution for data-driven innovation, helping you discover the potential hidden in your data, mine for fresh insights, or predict new futures. Our enterprise-grade, open source platform is fast to deploy, easy to scale and intuitive to learn. With more than 1000 modules, hundreds of ready-to-run examples, a comprehensive range of integrated tools, and the widest choice of advanced algorithms available, KNIME Analytics Platform is the perfect toolbox for any data scientist. Our steady course on unrestricted open source is your passport to a global community of data scientists, their expertise, and their active contributions.

  • Software, RapidMiner

  • Java, OpenNLP - The Apache OpenNLP library is a machine learning based toolkit for the processing of natural language text.

Lexicons:

Datasets:

  • Stanford Sentiment Treebank paper - Sentiment dataset with fine-grained sentiment annotations. The Rotten Tomatoes movie review dataset is a corpus of movie reviews used for sentiment analysis, originally collected by Pang and Lee. In their work on sentiment treebanks, Socher et al. used Amazon's Mechanical Turk to create fine-grained labels for all parsed phrases in the corpus. This competition presents a chance to benchmark your sentiment-analysis ideas on the Rotten Tomatoes dataset. You are asked to label phrases on a scale of five values: negative, somewhat negative, neutral, somewhat positive, positive. Obstacles like sentence negation, sarcasm, terseness, language ambiguity, and many others make this task very challenging.

  • Amazon product dataset - This dataset contains product reviews and metadata from Amazon, including 142.8 million reviews spanning May 1996 - July 2014. This dataset includes reviews (ratings, text, helpfulness votes), product metadata (descriptions, category information, price, brand, and image features), and links (also viewed/also bought graphs).

  • IMDB movies reviews dataset - This is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. Authors provide a set of 25,000 highly polar movie reviews for training, and 25,000 for testing.

  • Sentiment Labelled Sentences Data Set - The dataset contains sentences labelled with positive or negative sentiment. It was created for the paper From Group to Individual Labels using Deep Features, Kotzias et al., KDD 2015. The score is either 1 (for positive) or 0 (for negative). The sentences come from three different websites/fields: imdb.com, amazon.com, yelp.com. For each website, there are 500 positive and 500 negative sentences, selected randomly from larger datasets of reviews.
    The authors attempted to select sentences that have a clearly positive or negative connotation; the goal was for no neutral sentences to be selected.

  • sentic.net - concept-level sentiment analysis, that is, performing tasks such as polarity detection and emotion recognition by leveraging semantics and linguistics instead of relying solely on word co-occurrence frequencies.
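
As a rough illustration of the 1/0 scoring used by the Sentiment Labelled Sentences Data Set above, the sketch below parses labelled lines and counts the labels. It assumes each line holds a sentence, a tab, and the score; the sample sentences and the helper name `read_labelled_sentences` are made up for this example and are not part of the dataset.

```python
from collections import Counter
from io import StringIO


def read_labelled_sentences(handle):
    """Yield (sentence, label) pairs from 'sentence<TAB>score' lines.

    Assumption: one example per line, score is 1 (positive) or 0 (negative).
    """
    for line in handle:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        # Split on the last tab so tabs inside the sentence do not break parsing.
        sentence, _, score = line.rpartition("\t")
        yield sentence.strip(), int(score)


# Made-up sample standing in for one of the dataset files.
sample = StringIO(
    "Great phone, works perfectly.\t1\n"
    "The battery died after a week.\t0\n"
    "Loved every minute of it.\t1\n"
)

pairs = list(read_labelled_sentences(sample))
label_counts = Counter(label for _, label in pairs)
print(label_counts)
```

To read a real file, pass an open file handle instead of the `StringIO` sample.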

Word Embeddings:

  • WordNet2Vec - Corpora Agnostic Word Vectorization Method based on WordNet.

  • GloVe paper - Algorithm for obtaining word vectors. Pretrained word vectors are available for download.

  • Word2Vec by Mikolov paper - Google's original code and pretrained word embeddings.

  • Word2Vec Python lib - Reimplementation of Google's word2vec written in Python (Cython). It also includes doc2vec and topic modelling methods.
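
Pretrained vectors such as the ones listed above are usually compared with cosine similarity. The following is a minimal sketch using only the standard library; it assumes a plain-text layout with one word per line followed by space-separated floats, and the three toy vectors are invented for illustration rather than taken from any real model.

```python
import math
from io import StringIO


def load_vectors(handle):
    """Read 'word v1 v2 ...' lines into a dict of word -> list of floats."""
    vectors = {}
    for line in handle:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip empty or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Toy 3-dimensional vectors, invented for this example.
toy_file = StringIO(
    "good 0.9 0.1 0.3\n"
    "great 0.8 0.2 0.4\n"
    "bad -0.7 0.1 0.2\n"
)

vectors = load_vectors(toy_file)
print(cosine(vectors["good"], vectors["great"]))
print(cosine(vectors["good"], vectors["bad"]))
```

With real embeddings the same two functions apply unchanged; only the file handle differs.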

SemEval Challenges - International Workshop on Semantic Evaluation web:

  • SAS2015 IPython Notebook - brief introduction to sentiment analysis in Python @ Sentiment Analysis Symposium
  1. Scikit-learn + BoW + SemEval Data.
  • LingPipe Sentiment - This tutorial covers assigning sentiment to movie reviews using language models. There are many other approaches to sentiment. One we use fairly often is sentence based sentiment with a logistic regression classifier. Contact us if you need more information. For movie reviews we focus on two types of classification problem: subjective (opinion) vs. objective (fact) sentences, and positive (favorable) vs. negative (unfavorable) movie reviews.

  • Stanford's cs224d lectures on Deep Learning for Natural Language Processing - course provided by Richard Socher.
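
The bag-of-words (BoW) approach mentioned in the tutorials above can be sketched without any external library. The example below is not the notebook's code and does not use scikit-learn: it pairs a simple word-count representation with a small multinomial Naive Bayes classifier (add-one smoothing), chosen here only because it fits in a few lines. The class name, training sentences and labels are all invented for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def bag_of_words(text):
    """Lowercase the text and count its word tokens."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


class NaiveBayesSentiment:
    """Hypothetical minimal multinomial Naive Bayes over bag-of-words counts."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter(labels)        # label -> number of documents
        for text, label in zip(texts, labels):
            self.word_counts[label].update(bag_of_words(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)  # add-one smoothing
            score = math.log(n_docs / total_docs)           # log prior
            for word, n in bag_of_words(text).items():
                if word in self.vocab:  # words never seen in training are ignored
                    score += n * math.log((counts[word] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)


# Made-up training data, far too small for real use.
train_texts = [
    "I loved this movie",
    "what a great film",
    "terrible plot and awful acting",
    "I hated this boring movie",
]
train_labels = ["positive", "positive", "negative", "negative"]

model = NaiveBayesSentiment().fit(train_texts, train_labels)
print(model.predict("a great movie I loved"))
print(model.predict("awful boring plot"))
```

On real data such as the SemEval sets, a library vectorizer and classifier would normally replace both the tokenizer and the hand-written model.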

Multimodal sentiment analysis:

  • demo of Stanford's Treebank Sentiment Analysis
