ibrahimsharaf / doc2vec

Long(er) text representation and classification using Doc2Vec embeddings

Doc2Vec Text Classification

A text classification model that uses gensim's Doc2Vec to generate paragraph embeddings and scikit-learn's Logistic Regression to classify them.

Dataset

25,000 IMDB movie reviews, specially selected for sentiment analysis. Review sentiment is binary (1 for positive, 0 for negative).

This source dataset was collected in association with the following publication:

Andrew L. Maas, Raymond E. Daly, Peter T. Pham, Dan Huang, Andrew Y. Ng, and Christopher Potts. (2011). "Learning Word Vectors for Sentiment Analysis." The 49th Annual Meeting of the Association for Computational Linguistics (ACL 2011).

Usage

  • Install the required tools

    pip install -r requirements.txt

  • Run the script

    python text_classifier.py

License:MIT License


Languages

Language:Python 100.0%