jiangqn / SST-preprocess

Preprocess stanford sentiment treebank dataset for sentiment classification.

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

SST-preprocess

This is a python script for converting Stanford Sentiment Treebank dataset (https://nlp.stanford.edu/sentiment/index.html) to common used text classification format.

The stanfordSentimentTreebank.zip file is download from (http://nlp.stanford.edu/~socherr/stanfordSentimentTreebank.zip).

In the fine-grained sentiment classification dataset SST5, 0, 1, 2, 3, 4 represent very negative, negative, neutral, positive, very positive respectively.

In the binary sentiment classification dataset SST2, 0, 1 represent negative, positive respectively.

The statistics of dataset split for SST5 and SST2 are shown below.

Dataset SST5 SST2
Train 8544 6920
Dev 1101 872
Test 2210 1821

About

Preprocess stanford sentiment treebank dataset for sentiment classification.

License:MIT License


Languages

Language:Python 100.0%