princeton-nlp / LM-BFF

[ACL 2021] LM-BFF: Better Few-shot Fine-tuning of Language Models https://arxiv.org/abs/2012.15723

Weird SST2 dataset size

sh0416 opened this issue

I've just started reproducing this work, beginning with the SST-2 dataset.

I found that the SST-2 size reported in this paper is 6.9k, but the size reported in the GLUE paper is 69k.

I double-checked this: the SST-2 training set distributed in this repository has 6.9k examples, while the one distributed through Hugging Face datasets has 69k.

It looks like some kind of filtering has been applied to this data.

Could you clarify what it is?

Thank you

I want to reproduce the SST-2 Fine-tuning (full) result in Table 3.
The caption says the dataset sizes are given in Table B.1, and Table B.1 lists the SST-2 training set size as 6,920.

When I downloaded the SST-2 training data from Hugging Face datasets, it had around 69,000 examples, roughly ten times larger than the dataset distributed in this repository.
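
For reference, this is roughly how I checked the split sizes (a minimal sketch, assuming the Hugging Face `datasets` library and the `glue`/`sst2` dataset identifier):

```python
# Sketch: print the split sizes of the SST-2 distribution on Hugging Face.
from datasets import load_dataset

sst2 = load_dataset("glue", "sst2")
for split, ds in sst2.items():
    print(split, len(ds))
# The train split here is about ten times larger than the 6,920-sentence
# training set distributed in this repository.
```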

Also, I am curious whether Fine-tuning (full) is the traditional approach without templates and label words.

Hi, thanks for the interest. The SST-2 training set commonly distributed by GLUE and others is split into densely labeled phrases, whereas the dev and test sets are full sentences (note that their sizes match ours). We use the original (unsplit) sentences, hence the order-of-magnitude size difference. This is the same as in https://github.com/openai/generating-reviews-discovering-sentiment, for example.
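
As a quick sanity check, a sketch along these lines (assuming the Hugging Face `datasets` library and the `glue`/`sst2` identifier) shows the length gap between the phrase-level train split and the sentence-level validation split:

```python
# Sketch: compare average example length between the GLUE SST-2 train split
# (densely labeled phrases) and the validation split (full sentences).
# The gap illustrates why the phrase-level split is an order of magnitude
# larger than the sentence-level training set used here.
from datasets import load_dataset

sst2 = load_dataset("glue", "sst2")

def avg_tokens(split):
    texts = sst2[split]["sentence"]
    return sum(len(t.split()) for t in texts) / len(texts)

print("train (phrases):        avg", round(avg_tokens("train"), 1), "tokens")
print("validation (sentences): avg", round(avg_tokens("validation"), 1), "tokens")
```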

The fine-tuning (full) is the traditional approach.

Great, thank you for clarifying the details.