princeton-nlp / SimCSE

[EMNLP 2021] SimCSE: Simple Contrastive Learning of Sentence Embeddings https://arxiv.org/abs/2104.08821

Why are so many sentences in your NLI datasets grammatically incorrect?

leoozy opened this issue

Thank you for your excellent work. I am running the supervised setting and found that many sentences in your NLI dataset are grammatically incorrect, for example: ", heritage assets, Federal mission PP&E), uncertain historical cost basis ". SNLI and MNLI are human-labeled datasets, so I would not expect them to contain such sentences. Did you apply any post-processing to these sentences? Thank you!

Hi,

We take the SNLI and MNLI datasets directly, so this is likely noise from the datasets themselves.
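For readers who want to gauge how much of this noise is in their copy of the data, here is a minimal Python sketch of a heuristic check. It is not part of SimCSE and not the authors' preprocessing; the function name `looks_like_fragment` and the two rules it applies are assumptions chosen only for illustration, and the second example sentence is made up.

```python
def looks_like_fragment(sentence: str) -> bool:
    """Heuristically flag text that looks like a fragment, not a full sentence.

    Hypothetical helper for illustration only; the rules are rough and
    will miss many cases.
    """
    s = sentence.strip()
    if not s:
        return True
    # Text that starts with a comma, colon, or closing bracket was
    # probably cut out of a longer passage.
    if s[0] in ",;:)]}":
        return True
    # Unbalanced parentheses are another sign of a truncated span.
    if s.count("(") != s.count(")"):
        return True
    return False


examples = [
    # Sentence quoted in this issue.
    ", heritage assets, Federal mission PP&E), uncertain historical cost basis ",
    # Made-up well-formed sentence for comparison.
    "A man is playing a guitar on stage.",
]
flags = [looks_like_fragment(e) for e in examples]
```

Running this over each text column of the training file and counting the flagged rows gives a rough estimate of how common such fragments are; whether to drop them is a separate decision.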

Thank you for your help!