SEACrowd / seacrowd-datahub

A collaborative project to collect datasets in SEA languages, SEA regions, or SEA cultures.

Create dataset loader for multilingual-NLI-26lang-2mil7

SamuelCahyawijaya opened this issue

Dataset: multilingual_nli_26lang
Description: This dataset contains 2 730 000 NLI text pairs in 26 languages spoken by more than 4 billion people. The dataset can be used to train models for multilingual NLI (Natural Language Inference) or zero-shot classification. The dataset is based on the English datasets MultiNLI, Fever-NLI, ANLI, LingNLI and WANLI and was created using the latest open-source machine translation models.
Subsets: -
Languages: ind, vie, eng
Tasks: Natural Language Inference
License: Unknown (unknown)
Homepage: https://huggingface.co/datasets/MoritzLaurer/multilingual-NLI-26lang-2mil7
HF URL: https://huggingface.co/datasets/MoritzLaurer/multilingual-NLI-26lang-2mil7
Paper URL: https://www.cambridge.org/core/journals/political-analysis/article/less-annotating-more-classifying-addressing-the-data-scarcity-issue-of-supervised-machine-learning-with-deep-transfer-learning-and-bertnli/05BB05555241762889825B080E097C27
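
As a starting point for whoever picks this up, here is a minimal sketch of how a loader might enumerate the splits for the SEA languages listed above. It assumes, without confirmation from this issue, that the Hub dataset names its splits `<two-letter language code>_<source dataset>` (for example `id_mnli`) and that the source suffixes are `mnli`, `fever`, `anli`, `ling` and `wanli`. The names `ISO3_TO_ISO1`, `SOURCES`, `split_names` and `load_split` are hypothetical helpers, not part of seacrowd-datahub. English is left out because it is not clear from the metadata whether it has its own splits.

```python
from itertools import product

# Assumption: SEACrowd ISO 639-3 codes map to two-letter prefixes on the Hub.
ISO3_TO_ISO1 = {"ind": "id", "vie": "vi"}

# Assumption: one split per source dataset named in the description.
SOURCES = ["mnli", "fever", "anli", "ling", "wanli"]

HF_REPO = "MoritzLaurer/multilingual-NLI-26lang-2mil7"


def split_names(languages, sources=SOURCES):
    """Build candidate split names such as 'id_mnli' for each language/source pair."""
    return [f"{ISO3_TO_ISO1[lang]}_{src}" for lang, src in product(languages, sources)]


def load_split(split):
    """Load one split from the Hub (needs the `datasets` package and network access)."""
    from datasets import load_dataset  # imported lazily so the helpers above work offline

    return load_dataset(HF_REPO, split=split)


if __name__ == "__main__":
    for name in split_names(["ind", "vie"]):
        print(name)
```

The split names should be checked against the dataset card before being hard-coded in a loader, since the naming scheme here is an assumption.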

#self-assign