penzant / nlu_datasets_2018

Natural language understanding datasets in 2018

This page collects NLU datasets proposed in 2018.

| Dataset | task | style | size | source | where | web | misc | similar datasets |
|---|---|---|---|---|---|---|---|---|
| CoQA | RC | free form (+no ans) | 127k | various articles | TACL? | url | conversational questions | QuAC |
| QuAC | RC | extraction (+no ans) | 100k | Wikipedia | EMNLP2018 | url | conversational questions | CoQA |
| HotpotQA | RC | extraction | 113k | Wikipedia | EMNLP2018 | url | multi-hop reasoning | QAngaroo |
| SWAG | QA | multiple choice | 113k | video caption | EMNLP2018 | url | situational commonsense reasoning | |
| DNC | NLI | textual entailment | 570k | NLP tasks | EMNLP2018 | url | diverse NLI | SNLI, MultiNLI |
| OpenBookQA | QA | multiple choice | 6k | science facts | EMNLP2018 | url | external knowledge | ARC |
| RecipeQA | RC+ | various | 36k | recipe | EMNLP2018 | url | multimodal comprehension | TextbookQA, FigureQA |
| CLOTH | RC | cloze | 99k | English exams | EMNLP2018 | url | | RACE |
| DuoRC | RC | extraction | 186k | movie plot | ACL2018 | url | | NarrativeQA |
| SQuAD2.0 | RC | extraction (+no ans) | 150k | Wikipedia | ACL2018 | url | no answer: 50k | NewsQA |
| CliCR | RC | cloze | 100k | clinical case text | NAACL2018 | url | | |
| FEVER | NLI? | fact verification | 185k | Wikipedia | NAACL2018 | url | | |
| MultiRC | RC | multiple choice | 6k+ | various articles | NAACL2018 | url | multiple sentence reasoning | MCTest |
| ProPara | RC | various | 2k | procedural text | NAACL2018 | url | | bAbI, SCoNE |
| ARC | RC | multiple choice | 8k | science exam | ? | url | easy 5197, challenge 2590 | |

TODO:

Note:

  • QA = question answering; RC = reading comprehension, i.e. question answering with a context; NLI = natural language inference, also known as recognizing textual entailment
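
The three task abbreviations in the note can be read as three example shapes. The following is a minimal Python sketch under stated assumptions: the class names, field names, and the `task_of` helper are hypothetical illustrations and do not come from any of the listed datasets, whose actual file formats differ from one another.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QAExample:
    """Question answering: no passage is supplied with the question."""
    question: str
    answer: Optional[str]  # None marks an unanswerable question ("+no ans")


@dataclass
class RCExample:
    """Reading comprehension: question answering with a context passage."""
    context: str
    question: str
    answer: Optional[str]  # None marks an unanswerable question ("+no ans")


@dataclass
class NLIExample:
    """Natural language inference: relate a premise to a hypothesis."""
    premise: str
    hypothesis: str
    label: str  # illustrative label set: "entailment", "contradiction", "neutral"


def task_of(example) -> str:
    """Return the abbreviation used in the table's task column (hypothetical helper)."""
    if isinstance(example, NLIExample):
        return "NLI"
    if isinstance(example, RCExample):
        return "RC"
    return "QA"


# Toy instances, invented for illustration only.
qa = QAExample(question="What gas do plants absorb?", answer="carbon dioxide")
rc = RCExample(
    context="The cat sat on the mat.",
    question="Where did the cat sit?",
    answer="on the mat",
)
nli = NLIExample(
    premise="A man is playing a guitar.",
    hypothesis="A person is making music.",
    label="entailment",
)
```

The only structural difference between the QA and RC shapes is the `context` field, which mirrors the note's definition of RC as question answering with a context.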
