clinc / oos-eval

Repository that accompanies "An Evaluation Dataset for Intent Classification and Out-of-Scope Prediction" (EMNLP 2019)

Re-partitioning Data: Which worker wrote which query?

liaeh opened this issue

Hi! I want to re-partition the dataset to create 5 different train/valid/test splits for my analyses. In the paper, you mention that all queries from a given crowd worker were placed in a single split. Is it possible to share information about which queries were generated by the same worker? I'd like to minimize any in-scope biases in my splits as well.

Thanks for your interest in the dataset. Unfortunately, I no longer have the information linking crowd workers to particular utterances (other than the original train/val/test split).