clinc / oos-eval

Repository that accompanies "An Evaluation Dataset for Intent Classification and Out-of-Scope Prediction" (EMNLP 2019)

Confused about Table 3 in the paper

DevRoss opened this issue · comments

I'm confused about Table 3 in the paper.

What is the experimental process?

My guess is that the binary classifier (the oos detector) is first trained on "binary_undersample.json" (or "binary_wiki_aug.json" in the wiki-aug experiment) to detect whether an utterance is "in" or "oos", and that a downstream multi-class classifier (e.g., 150 classes for the in-scope data) then handles the "in" samples passed along by the upstream oos detector.

In-Scope Accuracy was evaluated on the "test" split of "data_oos_plus.json", and Out-of-Scope Recall was evaluated on the "oos_test" split of the same file.
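To make sure I understand the metrics, here is how I would compute them (a minimal sketch; `predict` is a hypothetical function standing in for a trained two-tier system, and I'm assuming each split in the JSON is a list of `[utterance, label]` pairs):

```python
import json

def evaluate(predict, path="data_oos_plus.json"):
    """Score a system on the two Table 3 metrics as I read them.

    `predict` (hypothetical) maps an utterance to one of the 150
    in-scope intent labels or to the special label "oos".
    """
    with open(path) as f:
        data = json.load(f)

    # In-Scope Accuracy: fraction of in-scope "test" utterances
    # assigned their gold intent (routing to "oos" counts as wrong).
    test = data["test"]
    in_scope_acc = sum(predict(u) == y for u, y in test) / len(test)

    # Out-of-Scope Recall: fraction of "oos_test" utterances that
    # the system correctly routes to "oos".
    oos_test = data["oos_test"]
    oos_recall = sum(predict(u) == "oos" for u, _ in oos_test) / len(oos_test)

    return in_scope_acc, oos_recall
```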

For Table 3 we indeed had a two-tiered approach: we first train a binary classifier on in/oos data, and we also train a 150-intent classifier; if the binary classifier predicts a sample to be in-scope, the 150-intent classifier then assigns it to one of the 150 in-scope intents. Your description of how this was evaluated is correct.
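In case it helps future readers, here is a minimal sketch of that two-tier setup. TF-IDF + logistic regression are stand-ins for the models actually benchmarked in the paper, the file paths and the "in"/"oos" label values are assumptions, and `predict` matches the hypothetical interface from the evaluation sketch above:

```python
import json

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tier 1 training data: binary in/oos (undersampled variant).
with open("binary_undersample.json") as f:
    binary = json.load(f)
# Tier 2 training data: the 150 in-scope intents.
with open("data_oos_plus.json") as f:
    full = json.load(f)

# Tier 1: binary detector deciding "in" vs "oos".
bin_texts, bin_labels = zip(*binary["train"])
detector = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
detector.fit(bin_texts, bin_labels)

# Tier 2: 150-way intent classifier, trained on in-scope data only.
in_texts, in_labels = zip(*full["train"])
intent_clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
intent_clf.fit(in_texts, in_labels)

def predict(utterance):
    # Route through the detector first; only utterances judged
    # in-scope reach the 150-intent classifier.
    if detector.predict([utterance])[0] == "oos":
        return "oos"
    return intent_clf.predict([utterance])[0]
```

With `predict` wired up this way, In-Scope Accuracy and Out-of-Scope Recall fall out of the `evaluate(predict)` sketch above.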

Understood. Thank you!