Question about InputFeature generation in CoQA

silencio94 opened this issue 5 years ago

Thanks for sharing clean, readable code along with the experiment settings and results.

By the way, I have a question about CoQA InputFeature generation.
The InputFeature generation code seems to assume that every doc span contains the rationale needed to answer a free-form question.

That means that when a doc span contains no clue for answering a free-form question, it can be labeled incorrectly.
Is this intended, or is there something I haven't understood?
Thanks for your work and have a good day!

Xiaoming · Answer 1 · Thu Mar 26 2020 06:09:47 GMT+0800 (China Standard Time)

Thanks, @silencio94 !

Although you have closed this issue, I'd like to explain a little more. When pre-processing a context longer than max_seq_len (e.g. 512), it is sliced into multiple sub-contexts that each fit within max_seq_len, and some of them will contain no answer. If we can't find the free-form answer in a sub-context, the answer start/end are intentionally labeled as the first token (the [CLS] token).
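To make the labeling rule concrete, here is a minimal sketch of the idea, not the repository's actual code. The names `make_doc_spans`, `label_span`, `max_doc_len`, `stride`, and `doc_offset` are hypothetical, and the sketch assumes a BERT-style layout of `[CLS] question [SEP] document-window [SEP]`, where `max_doc_len` is the part of max_seq_len left for document tokens.

```python
from typing import List, Tuple

CLS_INDEX = 0  # assumed position of the [CLS] token in every feature


def make_doc_spans(num_doc_tokens: int, max_doc_len: int, stride: int) -> List[Tuple[int, int]]:
    """Slice a document into overlapping (start, length) windows."""
    spans = []
    start = 0
    while start < num_doc_tokens:
        length = min(max_doc_len, num_doc_tokens - start)
        spans.append((start, length))
        if start + length == num_doc_tokens:
            break  # the last window reaches the end of the document
        start += min(length, stride)
    return spans


def label_span(span_start: int, span_len: int,
               answer_start: int, answer_end: int,
               doc_offset: int) -> Tuple[int, int]:
    """Map a document-level answer (inclusive token indices) to feature positions.

    If the answer is not fully inside the window, both labels fall back
    to the [CLS] position, which marks "no answer in this sub-context".
    """
    span_end = span_start + span_len - 1
    if answer_start < span_start or answer_end > span_end:
        return CLS_INDEX, CLS_INDEX
    # doc_offset = number of tokens before the document window,
    # e.g. [CLS] + question tokens + [SEP]
    return (answer_start - span_start + doc_offset,
            answer_end - span_start + doc_offset)


# Example: a 10-token document, windows of 4 tokens, stride 2,
# answer at document tokens 5..6, 3 tokens before the window.
spans = make_doc_spans(num_doc_tokens=10, max_doc_len=4, stride=2)
labels = [label_span(s, n, 5, 6, doc_offset=3) for s, n in spans]
```

In this toy setup only the window that fully contains the answer gets real start/end positions; every other window is labeled with the [CLS] index, so a sub-context without the rationale is treated as unanswerable rather than being mislabeled.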

Best,
Xiaoming

If you feel happy, please star me :)

doubledrive · Answer 2 · Fri Mar 27 2020 17:44:37 GMT+0800 (China Standard Time)

I've been reading several CQA codebases lately, and I probably got them mixed up 😂. Thank you for the kind explanation!