kevinduh / san_mrc

Stochastic Answer Networks (SAN) for Machine Reading Comprehension

Can NOT replicate SQuAD 2.0 results

nlpgiant opened this issue

The README says that the F1 on SQuAD 2.0 is 72.x using this package, but all I got was 66.x.

It seems from the other open issue that others have faced a similar problem. Can you push the latest code, @namisan @kevinduh @ryu577?

@kevinduh, @namisan, @ryu577: I appreciate your work. I am facing the same issue as @nlpgiant; I could not replicate the results in Readme.md in any way. Could you share the hyperparameters and some more information on the training settings?

Is anyone listening to us?

Same. I'm seeing 66.x F1 on dev.

Also seeing the same results. I tried changing all hidden sizes to 256 and ran for 30 epochs; max F1 was 67.004.

Hi all,

@kevinduh,@namisan, @ryu577 Thank you for the code!

If it helps, I got EM = 67.6 and F1 = 70.6 at epoch 15 with dropout_p = 0.1 (I got no better scores over 50 epochs).

Sorry for the late reply.
I'm pretty busy; I'm currently working on a code refactor and an ELMo version (it attains a large improvement, and I hope to release it soon).
I am also considering sharing the worksheet on CodaLab, which requires my boss's permission. As I mentioned in the other open issues, a lower dropout_p (0.1), a larger hidden size (256/300), and a larger number of reasoning steps (10) lead to a much better result.
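
For concreteness, those recommendations would correspond to an invocation along these lines (a sketch only: dropout_p, decoder_att_hidden_size, and decoder_num_turns are the option names used in this thread, so check the training script's argparse options for the exact flags, and note that the other hidden-size options should be raised to 256/300 as well):

    python train.py --dropout_p 0.1 --decoder_att_hidden_size 256 --decoder_num_turns 10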

Hi @namisan, thanks a lot for your reply! I tried to factor in ELMo as well, but it seems to help only about 1 point on SQuAD 1. I'm wondering whether you include ELMo only in the first layer together with the word embedding, or whether you also put ELMo into later layers together with CoVe? Thanks a lot!
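
(For reference, the "first layer only" variant asked about here amounts to concatenating ELMo vectors with the static word embeddings in the input/lexicon layer, before the encoder stack. A minimal PyTorch sketch, with hypothetical class and dimension names rather than san_mrc's actual code:)

    import torch
    import torch.nn as nn

    class LexiconEncoder(nn.Module):
        # Hypothetical sketch of a "first layer only" ELMo variant;
        # not san_mrc's actual module.
        def __init__(self, vocab_size, word_dim=300, elmo_dim=1024):
            super().__init__()
            # Static word vectors (e.g., GloVe-initialized).
            self.word_emb = nn.Embedding(vocab_size, word_dim)

        def forward(self, token_ids, elmo_vecs):
            # token_ids: (batch, seq_len)
            # elmo_vecs: (batch, seq_len, elmo_dim), precomputed by a frozen ELMo model.
            w = self.word_emb(token_ids)              # (batch, seq_len, word_dim)
            # Concatenate contextual and static embeddings before the encoder stack.
            return torch.cat([w, elmo_vecs], dim=-1)  # (batch, seq_len, word_dim + elmo_dim)

(The later-layer variant the question contrasts this with would instead concatenate elmo_vecs again at the layers where CoVe is injected.)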

Forget ELMo; I'd like to know how to get the 72.x mentioned in the README from this toolkit as-is.
@namisan, which hidden size do you mean: decoder_att_hidden_size set to 256? And for the reasoning steps, is it decoder_num_turns set to 10? Please elaborate.

@ZhaoyueCheng The current result with ELMo is around 87.x in terms of F1 (+2.x). You can refer to our recent paper for the details: https://arxiv.org/pdf/1809.06963.pdf

@nlpgiant All of the hidden sizes. Yes, it is decoder_num_turn.

@namisan Thanks a lot for the reply!

I have released the worksheets of the official submissions. I'm closing this.