kevinduh / san_mrc

Stochastic Answer Networks (SAN) for Machine Reading Comprehension

Can NOT replicate SQuAD 2.0 results

nlpgiant opened this issue

The README says that the F1 on SQuAD 2.0 is 72.x using this package, but all I got was 66.x.

It seems from the other open issue that others have faced a similar problem. Can you push the latest code, @namisan @kevinduh @ryu577?

@kevinduh, @namisan, @ryu577: I appreciate your work. I am facing the same issue as @nlpgiant; I could not replicate the results in Readme.md in any way. Could you share the hyperparameters and some more information on the training settings?

Is anyone listening to us?

Same. I'm seeing 66.x F1 on dev.

Also seeing the same results. I tried changing all hidden sizes to 256 and ran for 30 epochs; max F1 was 67.004.

Hi all,

@kevinduh,@namisan, @ryu577 Thank you for the code!

If it helps, I got EM = 67.6 and F1 = 70.6 at epoch 15 with dropout_p = 0.1 (I got no better scores over 50 epochs).

Sorry for the late reply.
I'm pretty busy; I'm currently working on a code refactor and an ELMo version (it attains a large improvement, and I hope to release it soon).
I am also considering sharing the worksheet on CodaLab, which requires my boss's permission. As I mentioned in the other open issues, a lower dropout_p (0.1), a larger hidden size (256/300), and a larger number of reasoning steps (10) lead to a much better result.
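
For concreteness, those recommendations would correspond to an invocation along these lines (a sketch only: dropout_p, decoder_att_hidden_size, and decoder_num_turns are the option names used in this thread, so check the training script's argparse options for the exact flags, and note that the other hidden-size options should be raised to 256/300 as well):

    python train.py --dropout_p 0.1 --decoder_att_hidden_size 256 --decoder_num_turns 10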

Hi @namisan, thanks a lot for your reply! I tried to factor in ELMo as well, but it seems to help only about 1 point on SQuAD 1. I'm wondering whether you include ELMo only in the first layer together with the word embedding, or whether you also put ELMo into later layers together with CoVe? Thanks a lot!
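
(For reference, the "first layer only" variant asked about here amounts to concatenating ELMo vectors with the static word embeddings in the input/lexicon layer, before the encoder stack. A minimal PyTorch sketch, with hypothetical class and dimension names rather than san_mrc's actual code:)

    import torch
    import torch.nn as nn

    class LexiconEncoder(nn.Module):
        # Hypothetical sketch of a "first layer only" ELMo variant;
        # not san_mrc's actual module.
        def __init__(self, vocab_size, word_dim=300, elmo_dim=1024):
            super().__init__()
            # Static word vectors (e.g., GloVe-initialized).
            self.word_emb = nn.Embedding(vocab_size, word_dim)

        def forward(self, token_ids, elmo_vecs):
            # token_ids: (batch, seq_len)
            # elmo_vecs: (batch, seq_len, elmo_dim), precomputed by a frozen ELMo model.
            w = self.word_emb(token_ids)              # (batch, seq_len, word_dim)
            # Concatenate contextual and static embeddings before the encoder stack.
            return torch.cat([w, elmo_vecs], dim=-1)  # (batch, seq_len, word_dim + elmo_dim)

(The later-layer variant the question contrasts this with would instead concatenate elmo_vecs again at the layers where CoVe is injected.)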

Forget ELMo; I'd like to know how to get the 72.x mentioned in the README from this toolkit as-is.
@namisan, which hidden size do you mean: decoder_att_hidden_size set to 256? And for the reasoning steps, is it decoder_num_turns set to 10? Please elaborate.

@ZhaoyueCheng The current result with ELMo is around 87.x in terms of F1 (+2.x). You can refer to our recent paper for the details: https://arxiv.org/pdf/1809.06963.pdf

@nlpgiant All of the hidden sizes. Yes, it is decoder_num_turn.

@namisan Thanks a lot for the reply!

I have released the worksheets of the official submissions. I'm closing this.