Can NOT replicate SQuAD 2.0 results
nlpgiant opened this issue · comments
Is anyone listening to us?
Same. I'm seeing 66.x F1 on dev
Also seeing the same results. Tried changing all hidden sizes to 256, and ran for 30 epochs. Max F1 was 67.004
Sorry for the late reply.
I'm pretty busy, and I'm currently working on the code refactor and the ELMo version (it attains a significant improvement, and I hope to release it soon).
I am also considering sharing the worksheet on CodaLab, which requires the permission of my boss. As I mentioned in the other open issues, a lower dropout_p (0.1), a larger hidden size (256/300), and a larger number of reasoning steps (10) lead to a much better result.
Hi @namisan, thanks a lot for your reply! I tried to factor in ELMo as well, but it seems to help by only about 1 point on SQuAD 1. I'm wondering whether you include ELMo only in the first layer, together with the word embeddings, or whether you also put ELMo in later layers together with CoVE? Thanks a lot!
Forget ELMo; I'd like to know how to get the 72.x mentioned in the README from this toolkit as-is.
@namisan Which hidden size should be set to 256 — decoder_att_hidden_size? And is decoder_num_turns the reasoning-step parameter that should be set to 10? Please elaborate.
@ZhaoyueCheng The current result with ELMo is around 87.x F1 (+2.x). You can refer to our recent paper for the details: https://arxiv.org/pdf/1809.06963.pdf
@nlpgiant All the hidden sizes. Yes, the reasoning-step parameter is decoder_num_turns.
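Putting the advice from this thread together, a training invocation might look like the following. This is only a sketch: the script name `train.py` and the exact flag syntax are assumptions, and only `dropout_p`, `decoder_att_hidden_size`, and `decoder_num_turns` are parameter names confirmed in the thread.

```shell
# Hypothetical command line; substitute the toolkit's actual entry point.
python train.py \
  --dropout_p 0.1 \               # lower dropout, as recommended above
  --decoder_att_hidden_size 256 \ # larger hidden size (256 or 300)
  --decoder_num_turns 10          # more reasoning steps
```

Per the reply above, the larger hidden size should be applied to all the hidden-size parameters, not only the decoder's.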
@namisan Thanks a lot for the reply!
I've released the worksheets for the official submissions, so I'm closing this issue.