kevinduh / san_mrc

Stochastic Answer Networks (SAN) for Machine Reading Comprehension


Elmo on Squad2 Not Converging

liuzzi opened this issue

Hi! First off, thanks for this awesome repo.

I've been playing with the newly added Elmo embeddings on Squad 2, and the model doesn't seem to learn much. Accuracy, EM, and F1 all sit around 50%, and the loss barely goes down after a certain point. Do you think this is a hyperparameter issue? Any ideas?


I haven't tested on v2 yet, but can you share the log with me? I'll take a look. Thanks.

For the config, I used the same settings as in the repo, except that I lowered dropout_p to 0.1 and trained with --elmo_on.

Training loss creeps down by about 0.03 per epoch.
The learning curve looks like this:

[Epoch 0 - dev EM: 49.600 F1: 49.653 (best EM: 49.600 F1: 49.653)]
[Epoch 0 - ACC: 49.9284]
[Epoch 1 - dev EM: 49.701 F1: 49.783 (best EM: 49.701 F1: 49.783)]
[Epoch 1 - ACC: 49.9284]
[Epoch 2 - dev EM: 49.844 F1: 49.856 (best EM: 49.844 F1: 49.856)]
[Epoch 2 - ACC: 49.9284]
[Epoch 3 - dev EM: 50.013 F1: 50.013 (best EM: 50.013 F1: 50.013)]
[Epoch 3 - ACC: 49.9284]
[Epoch 4 - dev EM: 50.021 F1: 50.027 (best EM: 50.021 F1: 50.027)]
[Epoch 4 - ACC: 49.9284]
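
For readers who want to check a log like this for a plateau, here is a minimal sketch that parses the lines above and summarizes the trend. The helper names (`parse_log`, `summarize`) are hypothetical and not part of the repo, and the regular expressions assume the exact line format shown in this log. An ACC value that never changes across epochs may mean the model predicts the same label for every example, but the log alone cannot confirm that.

```python
import re

# Log lines copied from the post above.
LOG = """\
[Epoch 0 - dev EM: 49.600 F1: 49.653 (best EM: 49.600 F1: 49.653)]
[Epoch 0 - ACC: 49.9284]
[Epoch 1 - dev EM: 49.701 F1: 49.783 (best EM: 49.701 F1: 49.783)]
[Epoch 1 - ACC: 49.9284]
[Epoch 2 - dev EM: 49.844 F1: 49.856 (best EM: 49.844 F1: 49.856)]
[Epoch 2 - ACC: 49.9284]
[Epoch 3 - dev EM: 50.013 F1: 50.013 (best EM: 50.013 F1: 50.013)]
[Epoch 3 - ACC: 49.9284]
[Epoch 4 - dev EM: 50.021 F1: 50.027 (best EM: 50.021 F1: 50.027)]
[Epoch 4 - ACC: 49.9284]
"""

# Patterns assume the line format seen in this log (an assumption, not a spec).
EM_F1_RE = re.compile(r"\[Epoch (\d+) - dev EM: ([\d.]+) F1: ([\d.]+)")
ACC_RE = re.compile(r"\[Epoch (\d+) - ACC: ([\d.]+)\]")


def parse_log(text):
    """Return {epoch: {"em": ..., "f1": ..., "acc": ...}} from log text."""
    metrics = {}
    for raw in text.splitlines():
        line = raw.strip()
        m = EM_F1_RE.match(line)
        if m:
            entry = metrics.setdefault(int(m.group(1)), {})
            entry["em"] = float(m.group(2))
            entry["f1"] = float(m.group(3))
            continue
        m = ACC_RE.match(line)
        if m:
            metrics.setdefault(int(m.group(1)), {})["acc"] = float(m.group(2))
    return metrics


def summarize(metrics):
    """Report first-to-last EM/F1 gain and whether ACC ever changed."""
    epochs = sorted(metrics)
    first, last = metrics[epochs[0]], metrics[epochs[-1]]
    return {
        "em_gain": round(last["em"] - first["em"], 3),
        "f1_gain": round(last["f1"] - first["f1"], 3),
        "acc_constant": len({metrics[e]["acc"] for e in epochs}) == 1,
    }


if __name__ == "__main__":
    print(summarize(parse_log(LOG)))
```

On this log, the summary shows that EM and F1 move by less than half a point over five epochs while ACC stays fixed.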

Thanks for the log.
I recently worked on Elmo, applying it with CNTK.
Could you share your repo so that I can check where it's not working?

All I did was follow the squad2 training directions from the master repo here and train with --elmo_on.

Hi @liuzzi @namisan,

While training on SQUAD2 (without ELMO), did the following command work for you without any changes? I am getting errors regarding the dev_gold labels directory.

python --v2_on --dev_gold data\dev-v2.0.json


I don't believe I had problems with the labels directory, although I vaguely remember having to correct some Elmo errors. If I get the chance to test the clean master repo again, I'll update you.

I cloned the repo a couple of days ago and was able to reproduce the results on v1.1, both with and without Elmo, but got stuck on v2.0.


For me, the model converges with Elmo on Squad2, though EM and F1 scores are better without Elmo embeddings when using the same set of hyperparameters.

I don't see the convergence issue when an appropriate learning rate is used, so I'll close this one.