hengruo / QANet-pytorch

A PyTorch implementation of QANet.

Performance metrics

uditsaxena opened this issue · comments

Hey, could you please talk about the performance metrics of this pytorch implementation?

Thanks

Thanks for your attention! I just finished this model. It now gets EM/F1 = 70.5/77.2 after 20 epochs. I will release more detailed metrics soon.

Thanks, could you also talk about training time?

Thanks for sharing! We just started playing with your code as part of a school project. The performance we get is very poor: EM/F1 = 0.02/3.94 after 10 epochs. It seems the model is not learning. We are digging in... Any ideas that might help our cause? @uditsaxena our training time is a little over an hour per epoch on an AWS p2.xlarge.

@Ramondy I got the same results after my destructive changes... I'm trying to reimplement it to fix this model and make the code cleaner.

I have the same problem as @Ramondy. Any ideas that might help our cause?

I have the same problem: EM = 0.025, F1 = 4.1858.

@hengruo May I ask what the current best performance you can get is? I found a few things that differ from the paper:

  1. Your learning rate is not fixed at 0.001 after 1,000 steps. When I use the same learning-rate scheduler, performance does grow, but it grows quite slowly after the first few epochs.
  2. It seems you didn't use an exponential moving average (EMA) of the parameter weights. The 0.9999 decay you set is for the scheduler, i.e. it is actually the gamma parameter.
  3. If I fix the learning rate at 0.001 after 1,000 steps, my performance is not good and often fluctuates dramatically from epoch to epoch (my code is mostly based on your implementation, but multi-head attention is implemented differently, as an isolated module).
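To make points 1 and 2 concrete, here is a rough sketch of the warm-up schedule and an EMA as described in the QANet paper. Names and the exact warm-up formula are my assumptions, not taken from this repo; the EMA is shown over plain floats, whereas in practice it would track `model.parameters()`.

```python
import math

def warmup_lr(step, base_lr=0.001, warmup_steps=1000):
    # Inverse-exponential warm-up from 0 to base_lr over the first
    # `warmup_steps` steps, then held constant (common QANet recipe;
    # the exact curve is an assumption here).
    if step >= warmup_steps:
        return base_lr
    return base_lr * math.log(step + 1) / math.log(warmup_steps)

class EMA:
    """Exponential moving average of parameter values (decay 0.9999 in
    the paper). Illustrative: values are plain floats keyed by name."""
    def __init__(self, decay=0.9999):
        self.decay = decay
        self.shadow = {}

    def register(self, name, value):
        # Initialize the shadow copy with the current parameter value.
        self.shadow[name] = value

    def update(self, name, value):
        # shadow <- decay * shadow + (1 - decay) * current
        self.shadow[name] = (
            self.decay * self.shadow[name] + (1 - self.decay) * value
        )
        return self.shadow[name]
```

At evaluation time the shadow values would be swapped into the model in place of the raw weights, which is what smooths out the epoch-to-epoch fluctuation described in point 3.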

Looking forward to your reply!

commented

what is the performance now?

I played around with the configuration a little bit, but I am barely scratching an F1 of 20 (even after >30,000 steps). I see that the default configuration in config.py differs from the paper (e.g. number of heads in multi-head attention = 2 instead of 8, batch_size, etc.).

Could someone with better results explain the exact configuration they used? I think this would help others get up to speed and start the real experimentation ;) Thanks!
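For reference, these are the hyperparameter values reported in the QANet paper, which differ from the repo defaults mentioned above. The dictionary keys are illustrative names only; they are not necessarily the attribute names used in this repo's config.py.

```python
# Hyperparameters from the QANet paper (keys are illustrative,
# not taken from this repo's config.py).
paper_config = {
    "num_heads": 8,          # repo default mentioned above is 2
    "batch_size": 32,
    "hidden_size": 128,      # model dimension in the encoder blocks
    "learning_rate": 0.001,  # constant after the 1000-step warm-up
    "ema_decay": 0.9999,     # EMA of parameter weights
    "adam_beta1": 0.8,
    "adam_beta2": 0.999,
    "adam_eps": 1e-7,
}
```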

@susht3 @fkaupmann Two users have now gotten about 65.0 F1 after 25,000 iterations. I've updated the README. If you would like to know the training details, see the other issue about memory explosion; they discussed it there.

I tried to train with the default parameters, but I only get a very low F1/EM: F1 is around 10 even after training for a long time. Is there anything I need to pay attention to while training the model? Thanks!

Have you solved the problem?

Hello, what is the performance now? I got an F1/EM score below 10.