galsang / BiDAF-pytorch

Re-implementation of BiDAF(Bidirectional Attention Flow for Machine Comprehension, Minjoon Seo et al., ICLR 2017) on PyTorch.

What is the purpose of the file ema.py?

IrvingBei opened this issue

Hi, thank you for implementing BiDAF so clearly. I am a beginner with PyTorch, and I am confused about what ema.py does. My guess is that it saves the trainable parameters during training. I also don't understand the update method. Could you please explain why you use it in the implementation? Thank you again.
def update(self, name, x):
    assert name in self.shadow
    new_average = (1.0 - self.mu) * x + self.mu * self.shadow[name]
    self.shadow[name] = new_average.clone()

EMA stands for exponential moving average.
In the model details paragraph in Section 4 of the BiDAF paper, you can find the following statement:

During training, the moving averages of all weights of the model are maintained with the exponential decay rate of 0.999.

As PyTorch did not support this functionality at the time of implementation, I built one myself, drawing on other open-source code, though I'm not sure it is correct.
For simplicity, you can just ignore it: I have found empirically that EMA does not have much effect on model performance.
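
To make the update rule in the question concrete, here is a minimal sketch of the same averaging step on plain Python floats. It is not the repository's ema.py: the `register` helper and the toy values are assumptions added for illustration, and only `update`, `mu`, and `shadow` come from the snippet quoted above. With PyTorch tensors the arithmetic would be the same, with `.clone()` used to store a copy.

```python
class EMA:
    """Exponential moving average of named values (illustrative sketch)."""

    def __init__(self, mu):
        self.mu = mu      # decay rate; the paper uses 0.999
        self.shadow = {}  # name -> current moving average

    def register(self, name, x):
        # Hypothetical helper: start the average at the initial value.
        self.shadow[name] = x

    def update(self, name, x):
        assert name in self.shadow
        # Keep a fraction mu of the old average and mix in (1 - mu) of the new value.
        self.shadow[name] = (1.0 - self.mu) * x + self.mu * self.shadow[name]
        return self.shadow[name]


ema = EMA(mu=0.9)  # smaller decay than 0.999 so the effect is visible in a few steps
ema.register("w", 0.0)
for current_weight in [1.0, 1.0, 1.0]:  # pretend the weight equals 1.0 after each step
    averaged = ema.update("w", current_weight)
```

The averaged value moves toward the current weight slowly (0.1, 0.19, 0.271 in this toy run). In a typical setup, `update` would be called once per trainable parameter after each optimizer step, and the shadow values would be swapped in for evaluation.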

Oh, I see. Thank you again.

commented

Thanks for your implementation. Just curious: does ema.py do the same thing as the rho parameter in Adadelta? Here is the doc.