what's the function of file ema.py?

Question

IrvingBei opened this issue 5 years ago · comments

Hi, thank you for implementing BiDAF so clearly. I am a beginner in PyTorch, and I am confused about what ema.py does. My guess is that it saves the trainable parameters during training. I also don't understand the update method. Could you please explain why you use it in the implementation? Thank you again.
    def update(self, name, x):
        assert name in self.shadow
        new_average = (1.0 - self.mu) * x + self.mu * self.shadow[name]
        self.shadow[name] = new_average.clone()

Taeuk Kim · Answer 1 · Thu Mar 28 2019 10:46:12 GMT+0800 (China Standard Time)

EMA means exponential moving average.
In the model details paragraph in section 4 of the BiDAF paper, you can find the following comment:

During training, the moving averages of all weights of the model are maintained with the exponential decay rate of 0.999.

As PyTorch did not support this functionality at the time of implementation, I tried to build it on my own, relying on other open-source code, though I'm not sure it is correct.
For simplicity, you can just ignore it, as I have empirically found that EMA does not have much effect on model performance.
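
To make the update rule concrete, here is a minimal sketch of how such a helper might look and be used. It runs on plain floats so it works without PyTorch. Only `update`, `shadow`, and `mu` come from the snippet in the question; the `register` and `get` methods and the toy values are assumptions added for illustration, not necessarily what ema.py contains.

```python
class EMA:
    """Keeps an exponential moving average (a "shadow" copy) of named values."""

    def __init__(self, mu):
        self.mu = mu        # decay rate; the BiDAF paper uses 0.999
        self.shadow = {}    # name -> current moving average

    def register(self, name, val):
        # Hypothetical helper: store the initial value for a parameter.
        self.shadow[name] = val

    def get(self, name):
        # Hypothetical helper: read back the averaged value.
        return self.shadow[name]

    def update(self, name, x):
        # Same rule as in the question: blend the new value x into the average.
        # With tensors you would also copy the result (e.g. .clone()).
        assert name in self.shadow
        self.shadow[name] = (1.0 - self.mu) * x + self.mu * self.shadow[name]


# Toy usage: a single "weight" that jumps from 1.0 to 2.0 and stays there.
ema = EMA(mu=0.9)
ema.register("w", 1.0)
for w in [2.0, 2.0, 2.0]:   # pretend these are the weight values after each step
    ema.update("w", w)
smoothed = ema.get("w")     # moves toward 2.0 gradually instead of jumping
```

With a decay rate close to 1, the shadow value changes slowly, so it smooths out step-to-step noise in the weights. In a real training loop you would typically call `update` for each trainable parameter after every optimizer step and evaluate with the shadow values.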

IrvingBei · Answer 2 · Thu Mar 28 2019 11:31:41 GMT+0800 (China Standard Time)

oh, I see, thank you again.

Zhiqi · Answer 3 · Sun Mar 29 2020 14:30:01 GMT+0800 (China Standard Time)

Thanks for your implementation. Just curious: is the function of ema.py the same as the rho parameter in Adadelta? Here is the doc.