XiaLiPKU / EMANet

The code for Expectation-Maximization Attention Networks for Semantic Segmentation (ICCV'2019 Oral)

Home Page: https://xialipku.github.io/publication/expectation-maximization-attention-networks-for-semantic-segmentation/

How do you implement equation (15) in your paper?

Dorispaopao opened this issue · comments

And how do you handle gradient backpropagation in your implementation?

For the first question:
I implemented it in 'train.py':

EMANet/train.py, line 134 at commit 9a492d8:

```python
self.net.module.ema.mu *= momentum
```
Implementing it in the EMAU module might look cleaner, but since \mu has to be averaged over the whole batch, implementing it in the module would require a 'reduce' operation as in SyncBN. So I just wrote the line in 'train.py', where the \mu from all GPUs has already been gathered.
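For context, equation (15) is the moving-average update of the bases, \bar{\mu} \leftarrow \alpha \bar{\mu} + (1 - \alpha)\mu_T, with \mu_T averaged over the batch. A minimal sketch of that update in the training loop, assuming `mu_batch` holds the per-image bases \mu_T of shape (B, C, K) returned by the EMAU forward pass; apart from `net.module.ema.mu`, the names here are illustrative, not the repository's exact code:

```python
import torch

momentum = 0.9  # the alpha in Eq. (15)

with torch.no_grad():
    # Average the per-image bases over the batch; on multi-GPU training
    # they must already have been gathered into `mu_batch` at this point.
    mu_avg = mu_batch.mean(dim=0, keepdim=True)
    # Keep the bases l2-normalized, matching the EMAU module.
    mu_avg = mu_avg / (1e-6 + mu_avg.norm(dim=1, keepdim=True))
    # Moving-average update of the stored bases.
    net.module.ema.mu *= momentum
    net.module.ema.mu += mu_avg * (1 - momentum)
```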

> And how do you handle gradient backpropagation in your implementation?

For the second question:

I simply cut off the gradients for the A_E and A_M iterations by wrapping them in `with torch.no_grad():`.
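A minimal sketch of that pattern, assuming x is the flattened feature map of shape (b, c, n) and mu the bases of shape (b, c, k); the function name and shapes are illustrative:

```python
import torch
import torch.nn.functional as F

def ema_forward(x, mu, stage_num=3):
    # x:  (b, c, n) flattened feature map
    # mu: (b, c, k) bases
    with torch.no_grad():  # the A_E / A_M iterations are cut off from backprop
        for _ in range(stage_num):
            x_t = x.permute(0, 2, 1)                          # (b, n, c)
            z = torch.bmm(x_t, mu)                            # (b, n, k)
            z = F.softmax(z, dim=2)                           # A_E: responsibilities
            z_ = z / (1e-6 + z.sum(dim=1, keepdim=True))
            mu = torch.bmm(x, z_)                             # A_M: re-estimate bases
            mu = mu / (1e-6 + mu.norm(dim=1, keepdim=True))   # l2-normalize
    # Reconstruct features from the final bases; mu and z act as constants
    # here, so the iterations themselves are never backpropagated.
    z_t = z.permute(0, 2, 1)                                  # (b, k, n)
    x_rec = mu.matmul(z_t)                                    # (b, c, n)
    return x_rec, mu
```

Since mu and z are produced under no_grad, this path carries no gradient back to x; in the full EMAU, learning still proceeds through the surrounding 1x1 convolutions and the residual connection.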
To be honest, what happens inside the EMA iterations has not been deeply explored yet. EMANet is just a naive first exploration of the EM + attention mechanism, so I look forward to deeper analysis by interested followers.