Is EMA used in this work?

JacobYuan7 opened this issue 3 years ago

Hello author, thanks for your great work. I have a question about the use of Exponential Moving Average (EMA) in this paper, and I hope you can give me some pointers. The paper does not seem to cover this in detail. As far as I know, MDETR uses EMA and evaluates with the EMA model. Is it used in this work as well? If so, why should we evaluate with the EMA model rather than the original one?
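For readers new to the term, here is a minimal sketch of how an EMA copy of the weights is commonly maintained. The averaged copy tends to fluctuate less than the raw weights, which is the usual motivation for evaluating with it. The function name `ema_update`, the plain-dict representation of the weights, and the decay values are assumptions for illustration, not taken from MDETR or MDef-DETR.

```python
def ema_update(ema_weights, weights, decay=0.999):
    """Blend the current weights into the running average, in place.

    `decay` close to 1.0 means the average changes slowly.
    The default here is an illustrative assumption.
    """
    for name, value in weights.items():
        ema_weights[name] = decay * ema_weights[name] + (1.0 - decay) * value
    return ema_weights


# Toy example: a single scalar "weight" that oscillates during training.
raw = {"w": 0.0}
ema = dict(raw)  # the EMA copy starts as a clone of the model weights
for step_value in [1.0, 3.0, 1.0, 3.0]:
    raw["w"] = step_value            # stand-in for an optimizer step
    ema_update(ema, raw, decay=0.5)  # small decay so the effect is visible

print(raw["w"], ema["w"])
```

The raw weight ends on whatever the last step produced, while the EMA copy sits between the recent values.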

Muhammad Maaz · Answer 1 · Thu Jan 06 2022 17:46:32 GMT+0800 (China Standard Time)

Hi @JacobYuan7,

Thank you for your interest in this work. Similar to MDETR, MDef-DETR also uses EMA during training, and for evaluation the weights are loaded from the original model. You can try the EMA model for testing by loading the weights from checkpoint["model_ema"] instead of checkpoint["model"]; it should give almost the same results. Let me know if you have any questions.
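The switch described above could look roughly like the sketch below. Only the keys `"model"` and `"model_ema"` come from this thread; the helper `select_state_dict` and the toy checkpoint are hypothetical, and a real checkpoint would be loaded from disk rather than built by hand.

```python
def select_state_dict(checkpoint, use_ema=True):
    """Return the EMA weights or the raw weights from a checkpoint dict."""
    key = "model_ema" if use_ema else "model"
    if key not in checkpoint:
        raise KeyError(f"checkpoint has no '{key}' entry; found {sorted(checkpoint)}")
    return checkpoint[key]


# Stand-in for a real checkpoint. In practice it would come from something like
# torch.load(path, map_location="cpu"), with the path being your own file.
checkpoint = {"model": {"w": 1.0}, "model_ema": {"w": 0.9}}

state_dict = select_state_dict(checkpoint, use_ema=True)
# With a real model you would then call model.load_state_dict(state_dict).
print(state_dict)
```

Running the evaluation once with `use_ema=True` and once with `use_ema=False` is a quick way to compare the two sets of weights.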

Hangjie Yuan · Answer 2 · Fri Jan 07 2022 15:37:07 GMT+0800 (China Standard Time)

As I understand it, MDETR uses 'model_ema' to evaluate the model, which is shown in:
https://github.com/ashkamath/mdetr/blob/bf09d98b0b41cd615185dcb0082299a5ba24c319/scripts/eval_lvis.py#L101
Correct me if I am wrong, many thanks!

BTW, the training of the language model follows MDETR, right? With a warmup schedule followed by a linear decrease back to zero for the rest of the training.

Muhammad Maaz · Answer 3 · Sun Jan 09 2022 21:31:01 GMT+0800 (China Standard Time)

> As I understand it, MDETR uses 'model_ema' to evaluate the model, which is shown in: https://github.com/ashkamath/mdetr/blob/bf09d98b0b41cd615185dcb0082299a5ba24c319/scripts/eval_lvis.py#L101 Correct me if I am wrong, many thanks!

Hi, my apologies for the delayed reply. Yes, your understanding is correct. MDETR uses model_ema for evaluation during training and model for inference (hubconf.py). However, I think using model_ema for inference as well would be more appropriate.

> BTW, the training of the language model follows MDETR, right? With a warmup schedule followed by a linear decrease back to zero for the rest of the training.

Yes, this is the case. Further, we are planning to release the training scripts by the end of this month. Stay tuned!
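Until the training scripts are available, the schedule confirmed above (warmup, then a linear decrease to zero) can be sketched as a learning-rate multiplier. The function name and the step counts below are hypothetical; the actual values used for MDef-DETR are not given in this thread.

```python
def linear_warmup_decay(step, warmup_steps, total_steps):
    """Multiplier for the base learning rate at a given step.

    Ramps linearly from 0 to 1 over `warmup_steps`, then decreases
    linearly back to 0 at `total_steps`.
    """
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    remaining = total_steps - step
    return max(0.0, remaining / max(1, total_steps - warmup_steps))


# Illustrative numbers only: 10 warmup steps out of 100 total.
for step in (0, 5, 10, 55, 100):
    print(step, linear_warmup_decay(step, warmup_steps=10, total_steps=100))
```

A function of this shape can be passed as the `lr_lambda` of PyTorch's `torch.optim.lr_scheduler.LambdaLR` for the text-encoder parameter group, assuming the optimizer keeps that group separate.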

Hangjie Yuan · Answer 4 · Sun Jan 30 2022 15:15:50 GMT+0800 (China Standard Time)

Sure, I will! Thx so much for your kind response.