Model and training code for the AR variant
MikeWangWZHL opened this issue
To keep this repo clean, we don't plan to release the AR code in this repo. However, it is straightforward to reimplement with the current repo -- almost all hyper-parameters stay the same as MAR. The only differences are the causal attention mask and the teacher-forcing loss.
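For concreteness, here is a minimal PyTorch sketch of those two changes. The helper names and the `(B, L, D)` token layout are illustrative, not this repo's actual code, and the per-token prediction loss itself stays whatever MAR already uses:

```python
import torch

def causal_attn_mask(seq_len: int, device=None) -> torch.Tensor:
    # Boolean mask where True marks blocked positions: token i may
    # attend only to tokens 0..i (standard causal masking).
    return torch.triu(
        torch.ones(seq_len, seq_len, dtype=torch.bool, device=device),
        diagonal=1,
    )

def teacher_forcing_shift(tokens: torch.Tensor, start_token: torch.Tensor):
    # tokens: (B, L, D) ground-truth token sequence.
    # start_token: (1, 1, D) learnable start-of-sequence embedding (hypothetical).
    # The decoder input is the sequence shifted right by one position with the
    # start embedding prepended, so position i is trained to predict token i
    # conditioned only on tokens 0..i-1.
    bsz = tokens.size(0)
    start = start_token.expand(bsz, 1, -1)
    inputs = torch.cat([start, tokens[:, :-1]], dim=1)
    targets = tokens
    return inputs, targets
```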
Hi @LTH14, in the AR variant, is it necessary for the attention mechanism within the MAE encoder to be causal? Alternatively, should we consider removing the MAE encoder altogether in this variant?
In the AR variant, we don't need the MAE encoder. A single causal decoder is enough (similar to GPT).
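For illustration, a bare-bones causal decoder along these lines might look as follows. This is a sketch assuming continuous token embeddings of shape `(B, L, D)`; the width, depth, and head count are placeholders, not the repo's configuration:

```python
import torch
import torch.nn as nn

class CausalDecoder(nn.Module):
    # A plain GPT-style stack: transformer blocks with causal self-attention
    # and no separate encoder.
    def __init__(self, dim: int = 1024, depth: int = 32, num_heads: int = 16):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=num_heads, dim_feedforward=4 * dim,
            batch_first=True, norm_first=True,
        )
        self.blocks = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, L, D); the upper-triangular boolean mask blocks
        # attention to future positions.
        L = x.size(1)
        mask = torch.triu(
            torch.ones(L, L, dtype=torch.bool, device=x.device), diagonal=1
        )
        return self.blocks(x, mask=mask)
```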
Thanks! Do you double the depth of the MAE decoder?
Yes, we keep the total number of parameters unchanged.
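In other words (the depths below are hypothetical, not the repo's actual configs): if a MAR model pairs an N-block encoder with an N-block decoder, the matching AR variant would use a single 2N-block causal decoder of the same width:

```python
# Illustrative parameter matching; depths and width are assumptions.
mar = dict(encoder_depth=16, decoder_depth=16, width=1024)  # encoder-decoder MAR
ar  = dict(encoder_depth=0,  decoder_depth=32, width=1024)  # causal-only AR variant
assert mar["encoder_depth"] + mar["decoder_depth"] == ar["decoder_depth"]
```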