What is the objective when pretraining?
Young973 opened this issue
TBH, I'm a little confused about what the objective is when pretraining AST. It doesn't seem to be stated in the paper. BTW, when pretraining SSAST, the discriminative objective is classification with InfoNCE and the generative objective is reconstruction. But what is it for AST?
hi there,
It is just ImageNet pretraining.
I.e., using an ImageNet-pretrained DeiT as the initial weights for AST.
Lines 60 to 68 in 31088be
-Yuan
Some modification is needed. See https://github.com/YuanGongND/ast/blob/master/src/models/ast_models.py.
If you mean audio-domain pretraining, that is just training AST on AudioSet (starting from the ImageNet initialization) with BCE loss for the classification task. You can then use that model for other audio tasks (e.g., ESC-50).
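In case it helps, here is a minimal sketch of what "BCE loss for classification" means in the multi-label setting, where one clip can carry several labels at once. It is plain Python for illustration only, not code from the repo: the function name `bce_with_logits`, the 4-class toy clip, and the logit values are all made up. For a single clip it should compute the same quantity as `torch.nn.BCEWithLogitsLoss` with its default mean reduction.

```python
import math

def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over the classes of one clip.

    logits:  raw (pre-sigmoid) model outputs, one per class
    targets: multi-hot labels (0.0 or 1.0), one per class
    """
    total = 0.0
    for z, y in zip(logits, targets):
        # Numerically stable form of
        # -[y * log(sigmoid(z)) + (1 - y) * log(1 - sigmoid(z))]
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)

# Toy clip with 4 hypothetical classes; two labels are active at once.
logits = [2.0, -1.0, 0.0, 3.0]
targets = [1.0, 0.0, 0.0, 1.0]
loss = bce_with_logits(logits, targets)
print(f"BCE loss: {loss:.4f}")
```

Each class is scored as its own independent yes/no problem, which is why this loss suits a multi-label dataset like AudioSet better than a softmax cross-entropy over mutually exclusive classes.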