A question about ESC's AC

Question

liyunlongaaa opened this issue 2 years ago · comments

YangGaoBin commented 2 years ago

Hello, why is the ESC-50 accuracy in the SSAST paper only in the 80s, when the AST paper already reported about 95%?

Yuan Gong · Answer 1 · Thu Jul 28 2022 10:40:52 GMT+0800 (China Standard Time)

Hi there,

I think the main difference between the two settings is whether supervised AudioSet pretraining is applied. AudioSet and ESC-50 are closely related datasets and even share some classes, so supervised AudioSet pretraining usually makes a big difference.

More specifically,

In the AST paper
ImageNet supervised pretraining = 88.7
ImageNet supervised pretraining + AudioSet supervised pretraining = 94.7

In the SSAST paper
AudioSet + Librispeech self-supervised pretraining = 88.8
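
To make the comparison concrete, here is a minimal sketch that tabulates the three ESC-50 accuracies quoted above and computes each one's gap to the ImageNet-only AST setting. The dictionary keys and the helper name `gap_vs_baseline` are illustrative labels, not identifiers from either codebase; only the numbers come from this thread.

```python
# ESC-50 accuracies (%) as quoted in this thread.
# The keys are illustrative labels, not names from the AST/SSAST code.
reported = {
    "AST: ImageNet supervised": 88.7,
    "AST: ImageNet + AudioSet supervised": 94.7,
    "SSAST: AudioSet + Librispeech self-supervised": 88.8,
}

# Use the ImageNet-only AST result as the reference point.
baseline = reported["AST: ImageNet supervised"]


def gap_vs_baseline(name: str) -> float:
    """Accuracy difference (percentage points) relative to the baseline."""
    return round(reported[name] - baseline, 1)


gaps = {name: gap_vs_baseline(name) for name in reported}

for name, gap in gaps.items():
    print(f"{name}: {reported[name]:.1f} ({gap:+.1f} vs. ImageNet-only AST)")
```

Read this way, the self-supervised SSAST number is on par with the ImageNet-only AST number, and the roughly 6-point jump appears only when supervised AudioSet pretraining is added.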

Hope this helps.

-Yuan

YangGaoBin · Answer 2 · Thu Jul 28 2022 10:43:54 GMT+0800 (China Standard Time)

Thank you very much!