YuanGongND / ssast

Code for the AAAI 2022 paper "SSAST: Self-Supervised Audio Spectrogram Transformer".

A question about ESC's accuracy

liyunlongaaa opened this issue

Hello, why is the ESC result in the paper only in the 80s, when the AST paper already reached 95?

Why is the ESC accuracy score ~80% in the SSAST paper when ~95% was reported in the AST paper?

Hi there,

I think the main difference between the two settings is whether supervised AudioSet pretraining is applied. AudioSet and ESC-50 are very similar datasets and even share some classes, so supervised AudioSet pretraining usually makes a big difference.

More specifically,

In the AST paper
ImageNet supervised pretraining = 88.7
ImageNet supervised pretraining + AudioSet supervised pretraining = 94.7

In the SSAST paper
AudioSet + LibriSpeech self-supervised pretraining = 88.8

Hope this helps.

-Yuan

Thank you a lot!