YuanGongND / ssast

Code for the AAAI 2022 paper "SSAST: Self-Supervised Audio Spectrogram Transformer".

Loss curves

Tomsen1410 opened this issue

commented

Hey, great work!
I just wanted to ask whether you might still have the loss curves from your runs, so that I can compare them with my experiments?

Hi Tom,

Thanks for your interest.

I didn't plot the loss curves, but I think I still have the (pre-)training logs on my server. Which experiment are you looking for?

-Yuan

commented

Thanks for the reply! I am looking for the pre-training run(s), i.e., the one trained on LibriSpeech + AudioSet.

commented

Bump!
Sorry to come back to this question, but I am currently trying to implement SSAST (well, its masked autoencoder counterpart, see https://arxiv.org/pdf/2203.16691.pdf) for music. For efficiency reasons, I tried to keep the computed spectrograms in FP16 format, but the reconstruction (generative) loss curve looks a bit odd: it first drops very quickly, rises again up to a point, and then slowly decreases afterwards.

I just wanted to have some comparison in order to know what I should expect. Thanks in advance!

Hi there,

I think this is our log (gen & dis objective, 400 masked patches, full AudioSet + LibriSpeech). Unfortunately, I don't think we logged the generative loss, just the discriminative loss. The columns are defined at:

# columns: [train accuracy, train NCE loss, eval accuracy, eval NCE loss, learning rate]
result.append([train_acc_meter.avg, train_nce_meter.avg, acc_eval, nce_eval, optimizer.param_groups[0]['lr']])
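
If you want to plot from the raw log, something like this should work (a minimal sketch; it assumes the log was saved as a plain CSV named result.csv in that column order, which may differ from what you have):

import numpy as np
import matplotlib.pyplot as plt

# Columns, per the line above: train accuracy, train NCE loss,
# eval accuracy, eval NCE loss, learning rate.
log = np.loadtxt('result.csv', delimiter=',')   # the path is an assumption

epochs = np.arange(1, len(log) + 1)
plt.plot(epochs, log[:, 1], label='train NCE (discriminative) loss')
plt.plot(epochs, log[:, 3], label='eval NCE (discriminative) loss')
plt.xlabel('epoch')
plt.ylabel('loss')
plt.legend()
plt.show()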

For your question:

"The reconstruction (generative) loss curve looks a bit odd: it first drops very quickly, rises again up to a point, and then slowly decreases afterwards."

Could it be because you add L_g and L_d together? Otherwise, L_g on the training set alone should always drop.
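
For example (a minimal illustration with stand-in values, not our exact code), if you track the two terms separately and only combine them for backprop, a bump in the joint curve can be traced back to one of its terms:

import torch

# Minimal illustration (stand-in values and an illustrative weight,
# not the exact SSAST code): keep L_g and L_d as separate scalars in the
# log, and only sum them for the backward pass.
lambda_g = 10.0                        # illustrative weight on the generative term
loss_g = torch.tensor(0.42)            # stand-in MSE reconstruction loss
loss_d = torch.tensor(1.37)            # stand-in InfoNCE (discriminative) loss
loss = loss_d + lambda_g * loss_g      # joint objective used for backprop

print(f'L_g={loss_g.item():.2f}  L_d={loss_d.item():.2f}  joint={loss.item():.2f}')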

-Yuan