YuanGongND / ssast

Code for the AAAI 2022 paper "SSAST: Self-Supervised Audio Spectrogram Transformer".

I've been stuck on the first epoch in a tiny model for 8+ hours using 300 5s spectrograms for training. Is this normal?

michaelschwob opened this issue

I appreciate your patience with my questions/bugs in my previous post. I assume the answer to this is much quicker; perhaps I'm specifying an argument incorrectly.

I am trying to run a tiny patch-based model on my laptop (CPU: Intel(R) Core(TM) i7-9850H @ 2.60GHz; GPU: Intel UHD Graphics 630). I have been stuck in the first epoch for 8+ hours now, which does not seem right. My run.sh file is below for reference. I have switched from a base to a tiny model, decreased the epoch iterations and batch size, and modified the mean, std, and target_length accordingly. Additionally, I have significantly reduced the number of spectrograms in the training data set: the original ESC-50 split has 1600 spectrograms, but I am only using the first 180 to train.
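For reference, here is a minimal sketch of how I built the reduced datafile (this assumes the ESC-50 prep script's JSON layout with a top-level "data" list; the reduced filename is just my own naming):

```python
import json

# Load the full ESC-50 training datafile produced by the prep script
# (assumed format: {"data": [{"wav": ..., "labels": ...}, ...]}).
with open('./src/prep_data/esc50/data/datafiles/esc_train_data_1.json') as f:
    full = json.load(f)

# Keep only the first 180 entries to shrink the training set.
reduced = {'data': full['data'][:180]}

with open('./src/prep_data/esc50/data/datafiles/esc_train_data_1_reduced.json', 'w') as f:
    json.dump(reduced, f, indent=1)
```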

run.sh:

#!/bin/bash
#SBATCH -p sm
#SBATCH -x sls-sm-1,sls-2080-[3],sls-1080-3,sls-sm-5
##SBATCH -p gpu
##SBATCH -x sls-titan-[0-2]
#SBATCH --gres=gpu:1
#SBATCH -c 4
#SBATCH -n 1
#SBATCH --mem=48000
#SBATCH --job-name="ssast_pretrain"
#SBATCH --output=./slurm_log/log_%j.txt

set -x
# comment this line if not running on sls cluster
#. /data/sls/scratch/share-201907/slstoolchainrc
source /data/sls/scratch/yuangong/sslast2/sslast2/bin/activate
export TORCH_HOME=../../pretrained_models
mkdir exp
mkdir slurm_log

task=pretrain_joint
mask_patch=400 # maybe reduce ??

# ESC-50
dataset=esc-50
tr_data=./src/prep_data/esc50/data/datafiles/esc_train_data_1_reduced.json
te_data=./src/prep_data/esc50/data/datafiles/esc_eval_data_1.json
dataset_mean=3.693319320678711
dataset_std=64.5123519897461
target_length=50 # for 5 seconds
num_mel_bins=128

model_size=tiny
# no patch split overlap
fshape=16
tshape=16
fstride=${fshape}
tstride=${tshape}
# no class balancing as it implicitly uses label information
bal=none
batch_size=10 # was 24
lr=1e-4
# learning rate decreases if the pretext task performance does not improve on the validation set
lr_patience=22
epoch=10
# no spectrogram masking
freqm=0
timem=0
# no mixup training
mixup=0

exp_dir=./exp/mask01-${model_size}-f${fshape}-t${tshape}-b$batch_size-lr${lr}-m${mask_patch}-${task}-${dataset}

## be sure to modify the label-csv file directory!!
CUDA_CACHE_DISABLE=1 python -W ignore ../run.py --dataset ${dataset} \
--data-train ${tr_data} --data-val ${te_data} --exp-dir $exp_dir \
--label-csv ./src/prep_data/esc50/esc_class_labels_indices.csv \
--lr $lr --n-epochs ${epoch} --batch-size $batch_size --save_model False \
--freqm $freqm --timem $timem --mixup ${mixup} --bal ${bal} \
--tstride $tstride --fstride $fstride --fshape ${fshape} --tshape ${tshape} \
--dataset_mean ${dataset_mean} --dataset_std ${dataset_std} --target_length ${target_length} --num_mel_bins ${num_mel_bins} \
--model_size ${model_size} --mask_patch ${mask_patch} --n-print-steps 100 \
--task ${task} --lr_patience ${lr_patience} --epoch_iter 50 # was 4000

In my terminal, I've been stuck on the following output:

Creating experiment directory: ./exp/mask01-tiny-f16-t16-b10-lr1e-4-m400-pretrain_joint-esc-50
Now starting self-supervised pretraining for 10 epochs
Now running on : cpu
Total parameter number is : 5.952594000 million
Total trainable parameter number is : 5.952592000 million
current #steps=0, #epochs=1
start training...
2024-02-11 09:09:01.792826
Another data point.
warm-up learning rate is 0.000000

I apologize for asking several questions in the span of a week and appreciate your commitment to open science. I'm hoping that this is my last post for you!

I think I see the issue. In ast_models.py, inside the pretrain_stage == True branch, we compute

self.p_f_dim, self.p_t_dim = self.get_shape(fstride, tstride, input_fdim, input_tdim, fshape, tshape)
num_patches = self.p_f_dim * self.p_t_dim
self.num_patches = num_patches

This results in self.num_patches=24, which is then passed to gen_maskid_patch() as sequence_len. Because mask_patch=400 in run.sh (passed in as mask_size), the following loop never terminates:

while len(list(set(mask_id))) <= mask_size:
    start_id = randrange(sequence_len)

    # this improves the efficiency, but might change the pretrained model
    # while start_id in mask_id:
    #     start_id = randrange(sequence_len)

    cur_mask = []
    for i in range(0, cur_clus):
        for j in range(0, cur_clus):
            mask_cand = start_id + self.p_t_dim * i + j
            if mask_cand > 0 and mask_cand < sequence_len:
                cur_mask.append(mask_cand)
    mask_id = mask_id + cur_mask
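To make the mismatch concrete, here is a rough back-of-the-envelope check of num_patches for the settings in my run.sh (this assumes get_shape reduces to a plain non-overlapping 16x16 split of a 128 x 50 spectrogram; the real values come from a dummy conv2d inside ASTModel.get_shape):

```python
# Hypothetical re-derivation of the patch count for my run.sh settings.
num_mel_bins, target_length = 128, 50            # from run.sh
fshape = tshape = fstride = tstride = 16         # no patch split overlap

p_f_dim = (num_mel_bins - fshape) // fstride + 1   # (128 - 16) // 16 + 1 = 8
p_t_dim = (target_length - tshape) // tstride + 1  # (50 - 16) // 16 + 1 = 3
num_patches = p_f_dim * p_t_dim                    # 8 * 3 = 24

print(num_patches)  # 24, far below mask_patch=400, so the while-loop can never exit
```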

So, is the solution simply to reduce mask_patch in the run.sh file? What exactly are the consequences of reducing mask_patch to (for example) 10?

Please check the paper. The number of masked patches should be something like 50%-80% of the total #patches.
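With num_patches = 24, that rule of thumb works out to roughly the following (a quick sanity check, not an exact prescription):

```python
num_patches = 24                       # from the calculation above
lo, hi = int(0.5 * num_patches), int(0.8 * num_patches)
print(lo, hi)                          # 12 19 -> pick mask_patch somewhere in this range
```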

You are running it on CPU (Intel UHD and AMD GPUs do not support CUDA), so it will be slow anyway. Maybe training on Colab (GPU) is a better idea, even with the free plan.

-Yuan

This answers my question; thank you so much for your help, Yuan! I suggest mentioning that mask_patch should also be modified in the src/pretrain/run_mask_{frame,patch}.sh files; at the moment, the "Pretrain on custom dataset" section does not list it as a parameter to be modified. Additionally, the --n_class and --label-csv arguments in the terminal command would also need to be changed.