b04901014 / FT-w2v2-ser

Official implementation for the paper "Exploring Wav2vec 2.0 fine-tuning for improved speech emotion recognition"

How many GPUs are enough?

Gpwner opened this issue

Wondering how many GPUs you used. I used two V100s (16 GB) and still couldn't run the last phase of the code until I reduced the batch size to 48. I've made sure I use both GPUs by modifying the following code:

        trainer = Trainer(
            precision=args.precision,
            amp_backend='native',
            callbacks=[checkpoint_callback] if hasattr(model, 'valid_met') else None,
            checkpoint_callback=hasattr(model, 'valid_met'),
            resume_from_checkpoint=None,
            check_val_every_n_epoch=1,
            max_epochs=hparams.max_epochs,
            num_sanity_val_steps=2 if hasattr(model, 'valid_met') else 0,
            gpus=-1,
            strategy='dp',  # multiple-gpus, 1 machine
            logger=False
        )

You can set self.wav2vec2.encoder.config.gradient_checkpointing = True in https://github.com/b04901014/FT-w2v2-ser/blob/main/modules/FeatureFuser.py#L103
This will greatly reduce the required VRAM if you are using a single GPU. (But it cannot scale to multiple GPUs under DDP, since DDP is not compatible with gradient checkpointing.)
Also, if you limit the maxseqlen argument, any training example longer than that number (in seconds) will be truncated; since batches are padded to that length, this also reduces VRAM usage. It may affect the performance of the model, but I don't think it will hurt much.
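
For concreteness, a minimal sketch of that change, assuming the wav2vec 2.0 model lives in a self.wav2vec2 attribute as the linked file suggests (the surrounding code here is illustrative, not a verified copy of the repo):

# In modules/FeatureFuser.py, after the wav2vec 2.0 encoder is built
# (around the linked line). Trades an extra forward pass for lower VRAM.
self.wav2vec2.encoder.config.gradient_checkpointing = True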

I use only one Quadro RTX 8000 (48 GB) for the last phase.

Will this change result in an eventual performance degradation?

And I don't quite understand why you compute the confusion matrix over the entire data set:
https://github.com/b04901014/FT-w2v2-ser/blob/main/run_downstream_custom_multiple_fold.py#L93

I think it should be:

WriteConfusionSeaborn(
    confusion,
    model.dataset.test_dataset,
    os.path.join(args.saving_path, 'confmat.png')
)

It seems that you add all the experiments' confusion matrices into one, am I right?

Gradient checkpointing is a way to reduce memory cost by trading it for an additional forward pass, as detailed in https://github.com/cybertronai/gradient-checkpointing
It should not impact the performance of the model.
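
As a toy illustration (not code from this repo), PyTorch's torch.utils.checkpoint shows the trade-off: activations inside the wrapped block are not stored during the forward pass and are recomputed during backward, so memory drops at the cost of roughly one extra forward:

import torch
from torch.utils.checkpoint import checkpoint

# A small block whose intermediate activations we avoid storing.
block = torch.nn.Sequential(
    torch.nn.Linear(512, 512), torch.nn.GELU(), torch.nn.Linear(512, 512)
)
x = torch.randn(8, 512, requires_grad=True)
y = checkpoint(block, x)  # block(x) is recomputed during backward
y.sum().backward()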

For the confusion matrix: model.dataset.emoset is just a list of label strings, as shown here: https://github.com/b04901014/FT-w2v2-ser/blob/main/downstream/Custom/dataloader.py#L17
If you print it out, it should be something like ['anger', 'sad', 'neutral'] if those are all the emotions in the training set.

All of the information is already in the confusion matrix itself; the function needs that list since it has to know which row/column corresponds to which emotion.
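
As a rough sketch of why (hypothetical code, not the repo's actual implementation of WriteConfusionSeaborn):

import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

# The matrix is just numbers; the label list names its rows/columns.
emoset = ['anger', 'sad', 'neutral']  # e.g. model.dataset.emoset
confusion = np.array([[30, 2, 1],
                      [3, 25, 4],
                      [1, 5, 29]])
ax = sns.heatmap(confusion, annot=True, fmt='d',
                 xticklabels=emoset, yticklabels=emoset)
ax.set_xlabel('Predicted')
ax.set_ylabel('True')
plt.savefig('confmat.png')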

Yes, the final confusion matrix is the sum over all runs/folds. But the stats (mean/std of UAR and F1) give more detail about the individual runs.
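
A minimal sketch of that aggregation, assuming per-fold true/predicted labels (the dummy folds below are placeholders, and UAR is computed as macro-averaged recall):

import numpy as np
from sklearn.metrics import confusion_matrix, recall_score, f1_score

# Placeholder per-fold (y_true, y_pred) pairs standing in for real runs.
folds = [([0, 1, 2, 2], [0, 1, 1, 2]),
         ([0, 0, 1, 2], [0, 1, 1, 2])]

total_confusion = np.zeros((3, 3), dtype=int)  # summed over all folds
uars, f1s = [], []
for y_true, y_pred in folds:
    total_confusion += confusion_matrix(y_true, y_pred, labels=[0, 1, 2])
    uars.append(recall_score(y_true, y_pred, average='macro'))  # UAR
    f1s.append(f1_score(y_true, y_pred, average='macro'))
print('UAR %.3f +/- %.3f' % (np.mean(uars), np.std(uars)))
print('F1  %.3f +/- %.3f' % (np.mean(f1s), np.std(f1s)))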

I see, thank you. I will close this issue.