b04901014 / FT-w2v2-ser

Official implementation for the paper "Exploring Wav2vec 2.0 fine-tuning for improved speech emotion recognition"

How many GPUs are enough?

Gpwner opened this issue

Wondering how many GPUs you used. I used two V100s (16 GB) and still couldn't run the last phase of the code until I reduced the batch size to 48. I've made sure I use both GPUs by modifying the following code:

        trainer = Trainer(
            precision=args.precision,
            amp_backend='native',
            callbacks=[checkpoint_callback] if hasattr(model, 'valid_met') else None,
            checkpoint_callback=hasattr(model, 'valid_met'),
            resume_from_checkpoint=None,
            check_val_every_n_epoch=1,
            max_epochs=hparams.max_epochs,
            num_sanity_val_steps=2 if hasattr(model, 'valid_met') else 0,
            gpus=-1,
            strategy='dp',  # multiple-gpus, 1 machine
            logger=False
        )

You can set self.wav2vec2.encoder.config.gradient_checkpointing = True in https://github.com/b04901014/FT-w2v2-ser/blob/main/modules/FeatureFuser.py#L103
This will greatly reduce the required VRAM if you are using a single GPU. (But it cannot scale to multiple GPUs under DDP, since DDP is not compatible with gradient checkpointing.)
Also, if you limit the maxseqlen argument, any training example longer than that number (in seconds) will be truncated; since batches are padded to that length, this also reduces VRAM usage. It may affect the performance of the model, but I don't think it will hurt much.
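
For concreteness, a minimal sketch of that change, assuming the wav2vec 2.0 model lives in a self.wav2vec2 attribute as the linked file suggests (the surrounding code here is illustrative, not a verified copy of the repo):

# In modules/FeatureFuser.py, after the wav2vec 2.0 encoder is built
# (around the linked line). Trades an extra forward pass for lower VRAM.
self.wav2vec2.encoder.config.gradient_checkpointing = True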

I use only one Quadro RTX 8000 (48 GB) for the last phase.

Will this change result in an eventual performance degradation?

And I don't quite understand why you compute the confusion matrix over the entire data set:
https://github.com/b04901014/FT-w2v2-ser/blob/main/run_downstream_custom_multiple_fold.py#L93

I think it should be:

WriteConfusionSeaborn(
    confusion,
    model.dataset.test_dataset,
    os.path.join(args.saving_path, 'confmat.png')
)

It seems that you add all the experiments' confusion matrices into one, am I right?

Gradient checkpointing is a way to reduce memory cost by trading it for an additional forward pass, as detailed in https://github.com/cybertronai/gradient-checkpointing
It should not impact the performance of the model.
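
As a toy illustration (not code from this repo), PyTorch's torch.utils.checkpoint shows the trade-off: activations inside the wrapped block are not stored during the forward pass and are recomputed during backward, so memory drops at the cost of roughly one extra forward:

import torch
from torch.utils.checkpoint import checkpoint

# A small block whose intermediate activations we avoid storing.
block = torch.nn.Sequential(
    torch.nn.Linear(512, 512), torch.nn.GELU(), torch.nn.Linear(512, 512)
)
x = torch.randn(8, 512, requires_grad=True)
y = checkpoint(block, x)  # block(x) is recomputed during backward
y.sum().backward()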

For the confusion matrix: model.dataset.emoset is just a list of label strings, as shown here: https://github.com/b04901014/FT-w2v2-ser/blob/main/downstream/Custom/dataloader.py#L17
If you print it out, it should be something like ['anger', 'sad', 'neutral'] if those are all the emotions in the training set.

All of the information is already in the confusion matrix itself; the function needs that list since it has to know which row/column corresponds to which emotion.
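
As a rough sketch of why (hypothetical code, not the repo's actual implementation of WriteConfusionSeaborn):

import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

# The matrix is just numbers; the label list names its rows/columns.
emoset = ['anger', 'sad', 'neutral']  # e.g. model.dataset.emoset
confusion = np.array([[30, 2, 1],
                      [3, 25, 4],
                      [1, 5, 29]])
ax = sns.heatmap(confusion, annot=True, fmt='d',
                 xticklabels=emoset, yticklabels=emoset)
ax.set_xlabel('Predicted')
ax.set_ylabel('True')
plt.savefig('confmat.png')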

Yes, the final confusion matrix is the sum over all runs/folds. But the stats (mean/std of UAR and F1) give more detail about the individual runs.
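
A minimal sketch of that aggregation, assuming per-fold true/predicted labels (the dummy folds below are placeholders, and UAR is computed as macro-averaged recall):

import numpy as np
from sklearn.metrics import confusion_matrix, recall_score, f1_score

# Placeholder per-fold (y_true, y_pred) pairs standing in for real runs.
folds = [([0, 1, 2, 2], [0, 1, 1, 2]),
         ([0, 0, 1, 2], [0, 1, 1, 2])]

total_confusion = np.zeros((3, 3), dtype=int)  # summed over all folds
uars, f1s = [], []
for y_true, y_pred in folds:
    total_confusion += confusion_matrix(y_true, y_pred, labels=[0, 1, 2])
    uars.append(recall_score(y_true, y_pred, average='macro'))  # UAR
    f1s.append(f1_score(y_true, y_pred, average='macro'))
print('UAR %.3f +/- %.3f' % (np.mean(uars), np.std(uars)))
print('F1  %.3f +/- %.3f' % (np.mean(f1s), np.std(f1s)))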

I see, thank you. I will close this issue.