How to train on multiple GPUs?
shuangliumax opened this issue
Hello, because my hardware is limited, there is not enough memory to test on the DomainNet dataset, so I need parallel multi-GPU training, but it does not work with the following code:

```python
algorithm = torch.nn.DataParallel(algorithm,
                                  device_ids=range(torch.cuda.device_count()))
```
Do you have any good suggestions?
Thanks!
Most simply, you can reduce the batch size for DomainNet (e.g., use B=16 or B=24). This will affect the performance, but I think the effect is not that significant. Or, if you want to use `DataParallel`, it seems you need to pull the model update code out of the `algorithm`.
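For reference, here is a minimal sketch of why the naive wrap in the question fails, assuming a DomainBed-style algorithm that does its training inside a custom `update()` method (the class and method bodies below are illustrative, not the actual repo code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MyAlgorithm(nn.Module):
    """Illustrative DomainBed-style algorithm: training happens in
    update(), not in forward(), so DataParallel never replicates it."""
    def __init__(self):
        super().__init__()
        self.network = nn.Linear(8, 2)

    def update(self, x, y):
        loss = F.cross_entropy(self.network(x), y)
        # ... loss.backward() and optimizer.step() happen in here ...
        return loss

algorithm = nn.DataParallel(MyAlgorithm(),
                            device_ids=range(torch.cuda.device_count()))

# DataParallel only intercepts forward(); custom methods stay hidden:
#   algorithm.update(x, y)         -> AttributeError
#   algorithm.module.update(x, y)  -> runs, but on a single GPU only
```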
Thank you for your reply. In fact, I wrote my own method in `algorithm.py`. If I use batch_size=16 or 24, would it be unfair compared to the other methods? Otherwise, I would need to retest all methods at batch_size=16 or 24, which may be time-consuming and costly. Therefore, to work around the insufficient memory of a single GPU, I wanted to solve the problem with multiple GPUs, but everything I have tried so far has failed.
I think you need to separate the model update code (including the loss backward and optimizer step) from the `algorithm` to use `DataParallel`. Since the codebase was not originally designed for multi-GPU training, several additional modifications may be needed.
Ok, thank you. I'll try.