RenYurui / Global-Flow-Local-Attention

The source code for the paper "Deep Image Spatial Transformation for Person Image Generation"

Home Page: https://renyurui.github.io/GFLA-web

multi-GPU training?

PangzeCheung opened this issue

I set gpu_ids to 2,3, but the program only runs on GPU 2. Could you please tell me whether the code supports multi-GPU training? Thank you!

You can use torch.nn.DataParallel to train the model on multiple GPUs. See here

Specifically, if you want to train the pose-guided person image generation task, you can modify the __init__ function in pose_model.py by adding:

self.net_G = torch.nn.DataParallel(self.net_G, device_ids=self.gpu_ids)
self.net_D = torch.nn.DataParallel(self.net_D, device_ids=self.gpu_ids)
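For context, torch.nn.DataParallel replicates the wrapped module on every device in device_ids and splits the input batch along dimension 0, so each GPU processes batch_size / len(device_ids) samples. A minimal sketch of the pattern (ToyGenerator and the device ids below are placeholders for illustration, not classes from this repo):

    import torch
    import torch.nn as nn

    class ToyGenerator(nn.Module):
        def __init__(self):
            super().__init__()
            self.conv = nn.Conv2d(3, 3, kernel_size=3, padding=1)

        def forward(self, x):
            # Under DataParallel, x is a per-GPU slice of the full batch.
            return self.conv(x)

    gpu_ids = [2, 3]                                   # placeholder device ids
    net_G = ToyGenerator().cuda(gpu_ids[0])            # parameters live on the first listed GPU
    net_G = nn.DataParallel(net_G, device_ids=gpu_ids)

    batch = torch.randn(8, 3, 64, 64).cuda(gpu_ids[0])
    out = net_G(batch)  # 4 samples run on GPU 2, 4 on GPU 3; outputs are gathered back on GPU 2

Note that DataParallel only scatters tensors passed as arguments to forward(); anything merely stored on the model object stays on whatever device it was moved to.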

Currently, only the face animation model supports multi-GPU training.
We will update the code soon.
Thanks for asking.

@RenYurui Thank you very much!

Hi @RenYurui,

Nice work!
It seems that even after I use DataParallel in pose_flownet with the line below, the model still uses a single GPU:
self.net_G = torch.nn.DataParallel(self.net_G, device_ids=self.gpu_ids)

It seems that all data is loaded only onto the first GPU in your code, as shown below:

            self.input_P1 = input_P1.cuda(self.gpu_ids[0], non_blocking=True)
            self.input_BP1 = input_BP1.cuda(self.gpu_ids[0], non_blocking=True)
            self.input_P2 = input_P2.cuda(self.gpu_ids[0], non_blocking=True)
            self.input_BP2 = input_BP2.cuda(self.gpu_ids[0], non_blocking=True)

I tried replacing the above with just .cuda(), but I am still not able to spread the batch data across multiple GPUs, and the first GPU runs out of memory when I use a larger batch size. Is it the case that your custom-built CUDA operations don't support multiple GPUs?

Thanks,
Bhavan
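For reference, the behavior described above is consistent with how DataParallel works: it scatters only the tensors passed as arguments to forward(), so inputs stored as attributes (self.input_P1 and friends) stay on gpu_ids[0] until they are handed to the wrapped network, and the first GPU additionally holds the gathered outputs, so some memory imbalance on it is normal even when scattering works. Custom CUDA extensions generally do run under DataParallel as long as their kernels launch on the device of the input tensors; whether this repo's ops do cannot be verified from the snippet above. A small probe can confirm whether the batch is actually being scattered (a debugging sketch; ProbeNet and the device ids are placeholders, not code from this repo):

    import torch
    import torch.nn as nn

    class ProbeNet(nn.Module):
        def __init__(self):
            super().__init__()
            self.fc = nn.Linear(4, 4)

        def forward(self, x):
            # If DataParallel is scattering correctly, each replica prints a
            # different device and a fraction of the batch.
            print('replica device:', x.device, 'slice size:', x.shape[0])
            return self.fc(x)

    gpu_ids = [0, 1]  # placeholder device ids
    net = nn.DataParallel(ProbeNet().cuda(gpu_ids[0]), device_ids=gpu_ids)
    _ = net(torch.randn(8, 4).cuda(gpu_ids[0]))
    # expected output (order may vary):
    #   replica device: cuda:0 slice size: 4
    #   replica device: cuda:1 slice size: 4

If both lines report cuda:0 with the full batch size, the inputs are not reaching the DataParallel wrapper's forward() as arguments, which would explain the single-GPU behavior.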