google / compare_gan

Compare GAN code.

Distributed training

jppgks opened this issue · comments

Thanks for open sourcing the code for this awesome paper!

I’m wondering if you used distributed training for the different GAN models during experimentation. If so, could you share an example of how to launch a distributed training job with the compare_gan code?

commented

Hi Joppe,

The training of a single GAN is done on a single GPU (it's relatively fast for the architecture and datasets that we used).

We launched multiple experiments in parallel: first by running compare_gan_generate_tasks to create a set of experiments to run, then by running compare_gan_run_one_task on many machines (machine 0 with task_num=0, machine 1 with task_num=1, etc.).
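A minimal sketch of that workflow, assuming the compare_gan_generate_tasks / compare_gan_run_one_task console scripts installed by the package; the flag names (--workdir, --experiment, --task_num, --dataset_root) and paths here are from memory and should be double-checked against the repository README:

```sh
# Generate the task definitions for an experiment (directory names are illustrative).
compare_gan_generate_tasks --workdir=/tmp/results --experiment=test

# Machine 0: train and evaluate task 0.
compare_gan_run_one_task --workdir=/tmp/results --task_num=0 --dataset_root=/tmp/datasets

# Machine 1: train and evaluate task 1 (and so on, one task per machine).
compare_gan_run_one_task --workdir=/tmp/results --task_num=1 --dataset_root=/tmp/datasets
```

Each task is an independent training run with its own hyperparameters, so the machines never need to communicate with each other.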

@jppgks running multiple experiments in parallel is not the same as distributed training, unless hyperparameter optimization is the end goal. Is this what you mean by multiple tasks?

Note: we have since updated the framework, and it now supports distributed training (a single run spread across multiple machines) on TPUs.
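As a rough sketch of what launching the updated, gin-configured framework looks like: the compare_gan/main.py entry point and --model_dir/--gin_config flags exist in the newer code, but the TPU-related flags (--use_tpu, --tfds_data_dir) and the example config path below are assumptions to verify against the current README:

```sh
# Single-machine run with the updated framework (gin-configured).
python compare_gan/main.py \
  --model_dir=/tmp/gan_run \
  --gin_config=example_configs/resnet_cifar10.gin

# TPU run: --use_tpu, --tfds_data_dir and the GCS paths are assumptions;
# check compare_gan/main.py and the README for the exact TPU flags.
python compare_gan/main.py \
  --use_tpu \
  --tfds_data_dir=gs://my-bucket/tfds \
  --model_dir=gs://my-bucket/gan_run \
  --gin_config=example_configs/resnet_cifar10.gin
```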

@Marvin182 where can I find this in the code?