afiaka87 / clip-guided-diffusion

A CLI tool / Python module for generating images from text using guided diffusion and CLIP from OpenAI.


Multi GPU Support

rlallen-nps opened this issue

Any thoughts on building multi-GPU support via DataParallel?
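
Roughly what I have in mind, as a sketch: wrap the model in torch.nn.DataParallel so each batch is split across the visible GPUs. The model below is just a placeholder, not this repo's actual code.

```python
import torch
import torch.nn as nn

# Placeholder standing in for the guided-diffusion UNet; the real model
# comes from the guided_diffusion package and takes (x, t) the same way.
class DummyUNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 3, 3, padding=1)

    def forward(self, x, t):
        return self.conv(x)

model = DummyUNet().cuda()

# DataParallel splits each input batch across GPUs and gathers the outputs
# on device 0; it raises batch throughput, not the resolution of one image.
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)

x = torch.randn(8, 3, 256, 256, device="cuda")
t = torch.zeros(8, dtype=torch.long, device="cuda")
out = model(x, t)
print(out.shape)
```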

@rlallen-nps Sounds interesting; unfortunately I don't have access to the compute, so I'm a bit lacking in motivation to get it working.

I'm not entirely certain how multi-GPU inference would work with this code base. It seems like a lot of work when there's presently a device argument which already lets you run inference on multiple GPUs, just toward different generations.
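
For example, you can already run one independent generation per GPU by pinning each process to a device. A rough sketch; the `cgd prompt` invocation here is a guess at the CLI, so substitute whatever command and flags the README actually documents:

```python
import os
import subprocess

prompts = [
    "an oil painting of a lighthouse",
    "a watercolor landscape of a forest",
]

procs = []
for gpu, prompt in enumerate(prompts):
    # Each child process only sees one GPU, so the device argument can stay
    # at its default; swap in the real CLI invocation for your setup.
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu))
    procs.append(subprocess.Popen(["cgd", prompt], env=env))

for p in procs:
    p.wait()
```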

Check out datacrunch.io for cheap GPUs. The point of distributing one run over multiple GPUs is not to process more images; it's to process one generation at a much higher resolution.

> Check out datacrunch.io for cheap GPUs.

I'm aware, but like I said, I'm not super motivated by this. If I were to rent one, it would be to max out settings on a single GPU, which is easily achievable on an RTX 3090 or an A100; but I have no intention at this time of spending money on this project. Fortunately, I do all the testing on my RTX 2070, which I don't have to pay rent for.

> The point of distributing one run over multiple GPUs is not to process more images; it's to process one generation at a much higher resolution.

The guided-diffusion checkpoints have harsh size constraints and must be trained from scratch for different sizes. The largest is the 512-pixel checkpoint (which Katherine Crowson fine-tuned to be unconditional).
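
To make that concrete: the model has to be constructed at the checkpoint's training resolution before the weights can even be loaded. A sketch using OpenAI's guided_diffusion helpers; the few config values shown are illustrative, the checkpoint filename is the one commonly distributed, and the remaining architecture hyperparameters also have to match the checkpoint exactly:

```python
import torch
from guided_diffusion.script_util import (
    create_model_and_diffusion,
    model_and_diffusion_defaults,
)

config = model_and_diffusion_defaults()
config.update({
    "image_size": 512,    # must equal the resolution the checkpoint was trained at
    "class_cond": False,  # Crowson's 512px fine-tune is unconditional
    "timestep_respacing": "250",
})
# NOTE: num_channels, attention_resolutions, learn_sigma, etc. must also match
# the checkpoint's training configuration; the defaults alone will not load it.

model, diffusion = create_model_and_diffusion(**config)
state = torch.load("512x512_diffusion_uncond_finetune_008100.pt", map_location="cpu")
model.load_state_dict(state)
model.eval()
```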

The code for guided-diffusion (from OpenAI's fork) uses MPI for its distributed training, I think. If you wanted to increase the resolution of the generations, though, that's where I would go.
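
Roughly, guided-diffusion pins one GPU per MPI rank and then initializes torch.distributed on top of that. A sketch of that pattern using mpi4py, not a copy of their dist_util:

```python
import os
import torch
import torch.distributed as dist
from mpi4py import MPI

def setup_dist():
    """Pin each MPI rank to one GPU and init torch.distributed across ranks."""
    comm = MPI.COMM_WORLD
    rank, world_size = comm.Get_rank(), comm.Get_size()

    # One GPU per rank; launch with e.g. `mpiexec -n 2 python this_script.py`.
    torch.cuda.set_device(rank % torch.cuda.device_count())

    # Rank 0 chooses the rendezvous address and shares it with the other ranks.
    addr = comm.bcast("localhost" if rank == 0 else None, root=0)
    os.environ.setdefault("MASTER_ADDR", addr)
    os.environ.setdefault("MASTER_PORT", "29500")

    dist.init_process_group(backend="nccl", rank=rank, world_size=world_size)

if __name__ == "__main__":
    setup_dist()
    print(f"rank {dist.get_rank()} of {dist.get_world_size()} ready")
```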

I believe https://github.com/AranKomat/Diff-DALLE is also looking into training guided-diffusion with a transformer in the style of DALL-E. There will definitely be interesting developments from that repository in the coming months; I fully expect it to surpass this method, and for a few good checkpoints to be released as well.

> The guided-diffusion checkpoints have harsh size constraints and must be trained from scratch for different sizes. The largest is the 512-pixel checkpoint (which Katherine Crowson fine-tuned to be unconditional).

Ah, I see; I was still thinking in VQGAN mode. I'll let you know if I find anything interesting. Thanks for the Diff-DALLE recommendation!