Clarification of speed
zehongs opened this issue
Hi, thanks for the great work!
I'm curious about how the 0.3s per image is calculated. Is this the overall throughput with a batch size of 256?
I noticed that the diffusion MLP is still taking quite a bit of time, while the MAE encoder and decoder transformers are relatively fast. To improve speed, would it be possible or recommended to further reduce the size of this MLP?
Thanks for the interest! Yes, it is the overall throughput with a batch size of 256. The problem with the diffusion MLP is that it is too small to fully utilize the GPU, so reducing the MLP size (especially its width) will actually not help much. A large batch size is one way to alleviate this issue.
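For concreteness, here is a minimal timing sketch of how that per-image number is obtained (assuming a hypothetical `model.sample(bsz)` entry point; the actual sampling call in the repo may differ):

```python
import time
import torch

@torch.no_grad()
def per_image_time(model, bsz=256, warmup=1):
    """Rough per-image generation time: time one batch and divide by the batch size.

    `model.sample(bsz)` is a hypothetical stand-in for the repo's actual
    sampling entry point; swap in the real call.
    """
    for _ in range(warmup):
        model.sample(bsz)            # warm up CUDA kernels and allocator
    torch.cuda.synchronize()         # flush pending GPU work before timing
    start = time.time()
    model.sample(bsz)                # generate one full batch
    torch.cuda.synchronize()         # wait for async GPU work to finish
    return (time.time() - start) / bsz

# per_image_time(model, bsz=256) -> ~0.3s per image, while bsz=8 gives a much
# larger per-image number because the small batch underutilizes the GPU.
```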
Thanks for the prompt reply!!
I'm also curious about the necessity of a MAGE-like encoder and decoder. Since only the MSE loss on the next set of tokens is used during training and no contrastive training like in MAGE is involved, is it still necessary to use such a masked encoding + decoding approach for the unmasked tokens? Any insight would be helpful!
We use this kind of sparse encoder to save computation: in this way, the FLOPs in the encoder will be just 10% of those in the decoder (if we don't consider the buffer tokens). Using a single transformer (i.e., the decoder only) is also totally fine (similar to MaskGIT).
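To make the FLOPs argument concrete, here is a simplified sketch of this kind of sparse encoding (a rough illustration under stated assumptions; `encoder`, `decoder`, and `mask_token` are generic placeholders, not the actual modules in this repo):

```python
import torch

def sparse_encode_then_decode(tokens, mask, encoder, decoder, mask_token):
    """MAE/MAGE-style sparse encoding: the encoder only processes the visible
    (unmasked) tokens, so its cost scales with the keep ratio; the decoder
    then runs on the full sequence with mask tokens filled in.

    tokens: [B, L, D]; mask: [B, L] bool (True = masked); the number of
    masked tokens is assumed to be the same for every sample in the batch.
    """
    B, L, D = tokens.shape
    visible = tokens[~mask].reshape(B, -1, D)    # gather unmasked tokens only
    enc = encoder(visible)                       # cheap: short sequence

    full = mask_token.expand(B, L, D).clone()    # start from the learnable mask token
    full[~mask] = enc.reshape(-1, D)             # scatter encoded tokens back
    return decoder(full)                         # decoder attends over all L tokens
```

With a keep ratio of about 10%, the encoder's sequence length is roughly a tenth of the decoder's, which is where the ~10% FLOPs figure comes from.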
I have some doubts about the 0.3s per image as well. Does this refer to the time it takes for 256 tokens to go through the model once? For reference, I tested generating 8 images, which took 10.4 seconds on NVIDIA A100 GPUs.
@maxin-cn 0.3s per image is the wall-clock time to generate a batch of 256 images, divided by 256. Generating a small batch is typically much less efficient due to suboptimal utilization of the GPU.