kuprel / min-dalle

min(DALL·E) is a fast, minimal port of DALL·E Mini to PyTorch

Streaming intermediate images?

sabetAI opened this issue · comments

Is it possible to publish an update of the model that supports streaming intermediate images during reverse diffusion, i.e., with an iterator? It would greatly help the UX if users could see their image form while they wait for the process to finish.

This isn't a diffusion model, so that wouldn't work

Diffusion models iteratively update the image over multiple steps. These iterates can be streamed out (e.g., see the GLIDE demo). 'Reverse diffusion' is simply the image generation step ('diffusion' is the noising process during training), which is what your model is doing during inference. Can you update the code to output intermediate images?

Using the term 'reverse diffusion' might have caused some confusion with what I was asking.

This model is not like GLIDE or VQGAN+CLIP.
DALL-E works on an entirely different principle. The image is generated as tiny squares (tokens), square by square, from left to right and top to bottom. It does not change the whole image at once at every iteration like diffusion models do. At each iteration it just fills in another tiny bit of the empty area with a completely finished tiny portion of the final image.
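For illustration, here is a minimal toy sketch of that generation order. The decoder and vocabulary size below are stand-ins, not the real min-dalle components:

```python
import torch

GRID = 16      # assumed 16x16 grid of image tokens
VOCAB = 16384  # assumed image-token vocabulary size

def toy_decoder_logits(tokens_so_far: torch.Tensor) -> torch.Tensor:
    """Stand-in for the real decoder: returns logits for the next token."""
    return torch.randn(VOCAB)

tokens: list[int] = []
for i in range(GRID * GRID):  # left to right, top to bottom
    logits = toy_decoder_logits(torch.tensor(tokens))
    tokens.append(int(torch.argmax(logits)))
    # tokens[:i+1] is a finished prefix of the final image; already-generated
    # tokens are never revisited, unlike the repeated whole-image updates of
    # a diffusion model
```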

Ah good point @iScriptLex, I made assumptions about the model architecture. Even if it's outputting autoregressively, tokens can still be streamed out to incrementally update a canvas a piece at a time. The main use case here is to show intermediate results to the user, as waiting kills the UX.

It might be possible to generate the images each time a row of tokens is decoded, and use some kind of blank token for the missing rows
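A rough sketch of that idea, assuming a 16x16 token grid and a placeholder filler token id (both are assumptions, not taken from the codebase):

```python
import torch

GRID = 16         # assumed 16x16 grid of image tokens
BLANK_TOKEN = 0   # hypothetical filler id for not-yet-generated positions

def padded_grid(tokens: list[int]) -> torch.Tensor:
    """Pad a partially decoded token sequence to a full grid so it can be
    passed through the detokenizer for an intermediate preview image."""
    filler = [BLANK_TOKEN] * (GRID * GRID - len(tokens))
    return torch.tensor(tokens + filler).view(GRID, GRID)

# e.g. run the detokenizer on padded_grid(tokens) each time another full
# row of 16 tokens has finished decoding
```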

@kuprel yes exactly. Also, would it be more efficient to just stream rows of tokens and have the client handle everything else? I want to minimize any latency that streaming might add.

This would still look cool while it was loading, but I worry about latency and bandwidth. Wouldn't a loading bar or something work just as well?

@w4ffl35 can you quantify the marginal latency/bandwidth costs? Loaders may work for one-time uses, but users will churn if they're stuck looking at loaders 95% of the time. See urzas.ai for an example of UX with intermediate outputs. IMO, if a flag were made available, it would be hugely valuable for devs.

@sabetAI those are great points

OK, I got it working in the colab; now I just have to figure out how to get it on Replicate. An intermediate image count of 8 only adds a couple of seconds to the overall decoding time on the P100.

Here's what it looks like (open in a new tab to see the animation): [animated GIF]

@kuprel so good 👏. When can you merge 🙏?

I merged it. You can try it in the colab. Hopefully it will be on Replicate by tomorrow.
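A hypothetical usage sketch for trying it out; the constructor, method, and parameter names below are assumptions rather than a confirmed API, so check the colab or README for the actual call:

```python
import torch
from min_dalle import MinDalle

# hypothetical arguments; the real constructor may differ
model = MinDalle(is_mega=True, is_reusable=True,
                 dtype=torch.float16, device='cuda')

# assumed streaming interface that yields intermediate images as rows of
# tokens finish decoding
for i, image in enumerate(model.generate_image_stream(
        text='a comfy chair that looks like an avocado',
        seed=-1,
        grid_size=1,
        progressive_outputs=True)):
    image.save(f'intermediate_{i}.png')
```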

OK, it's live on Replicate now.