Vchitect / Latte

Latte: Latent Diffusion Transformer for Video Generation.

Is autoregression possible?

zhaohm14 opened this issue

Thanks for your wonderful work!
I am interested in applying autoregressive generation to achieve length-flexible output.
Could this be implemented by changing the way the model runs inference, as LLMs do?

Thanks for your interest. Which LLM inference algorithm are you referring to specifically?

I mean generating subsequent frames using the previous frames as input (perhaps with a special end token?), instead of generating all 16 frames at once. That way the model could accept training videos of any length and generate longer videos of flexible length.
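
A minimal sketch of the sliding-window rollout described above, assuming a hypothetical `generate_chunk(context, n_new)` function that produces `n_new` new latent frames conditioned on the given context frames. Latte does not expose such a function, and the model was not trained to condition on previous frames, so this only illustrates the control flow. The `is_end` callback stands in for the proposed end token; the `Frame` type and the window sizes (8 context frames plus 8 new frames, i.e. a 16-frame window per call) are also assumptions.

```python
from typing import Callable, List, Optional

# Hypothetical stand-in for one latent video frame.
Frame = List[float]


def autoregressive_rollout(
    generate_chunk: Callable[[List[Frame], int], List[Frame]],
    first_frames: List[Frame],
    target_len: int,
    context_len: int = 8,
    chunk_len: int = 8,
    is_end: Optional[Callable[[Frame], bool]] = None,
) -> List[Frame]:
    """Extend a video chunk by chunk, conditioning each step on the last frames."""
    if not first_frames or context_len < 1 or chunk_len < 1:
        raise ValueError("need at least one seed frame and positive window sizes")
    video = list(first_frames)
    while len(video) < target_len:
        # Condition only on a fixed-size window so each call covers at most
        # context_len + chunk_len frames (assumed to match the training length).
        context = video[-context_len:]
        n_new = min(chunk_len, target_len - len(video))
        new_frames = generate_chunk(context, n_new)[:n_new]
        if not new_frames:
            raise RuntimeError("generate_chunk returned no frames")
        for frame in new_frames:
            video.append(frame)
            # Analogue of an LLM end token: stop early if the frame signals it.
            if is_end is not None and is_end(frame):
                return video
    return video


def toy_generate_chunk(context: List[Frame], n_new: int) -> List[Frame]:
    # Toy "model" for illustration: each new frame is the previous value plus one.
    last = context[-1][0]
    return [[last + i + 1] for i in range(n_new)]


video = autoregressive_rollout(toy_generate_chunk, [[0.0]], target_len=20)
print(len(video), video[-1])  # 20 [19.0]
```

Because each step only sees a bounded window of its own earlier outputs, errors may accumulate from chunk to chunk, which is one reason results are hard to predict for a model trained only on fixed-length clips.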

I'm not sure about the performance, since the model was trained directly on 16-frame videos. Feel free to try it; if you get better results, a PR is welcome.