inferless / stable-video-diffusion

Stable Video Diffusion (SVD) Image-to-Video is a latent diffusion model trained to generate short video clips from a conditioning image. This checkpoint generates 25 frames at a resolution of 576x1024 from a context frame of the same size, and was finetuned from the 14-frame SVD Image-to-Video model. The widely used f8-decoder was also finetuned for temporal consistency.

Home Page: https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt
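
As a rough illustration of the figures in the description (25 frames, 576x1024, an f8 decoder), the sketch below shows one way the checkpoint might be called through the Hugging Face diffusers library. It is not taken from this repository: the helper names (`latent_grid`, `generate_video`), the file paths, the seed, `decode_chunk_size=8` and `fps=7` are illustrative assumptions. It also assumes that "f8" denotes a spatial downsampling factor of 8, and that `torch`, `diffusers` and `accelerate` are installed on a machine with a CUDA GPU.

```python
MODEL_ID = "stabilityai/stable-video-diffusion-img2vid-xt"
NUM_FRAMES = 25            # frame count stated in the model description
HEIGHT, WIDTH = 576, 1024  # resolution stated in the model description


def latent_grid(height, width, factor=8):
    """Return the (rows, cols) of the latent grid for a frame size.

    Assumes "f8" means the autoencoder downsamples each spatial
    dimension by a factor of 8.
    """
    if height % factor or width % factor:
        raise ValueError("frame size must be divisible by the downsampling factor")
    return height // factor, width // factor


def generate_video(image_path, out_path="generated.mp4", seed=42):
    """Hypothetical helper: generate a clip from a single conditioning image."""
    # Heavy dependencies are imported lazily so the module loads without them.
    import torch
    from diffusers import StableVideoDiffusionPipeline
    from diffusers.utils import export_to_video, load_image

    pipe = StableVideoDiffusionPipeline.from_pretrained(
        MODEL_ID, torch_dtype=torch.float16, variant="fp16"
    )
    # Offloads idle submodules to the CPU to reduce GPU memory use
    # (requires the accelerate package).
    pipe.enable_model_cpu_offload()

    # PIL expects (width, height); match the resolution the model was trained on.
    image = load_image(image_path).resize((WIDTH, HEIGHT))

    frames = pipe(
        image,
        height=HEIGHT,
        width=WIDTH,
        num_frames=NUM_FRAMES,
        decode_chunk_size=8,  # assumption: decode fewer frames at once to save memory
        generator=torch.manual_seed(seed),
    ).frames[0]

    export_to_video(frames, out_path, fps=7)  # fps is an illustrative choice
    return out_path


if __name__ == "__main__":
    print(latent_grid(HEIGHT, WIDTH))
```

Under the stated f8 assumption, each 576x1024 frame corresponds to a 72x128 latent grid. Calling `generate_video("input.png")` (a placeholder path) would download the checkpoint on first use.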
