SV3D fine-tuning

Fine-tuning code for SV3D

Example results (images omitted): input image, output before training, output after training.

Setting up

PyTorch 2.0

conda create -n sv3d python==3.10.14
conda activate sv3d
pip3 install -r requirements.txt

Install DeepSpeed for training:

pip3 install deepspeed

Get checkpoints 💾

Download sv3d_p.safetensors and store it in the following structure (a hedged download sketch follows the tree):

cd SV3D-fine-tuning
    .
    └── checkpoints
        └── sv3d_p.safetensors
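
If you don't already have the checkpoint, a minimal download sketch follows, assuming sv3d_p.safetensors is hosted in the gated stabilityai/sv3d repository on Hugging Face (you may need to accept the license and log in with huggingface-cli login first):

    # Hedged sketch: fetch sv3d_p.safetensors into ./checkpoints.
    # The repo id "stabilityai/sv3d" is an assumption, not stated in this README.
    from huggingface_hub import hf_hub_download

    hf_hub_download(
        repo_id="stabilityai/sv3d",
        filename="sv3d_p.safetensors",
        local_dir="checkpoints",
    )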

Dataset 📀

Prepare the dataset as follows. We use the Objaverse 1.0 dataset with a preprocessing pipeline; see the objaverse dataloader for details. orbit_frame_0020.png is the input image, and video_latent.pt is the video latent encoded by the SV3D encoder without regularization (i.e., it has 8 channels). A minimal loading sketch follows the directory tree below.

cd dataset
    .
    β”œβ”€β”€ 000-000
    β”‚   β”œβ”€β”€ orbit_frame_0020.png # input image
    β”‚   └── video_latent.pt      # video latent
    β”œβ”€β”€ 000-001
    β”‚   β”œβ”€β”€ orbit_frame_0020.png
    β”‚   └── video_latent.pt
    └── ...
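
A minimal sketch of a PyTorch dataset matching this layout (the class name OrbitLatentDataset is hypothetical; the repo's actual dataloader may apply different preprocessing):

    import numpy as np
    import torch
    from pathlib import Path
    from PIL import Image
    from torch.utils.data import Dataset

    class OrbitLatentDataset(Dataset):  # hypothetical name, for illustration
        def __init__(self, root="dataset"):
            self.dirs = sorted(p for p in Path(root).iterdir() if p.is_dir())

        def __len__(self):
            return len(self.dirs)

        def __getitem__(self, i):
            d = self.dirs[i]
            img = np.asarray(Image.open(d / "orbit_frame_0020.png").convert("RGB"),
                             dtype=np.float32)
            image = torch.from_numpy(img).permute(2, 0, 1) / 127.5 - 1.0  # [-1, 1]
            latent = torch.load(d / "video_latent.pt")  # 8-channel encoder output
            return {"image": image, "latent": latent}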

Training 🚀

We used a single A6000 GPU (48 GB VRAM) for fine-tuning.

sh scripts/sv3d_finetune.sh
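
Before launching, it may be worth a quick sanity check that each precomputed latent has the 8-channel layout described in the notes below (the sample path and the assumed (frames, channels, height, width) layout are illustrative):

    import torch

    # Any dataset subdirectory works; this path is just an example.
    latent = torch.load("dataset/000-000/video_latent.pt")
    print(latent.shape)  # expected (T, 8, h, w): mean and logvar concatenated
    assert latent.shape[1] == 8, "expected unregularized 8-channel encoder output"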

Inference ❄️

Store the input images in assets, then run:

sh scripts/inference.sh

Notes

  • The encoder weights of the VAE are not provided in sv3d_p.safetensors.
    • To obtain the video latents, run the encoder separately and feed its outputs to the training pipeline; this saves training time and GPU VRAM.
    • Use the raw output of the VAE encoder, not a sample from the distribution defined by the encoder's mean and variance. In our case, we used AutoencoderKLTemporalDecoder, the same VAE used in the SVD pipeline. A hedged encoding sketch follows this list.
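
A minimal sketch of this latent precomputation, assuming the diffusers implementation of AutoencoderKLTemporalDecoder with the SVD VAE weights; the frame list, resolution, and output path are placeholders:

    import numpy as np
    import torch
    from diffusers import AutoencoderKLTemporalDecoder
    from PIL import Image

    device = "cuda"
    # Same VAE as the SVD pipeline; the weight source below is an assumption.
    vae = AutoencoderKLTemporalDecoder.from_pretrained(
        "stabilityai/stable-video-diffusion-img2vid-xt", subfolder="vae"
    ).to(device).eval()

    def encode_orbit(frame_paths, out_path):
        frames = []
        for p in frame_paths:
            img = np.asarray(Image.open(p).convert("RGB"), dtype=np.float32)
            frames.append(torch.from_numpy(img).permute(2, 0, 1) / 127.5 - 1.0)
        x = torch.stack(frames).to(device)  # (T, 3, H, W) in [-1, 1]
        with torch.no_grad():
            # .parameters is the unregularized encoder output: mean and logvar
            # concatenated along channels (8 channels), NOT a sampled latent.
            latent = vae.encode(x).latent_dist.parameters
        torch.save(latent.cpu(), out_path)  # e.g. dataset/000-000/video_latent.pt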

Acknowledgement 🤗

The source code is based on SV3D. Thanks for the wonderful codebase!

Additionally, GPU and NFS resources for training are supported by fal.ai 🔥.

Feel free to refer to the fal Research Grants!

License: MIT

