Vchitect / Latte

Latte: Latent Diffusion Transformer for Video Generation.

How to place and preprocess these datasets

renyuanzhe opened this issue · comments

How should I place and preprocess these datasets: FaceForensics, SkyTimelapse, UCF101, and Taichi-HD?

You can refer to #35. If it's still not clear, I'll give you a data structure.

Should I write the preprocessing code myself, or does your repository contain the needed code?

And could you share these datasets' structure as used in the project? I wonder whether the structure is modified during preprocessing.

All datasets keep their original structure, as shown below (in the second layout, each video entry is a folder of extracted frames); no additional operations are required.

ROOT
├── train
│   ├── video1.mp4
│   └── video2.mp4
└── test
    ├── video1.mp4
    └── video2.mp4

or

ROOT
├── train
│   ├── video1.mp4
│   │   ├── frame_0001.png
│   │   └── frame_0002.png
│   └── video2.mp4
│       ├── frame_0001.png
│       └── frame_0002.png
└── test
    ├── video1.mp4
    │   ├── frame_0001.png
    │   └── frame_0002.png
    └── video2.mp4
        ├── frame_0001.png
        └── frame_0002.png
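
For illustration only, here is a minimal sketch of how a layout like the ones above could be read for training. It is not the repository's actual dataloader: the class name `VideoFolderDataset` and its arguments are assumptions, and in practice the repo's own dataset classes handle frame sampling and resizing before encoding.

```python
# Hypothetical example (not the project's API): read every .mp4 under
# ROOT/<split>/ from the layout shown above and return fixed-length clips.
import glob
import os

import torch
from torch.utils.data import Dataset
from torchvision.io import read_video


class VideoFolderDataset(Dataset):
    """Loads clips from a directory laid out as ROOT/train/*.mp4 and ROOT/test/*.mp4."""

    def __init__(self, root, split="train", num_frames=16):
        self.paths = sorted(glob.glob(os.path.join(root, split, "*.mp4")))
        self.num_frames = num_frames

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        # read_video returns (T, H, W, C) uint8 frames plus audio and metadata.
        frames, _, _ = read_video(self.paths[idx], pts_unit="sec")
        frames = frames[: self.num_frames]                         # first clip only
        frames = frames.permute(0, 3, 1, 2).float() / 127.5 - 1.0  # (T, C, H, W), scaled to [-1, 1]
        return frames


# Usage: dataset = VideoFolderDataset("/path/to/ROOT", split="train")
```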

Thank you, I have preprocessed the dataset. However, I find that the input image size is 32 in the code, which differs from the 256 in the paper. Is there something wrong?

The encoder downsamples the video from 256 to 32: each 256×256 frame becomes a 32×32 latent, so the 32 in the code is expected.
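
For context on the 32 vs. 256 point, here is a minimal sketch assuming a Stable Diffusion style VAE from diffusers with an 8× spatial downsampling factor (the checkpoint name is only an example, and the repo may load its VAE differently): a 256×256 frame encodes to a 4×32×32 latent, which is the size-32 input the transformer sees.

```python
# Sketch only: show that an 8x-downsampling VAE turns 256x256 frames into 32x32 latents.
import torch
from diffusers.models import AutoencoderKL

vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-ema")  # example checkpoint
vae.eval()

frames = torch.randn(16, 3, 256, 256)  # 16 RGB frames at 256x256
with torch.no_grad():
    latents = vae.encode(frames).latent_dist.sample() * 0.18215  # SD latent scaling

print(latents.shape)  # torch.Size([16, 4, 32, 32]) -> 256 / 8 = 32 per spatial dim
```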