3d aigc diffusion-models generative-model multiview reconstruction

Convolutional Reconstruction Model

Official implementation for CRM: Single Image to 3D Textured Mesh with Convolutional Reconstruction Model.

CRM is a feed-forward model that can generate a 3D textured mesh in 10 seconds.

Project Page | Arxiv | HF-Demo | Weights

teaser.mp4

Try CRM 🍻

Try CRM on the Hugging Face demo.
Try CRM on the Replicate demo. Thanks @camenduru!

Install

Step 1 - Base

Install the packages one at a time. We use Python 3.9.

pip install torch==1.13.0+cu117 torchvision==0.14.0+cu117 torchaudio==0.13.0 --extra-index-url https://download.pytorch.org/whl/cu117
pip install torch-scatter==2.1.1 -f https://data.pyg.org/whl/torch-1.13.1+cu117.html
pip install kaolin==0.14.0 -f https://nvidia-kaolin.s3.us-east-2.amazonaws.com/torch-1.13.1_cu117.html
pip install -r requirements.txt

In addition, install xformers manually, following the official documentation (conda is not required), e.g.

pip install ninja
pip install -v -U git+https://github.com/facebookresearch/xformers.git@main#egg=xformers

Step 2 - Nvdiffrast

Install nvdiffrast according to the official doc, e.g.

pip install git+https://github.com/NVlabs/nvdiffrast

Inference

We suggest using Gradio for interactive, visual inference.

gradio app.py

For command-line inference, run:

CUDA_VISIBLE_DEVICES="0" python run.py --inputdir "examples/kunkun.webp"

This outputs the preprocessed image, the generated six-view images and CCMs, and a 3D model in OBJ format.
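As a quick sanity check on the generated mesh, the OBJ file can be inspected without any 3D library. The sketch below is not part of the CRM code; `summarize_obj` is a hypothetical helper that only counts the standard Wavefront OBJ records `v` (vertex), `vt` (texture coordinate), and `f` (face).

```python
from pathlib import Path


def summarize_obj(path):
    """Count vertex (v), texture-coordinate (vt), and face (f) records in an OBJ file."""
    counts = {"v": 0, "vt": 0, "f": 0}
    for line in Path(path).read_text().splitlines():
        parts = line.split(maxsplit=1)
        # The first token of each OBJ line names the record type;
        # comments and other record types (e.g. vn) are skipped.
        if parts and parts[0] in counts:
            counts[parts[0]] += 1
    return counts
```

Calling `summarize_obj` on the exported file returns a small dictionary; a mesh with zero faces, or with no `vt` records when UV texturing is expected, suggests that something went wrong upstream.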

Tips: (1) If the result is unsatisfactory, check whether the input image was correctly preprocessed onto a grey background; otherwise the results will be unpredictable. (2) Unlike the Hugging Face demo, this official implementation uses UV texture instead of vertex color. This gives better texture quality than the online demo, but generation takes longer because of the UV texturing.
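To illustrate what "pre-processed into a grey background" means for an image with transparency, here is a minimal sketch of per-pixel alpha blending over a uniform grey. It is not CRM's preprocessing code: `composite_on_grey` is a hypothetical helper, and the default `grey=127` is an assumed mid-grey, not a value confirmed by the repository.

```python
def composite_on_grey(rgba_pixels, grey=127):
    """Alpha-blend RGBA pixels (0-255 ints) over a uniform grey background.

    grey=127 is an assumption for illustration only.
    """
    out = []
    for r, g, b, a in rgba_pixels:
        alpha = a / 255.0
        # Standard "over" blend: foreground weighted by alpha, background by (1 - alpha).
        out.append(tuple(round(c * alpha + grey * (1.0 - alpha)) for c in (r, g, b)))
    return out
```

Fully transparent pixels become the background grey and fully opaque pixels keep their color, so an object cut out with an alpha mask ends up on a flat grey canvas.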

Train

We provide training scripts and describe their data requirements. To launch a simple single-instance overfitting run for multiview generation:

accelerate launch $accelerate_args train.py --config configs/nf7_v3_SNR_rd_size_stroke_train.yaml \
    config.batch_size=1 \
    config.eval_interval=100

To launch a simple single-instance overfitting run for CCM generation:

accelerate launch $accelerate_args train_stage2.py --config configs/stage2-v2-snr_train.yaml \
    config.batch_size=1 \
    config.eval_interval=100

Data preparation

To specify the data directories, modify the following parameters in configs/xxxx.yaml:

    base_dir: <path to multiview pixel image basedir>
    xyz_base: <path to related CCM image basedir>
    caption_csv: <path to caption.csv>

The base directories should have the following layout:

base_dir
├── uid1
│   ├── 000.png
│   ├── 001.png
│   ├── 002.png
│   ├── 003.png
│   ├── 004.png
│   └── 005.png
├── uid2
....

xyz_base
├── uid1
│   ├── xyz_new_000.png
│   ├── xyz_new_001.png
│   ├── xyz_new_002.png
│   ├── xyz_new_003.png
│   ├── xyz_new_004.png
│   └── xyz_new_005.png
├── uid2
....

The train_example directory shows a minimal example of the training data and the caption.csv file.
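Before launching training, it can help to confirm that every uid folder contains all six pixel images and all six CCM images named as in the trees above. The following is a minimal sketch under that assumption; `find_incomplete_uids` is a hypothetical helper, not part of the training code, and it does not inspect caption.csv.

```python
from pathlib import Path

NUM_VIEWS = 6  # views 000..005, as in the trees above


def find_incomplete_uids(base_dir, xyz_base, num_views=NUM_VIEWS):
    """Return {uid: [missing file paths]} for every uid folder under base_dir."""
    base_dir, xyz_base = Path(base_dir), Path(xyz_base)
    problems = {}
    for uid_dir in sorted(p for p in base_dir.iterdir() if p.is_dir()):
        uid = uid_dir.name
        expected = []
        for i in range(num_views):
            # Pixel image and its matching CCM image for view i.
            expected.append(uid_dir / f"{i:03d}.png")
            expected.append(xyz_base / uid / f"xyz_new_{i:03d}.png")
        missing = [str(p) for p in expected if not p.is_file()]
        if missing:
            problems[uid] = missing
    return problems
```

An empty result means every uid found under `base_dir` has a complete set of files in both directories; otherwise the returned dictionary lists exactly which files are missing for each uid.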

Todo List

Release inference code.
Release pretrained models.
Optimize inference code to fit on low-memory GPUs.
Upload training code.

Acknowledgement

Citation

@article{wang2024crm,
  title={CRM: Single Image to 3D Textured Mesh with Convolutional Reconstruction Model},
  author={Zhengyi Wang and Yikai Wang and Yifei Chen and Chendong Xiang and Shuo Chen and Dajiang Yu and Chongxuan Li and Hang Su and Jun Zhu},
  journal={arXiv preprint arXiv:2403.05034},
  year={2024}
}

About

[ECCV 2024] Single Image to 3D Textured Mesh in 10 seconds with Convolutional Reconstruction Model.

https://ml.cs.tsinghua.edu.cn/~zhengyi/CRM/

MIT License
