Baorui Ma1*, Haoge Deng2,1*, Junsheng Zhou3,1, Yu-Shen Liu3, Tiejun Huang1,4, Xinlong Wang1
GeoDream alleviates the Janus problem by incorporating explicit 3D priors into 2D diffusion priors. It generates consistent multi-view rendered images and textured meshes with rich details.
Qualitative comparison with baselines. Back views are highlighted with red rectangles to make multi-face (Janus) artifacts easy to spot.
[1/5/2024] Add support for a GeoDream extension for ThreeStudio. The extension implementation can be found on the threestudio branch.
[12/20/2023] Add support for Stable-Zero123. Follow the instructions here to give it a try.
[12/2/2023] Code released.
git clone https://github.com/baaivision/GeoDream.git
cd GeoDream
Due to dependency conflicts between the pre-trained multi-view diffusion model that predicts source views and the code that constructs the cost volume, we currently have to use two separate virtual environments. This is inconvenient for researchers, and we are working hard to resolve the conflicts as part of our future update plans.
See mv-diffusion/README.md for additional installation information.
See installation.md for additional information, including installation via Docker.
The following steps have been tested on Ubuntu 20.04.
- You must have an NVIDIA graphics card with at least 24GB VRAM and have CUDA installed.
- Install Python >= 3.8.
- (Optional, Recommended) Create a virtual environment:
python3 -m virtualenv venv
. venv/bin/activate
# Newer pip versions, e.g. pip-23.x, can be much faster than old versions, e.g. pip-20.x.
# For instance, it caches the wheels of git packages to avoid unnecessarily rebuilding them later.
python3 -m pip install --upgrade pip
- Install PyTorch >= 1.12. We have tested on torch1.12.1+cu113 and torch2.0.0+cu118, but other versions should also work fine.
# torch1.12.1+cu113
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 --extra-index-url https://download.pytorch.org/whl/cu113
# or torch2.0.0+cu118
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
- (Optional, Recommended) Install ninja to speed up the compilation of CUDA extensions:
pip install ninja
- Install dependencies:
pip install -r requirements.txt
pip install inplace_abn
FORCE_CUDA=1 pip install --no-cache-dir git+https://github.com/mit-han-lab/torchsparse.git@v1.4.0
If you are experiencing unstable connections to Hugging Face, we suggest you either (1) set the environment variables TRANSFORMERS_OFFLINE=1 DIFFUSERS_OFFLINE=1 HF_HUB_OFFLINE=1 before your run command once all needed files have been fetched on the first run, so that subsequent runs do not connect to Hugging Face, or (2) download the guidance models you use to a local folder following here and here, and set pretrained_model_name_or_path and pretrained_model_name_or_path_lora in configs/geodream-neus.yaml, configs/geodream-dmtet-geometry.yaml, and configs/geodream-dmtet-texture.yaml to the local path.
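For option (1), a minimal shell sketch (these are the standard Hugging Face offline-mode variables named above):

```shell
# After all needed files have been cached by a first (online) run,
# export these before subsequent runs to skip Hugging Face entirely:
export TRANSFORMERS_OFFLINE=1
export DIFFUSERS_OFFLINE=1
export HF_HUB_OFFLINE=1
```

Then run `python launch.py` as usual in the same shell.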
3D generation consists of the following two steps.
- Predict source views and construct the cost volume (choose one of the two options below)
- GeoDream training

The first option predicts source views with Zero123 conditioned on a reference image:
conda activate geodream_mv
cd ./mv-diffusion
sh run-volume-by-zero123.sh "An astronaut riding a horse" "ref_imges/demo.png"
# Defaulting to use Zero123. If the generated results are not satisfactory, consider using Stable Zero123.
sh run-volume-by-sd-zero123.sh "An astronaut riding a horse" "ref_imges/demo.png"
conda deactivate
cd ..
Alternatively, generate the source views from the text prompt alone:
conda activate geodream_mv
cd ./mv-diffusion
sh step1-run-mv.sh "An astronaut riding a horse"
conda deactivate
. venv/bin/activate
sh step2-run-volume.sh "An astronaut riding a horse"
cd ..
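Judging from the Stage 1 command in the training section, the constructed cost volume is saved under a directory named after the prompt with spaces replaced by underscores. A quick way to compute the path to pass as system.geometry.init_volume_path (assuming bash):

```shell
prompt="An astronaut riding a horse"
# spaces -> underscores, matching the volume output directory layout
volume_path="mv-diffusion/volume/${prompt// /_}/con_volume_lod_150.pth"
echo "$volume_path"
# mv-diffusion/volume/An_astronaut_riding_a_horse/con_volume_lod_150.pth
```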
Rendered images obtained from Stage 1
Rendered images obtained from Stages 2+3
Note: these animations are compressed for faster previewing.
conda deactivate
. venv/bin/activate
# --------- Stage 1 (NeuS) --------- #
# object generation with 512x512 NeuS rendering, ~25GB VRAM
python launch.py --config configs/geodream-neus.yaml --train --gpu 0 system.prompt_processor.prompt="an astronaut riding a horse" system.geometry.init_volume_path="mv-diffusion/volume/An_astronaut_riding_a_horse/con_volume_lod_150.pth"
# if you don't have enough VRAM, try training with 64x64 NeuS rendering, ~15GB VRAM
python launch.py --config configs/geodream-neus.yaml --train --gpu 0 system.prompt_processor.prompt="an astronaut riding a horse" system.geometry.init_volume_path="data/con_volume_lod_150.pth" data.width=64 data.height=64
# using the same model for pretrained and LoRA enables 64x64 training with <10GB VRAM
# but the quality is worse due to the use of an epsilon prediction model for LoRA training
python launch.py --config configs/geodream-neus.yaml --train --gpu 0 system.prompt_processor.prompt="an astronaut riding a horse" system.geometry.init_volume_path="data/con_volume_lod_150.pth" data.width=64 data.height=64 system.guidance.pretrained_model_name_or_path_lora="stabilityai/stable-diffusion-2-1-base"
# --------- Stage 2 (DMTet Geometry Refinement) --------- #
# refine geometry
python launch.py --config configs/geodream-dmtet-geometry.yaml --train system.geometry_convert_from=path/to/stage1/trial/dir/ckpts/last.ckpt --gpu 0 system.prompt_processor.prompt="an astronaut riding a horse" system.renderer.context_type=cuda system.geometry_convert_override.isosurface_threshold=0.0
# --------- Stage 3 (DMTet Texturing) --------- #
# texturing with 1024x1024 rasterization, Stable Diffusion VSD guidance, ~20GB VRAM
python launch.py --config configs/geodream-dmtet-texture.yaml system.geometry.isosurface_resolution=256 --train data.batch_size=2 system.renderer.context_type=cuda --gpu 0 system.geometry_convert_from=path/to/stage2/trial/dir/ckpts/last.ckpt system.prompt_processor.prompt="an astronaut riding a horse"
# if you don't have enough VRAM, try training with batch_size=1, ~10GB VRAM
python launch.py --config configs/geodream-dmtet-texture.yaml system.geometry.isosurface_resolution=256 --train data.batch_size=1 system.renderer.context_type=cuda --gpu 0 system.geometry_convert_from=path/to/stage2/trial/dir/ckpts/last.ckpt system.prompt_processor.prompt="an astronaut riding a horse"
We also provide corresponding scripts for researchers to reference: neus-train.sh corresponds to Stage 1, while mesh-finetuning-geo.sh and mesh-finetuning-texture.sh correspond to Stages 2 and 3.
If you want to resume from a checkpoint, do:
# resume training from the last checkpoint, you may replace last.ckpt with any other checkpoints
python launch.py --config path/to/trial/dir/configs/parsed.yaml --train --gpu 0 resume=path/to/trial/dir/ckpts/last.ckpt
# if the training has completed, you can still continue training for a longer time by setting trainer.max_steps
python launch.py --config path/to/trial/dir/configs/parsed.yaml --train --gpu 0 resume=path/to/trial/dir/ckpts/last.ckpt trainer.max_steps=20000
# you can also perform testing using resumed checkpoints
python launch.py --config path/to/trial/dir/configs/parsed.yaml --test --gpu 0 resume=path/to/trial/dir/ckpts/last.ckpt
# note that the above commands use parsed configuration files from previous trials
# which will continue using the same trial directory
# if you want to save to a new trial directory, replace parsed.yaml with raw.yaml in the command
# only load weights from saved checkpoint but don't resume training (i.e. don't load optimizer state):
python launch.py --config path/to/trial/dir/configs/parsed.yaml --train --gpu 0 system.weights=path/to/trial/dir/ckpts/last.ckpt
Textured meshes obtained from Stages 2+3
To export the scene to textured meshes, use the --export
option. We currently support exporting to obj+mtl, or obj with vertex colors.
# this uses default mesh-exporter configurations which exports obj+mtl
python launch.py --config "path/to/stage3/trial/dir/configs/parsed.yaml" --export --gpu 0 resume="path/to/stage3/trial/dir/ckpts/last.ckpt" system.exporter_type=mesh-exporter system.exporter.context_type=cuda
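To get obj with vertex colors instead of obj+mtl, the upstream ThreeStudio mesh exporter accepts an fmt override; the command below assumes GeoDream inherits ThreeStudio's mesh-exporter options unchanged:

```shell
# export obj with vertex colors (ThreeStudio mesh-exporter option, assumed available)
python launch.py --config "path/to/stage3/trial/dir/configs/parsed.yaml" --export --gpu 0 resume="path/to/stage3/trial/dir/ckpts/last.ckpt" system.exporter_type=mesh-exporter system.exporter.context_type=cuda system.exporter.fmt=obj
```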
We are committed to open-sourcing GeoDream-related materials, including:
- Text-to-3D inference code
- Export rendered videos
- Textured meshes extraction
- Text-to-3D with a single-view reference image
- Evaluation code for the Uni3D Score, which lifts semantic coherence measurement from 2D to 3D
- Checkpoints of our generated results, to help researchers conduct fair comparisons
- Test prompts
GeoDream is built using these awesome open-source projects: Uni3D, ThreeStudio, MVDream, One2345, Zero123, Zero123++.
Thanks to the maintainers of these projects for their contribution to the community!
If you find GeoDream helpful, please consider citing:
@article{Ma2023GeoDream,
  title = {GeoDream: Disentangling 2D and Geometric Priors for High-Fidelity and Consistent 3D Generation},
  author = {Baorui Ma and Haoge Deng and Junsheng Zhou and Yu-Shen Liu and Tiejun Huang and Xinlong Wang},
  journal = {arXiv preprint arXiv:2311.17971},
  year = {2023}
}