Marigold: Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation

Home Page: https://marigoldmonodepth.github.io

This repository represents the official implementation of the paper titled "Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation".

Bingxin Ke, Anton Obukhov, Shengyu Huang, Nando Metzger, Rodrigo Caye Daudt, Konrad Schindler

We present Marigold, a diffusion model and associated fine-tuning protocol for monocular depth estimation. Its core principle is to leverage the rich visual knowledge stored in modern generative image models. Our model, derived from Stable Diffusion and fine-tuned with synthetic data, can zero-shot transfer to unseen data, offering state-of-the-art monocular depth estimation results.

📢 News

2023-12-08: Added a Hugging Face Space demo - try it out with your images for free!
2023-12-05: Added a Google Colab demo - dive deeper into our inference pipeline!
2023-12-04: Added paper and inference code (this repository).

Usage

We offer a number of ways to interact with Marigold:

  1. A free online interactive demo is available as a Hugging Face Space (kudos to the HF team for the GPU grant).

  2. Run the demo locally (requires a GPU and nvidia-docker2; see the Installation Guide): docker run -it -p 7860:7860 --platform=linux/amd64 --gpus all registry.hf.space/toshas-marigold:latest python app.py

  3. An extended demo is available on Google Colab.

  4. If you just want to see the examples, visit our gallery.

  5. Finally, local development instructions are given below.

🛠️ Setup

This code has been tested on:

  • Python 3.10.12, PyTorch 2.0.1, CUDA 11.7, GeForce RTX 3090
  • Python 3.10.4, PyTorch 2.0.1, CUDA 11.7, GeForce RTX 4090
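
To compare a local machine against the tested configurations above, a small check like the following may help. It is only a sketch: the helper name `describe_environment` is hypothetical and not part of this repository, and the PyTorch fields stay `None` when PyTorch is not installed.

```python
import platform


def describe_environment():
    """Collect Python, PyTorch, CUDA and GPU details (hypothetical helper)."""
    info = {
        "python": platform.python_version(),
        "torch": None,
        "cuda": None,
        "gpu": None,
    }
    try:
        import torch  # optional: only reported if PyTorch is installed
    except ImportError:
        return info
    info["torch"] = torch.__version__
    info["cuda"] = torch.version.cuda  # None for CPU-only builds
    if torch.cuda.is_available():
        info["gpu"] = torch.cuda.get_device_name(0)
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Versions that differ from the tested ones may still work; the list only records what the code has been run on.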

📦 Repository

git clone https://github.com/prs-eth/Marigold.git
cd Marigold

💻 Dependencies

python -m venv venv/marigold
source venv/marigold/bin/activate
pip install -r requirements.txt

🚀 Inference on in-the-wild images

📷 Sample images

bash script/download_sample_data.sh

🎮 Inference

The inference script below automatically downloads the checkpoint.

python run.py \
    --input_rgb_dir data/in-the-wild_example \
    --output_dir output/in-the-wild_example
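
When several image folders need processing, the same command can be assembled from Python. The sketch below uses only the two flags shown above; the helper names `build_inference_command` and `run_on_folders` are hypothetical, and the script is assumed to be run from the repository root.

```python
import subprocess
import sys


def build_inference_command(input_dir, output_dir, script="run.py"):
    """Build the argument list for one inference run (hypothetical helper)."""
    return [
        sys.executable, script,
        "--input_rgb_dir", str(input_dir),
        "--output_dir", str(output_dir),
    ]


def run_on_folders(pairs, dry_run=True):
    """Build one command per (input_dir, output_dir) pair.

    With dry_run=True the commands are only returned, not executed.
    """
    commands = [build_inference_command(src, dst) for src, dst in pairs]
    if not dry_run:
        for command in commands:
            subprocess.run(command, check=True)  # raises if run.py fails
    return commands


if __name__ == "__main__":
    pairs = [("data/in-the-wild_example", "output/in-the-wild_example")]
    for command in run_on_folders(pairs):
        print(" ".join(command))
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.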

⚙️ Inference settings

  • By default, the inference script resizes the input images and resizes the predictions back to the original resolution.

    • --resize_to_max_res: The maximum edge length of the resized input image. Default: 768.
    • --not_resize_input: If given, the input image is not resized.
    • --not_resize_output: If given, the output is not resized back to the original resolution. Only valid without the --not_resize_input option.
  • Trade-offs between accuracy and speed (for both options, larger values give more accurate results at the cost of slower inference):

    • --n_infer: Number of inference passes to be ensembled. Default: 10.
    • --denoise_steps: Number of diffusion denoising steps of each inference pass. Default: 10.
  • --seed: Random seed, can be set to ensure reproducibility. Default: None (using current time as random seed).

  • --depth_cmap: Colormap used to colorize the depth prediction. Default: Spectral.

  • The model cache directory can be controlled by the environment variable HF_HOME, for example:

    export HF_HOME=$(pwd)/checkpoint
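
Two of the settings above can be illustrated with a rough sketch. It is not the repository's code: `resized_shape` assumes that --resize_to_max_res scales the image so its longer edge matches the limit while keeping the aspect ratio, and `ensemble_median` stands in for the ensembling behind --n_infer with a plain pixel-wise median. The actual ensembling may align the individual predictions before combining them. Both function names are hypothetical.

```python
from statistics import median


def resized_shape(width, height, max_res=768):
    """Scale (width, height) so the longer edge equals max_res.

    Assumption: the aspect ratio is preserved.
    """
    scale = max_res / max(width, height)
    return max(1, round(width * scale)), max(1, round(height * scale))


def ensemble_median(predictions):
    """Pixel-wise median over depth maps given as equally shaped nested lists.

    Simplified stand-in for combining several inference passes.
    """
    rows = len(predictions[0])
    cols = len(predictions[0][0])
    return [
        [median(pred[r][c] for pred in predictions) for c in range(cols)]
        for r in range(rows)
    ]


if __name__ == "__main__":
    print(resized_shape(1536, 1024))
    passes = [[[1.0, 2.0]], [[3.0, 2.0]], [[2.0, 8.0]]]
    print(ensemble_median(passes))
```

The median illustrates why more passes tend to help: a single outlier prediction at a pixel has little effect on the combined value.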

⬇ Using local checkpoint

# Download the checkpoint
bash script/download_weights.sh

# Run inference with the local checkpoint
python run.py \
    --checkpoint checkpoint/Marigold_v1_merged \
    --input_rgb_dir data/in-the-wild_example \
    --output_dir output/in-the-wild_example
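
A script that should work both with and without a downloaded checkpoint could pass --checkpoint only when the directory is present. The following is a minimal sketch under that assumption; `checkpoint_args` is a hypothetical helper, and the default path is the one used in the command above.

```python
from pathlib import Path


def checkpoint_args(checkpoint_dir="checkpoint/Marigold_v1_merged"):
    """Return the --checkpoint arguments if the local directory exists.

    An empty list leaves run.py to download the checkpoint itself.
    """
    path = Path(checkpoint_dir)
    if path.is_dir():
        return ["--checkpoint", str(path)]
    return []


if __name__ == "__main__":
    command = ["python", "run.py"] + checkpoint_args() + [
        "--input_rgb_dir", "data/in-the-wild_example",
        "--output_dir", "output/in-the-wild_example",
    ]
    print(" ".join(command))
```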

🎓 Citation

@misc{ke2023repurposing,
      title={Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation}, 
      author={Bingxin Ke and Anton Obukhov and Shengyu Huang and Nando Metzger and Rodrigo Caye Daudt and Konrad Schindler},
      year={2023},
      eprint={2312.02145},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
