Code for robust monocular depth estimation described in "Ranftl et al., Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer, TPAMI 2020"

Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer

This repository contains code to compute depth from a single image. It accompanies our paper:

Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer
René Ranftl, Katrin Lasinger, David Hafner, Konrad Schindler, Vladlen Koltun

MiDaS v2.1 was trained on 10 datasets (ReDWeb, DIML, Movies, MegaDepth, WSVD, TartanAir, HRWSI, ApolloScape, BlendedMVS, IRS) with multi-objective optimization. The original model that was trained on 5 datasets (MIX 5 in the paper) can be found here.

Changelog

  • [Nov 2020] Released MiDaS v2.1, trained on 10 datasets with multi-objective optimization
  • [Jul 2020] Added TensorFlow and ONNX code. Added online demo.
  • [Dec 2019] Released a new version of MiDaS; the new model is significantly more accurate and robust
  • [Jul 2019] Initial release of MiDaS (Link)

Online demo

An online demo of the model is available: http://35.202.76.57/

Please be patient. Inference might take up to 30 seconds due to hardware restrictions.

Setup

  1. Download the model weights model-f6b98070.pt and model-small-70d6b9c8.pt and place them in the root folder.

  2. Set up dependencies:

    conda install pytorch torchvision opencv

    The code was tested with Python 3.7, PyTorch 1.7.0, and OpenCV 4.4.0.
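
To verify the environment before running the model, a minimal check (a sketch, not part of the repository) is:

    # Print the installed PyTorch and OpenCV versions and whether a CUDA
    # device is visible to PyTorch.
    import cv2
    import torch

    print("PyTorch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
    print("OpenCV:", cv2.__version__)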

Usage

  1. Place one or more input images in the folder input.

  2. Run the model:

    python run.py

    Or run the small model:

    python run.py --model_weights model-small-70d6b9c8.pt --model_type small
  3. The resulting inverse depth maps are written to the output folder.
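
The written maps are inverse depth, i.e. larger values correspond to closer surfaces. A minimal sketch for turning one of them into a color visualization with OpenCV (output/example.png is a placeholder for a file actually produced by run.py):

    # Normalize an inverse depth map to [0, 255] and save a color-mapped
    # visualization. "output/example.png" is a placeholder file name.
    import cv2
    import numpy as np

    depth = cv2.imread("output/example.png", cv2.IMREAD_UNCHANGED).astype(np.float32)
    depth = 255.0 * (depth - depth.min()) / (depth.max() - depth.min() + 1e-8)
    cv2.imwrite("output/example_colored.png",
                cv2.applyColorMap(depth.astype(np.uint8), cv2.COLORMAP_INFERNO))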

via Docker

  1. Make sure you have installed Docker and the NVIDIA Docker runtime.

  2. Build the Docker image:

    docker build -t midas .
  3. Run inference:

    docker run --rm --gpus all -v $PWD/input:/opt/MiDaS/input -v $PWD/output:/opt/MiDaS/output midas

    This command passes all of your NVIDIA GPUs through to the container, mounts the input and output directories, and runs inference.

via PyTorch Hub

The pretrained model is also available on PyTorch Hub (see the sketch below).
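
A minimal loading sketch via torch.hub; the entry-point names ("intel-isl/MiDaS", "MiDaS", "transforms", default_transform) follow the upstream hubconf and may change between releases, and input/example.jpg is a placeholder path:

    # Load MiDaS through torch.hub and predict inverse depth for one image.
    import cv2
    import torch

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    midas = torch.hub.load("intel-isl/MiDaS", "MiDaS")    # or "MiDaS_small"
    midas.to(device).eval()

    transforms = torch.hub.load("intel-isl/MiDaS", "transforms")
    transform = transforms.default_transform               # resize + normalize

    img = cv2.cvtColor(cv2.imread("input/example.jpg"), cv2.COLOR_BGR2RGB)
    batch = transform(img).to(device)

    with torch.no_grad():
        prediction = midas(batch)
        # Upsample the prediction back to the input resolution.
        prediction = torch.nn.functional.interpolate(
            prediction.unsqueeze(1), size=img.shape[:2],
            mode="bicubic", align_corners=False,
        ).squeeze()

    inverse_depth = prediction.cpu().numpy()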

via TensorFlow or ONNX

See README in the tf subdirectory.

via Mobile (iOS / Android)

See README in the mobile subdirectory.

via ROS1 (Robot Operating System)

See README in the ros subdirectory.

Accuracy

Zero-shot error (lower is better) and speed (FPS):

| Model | DIW, WHDR | ETH3D, AbsRel | Sintel, AbsRel | KITTI, δ>1.25 | NyuDepthV2, δ>1.25 | TUM, δ>1.25 | Speed, FPS |
|---|---|---|---|---|---|---|---|
| Small models: |  |  |  |  |  |  | iPhone 11 |
| MiDaS v2 small | 0.1248 | 0.1550 | 0.3300 | 21.81 | 15.73 | 17.00 | 0.6 |
| MiDaS v2.1 small (URL) | 0.1344 | 0.1344 | 0.3370 | 29.27 | 13.43 | 14.53 | 30 |
| Relative improvement | -7.7% | +13.3% | -2.1% | -34.2% | +14.6% | +14.5% | 50x |
| Big models: |  |  |  |  |  |  | GPU RTX 2080Ti |
| MiDaS v2 large (URL) | 0.1246 | 0.1290 | 0.3270 | 23.90 | 9.55 | 14.29 | 59 |
| MiDaS v2.1 large (URL) | 0.1295 | 0.1155 | 0.3285 | 16.08 | 8.71 | 12.51 | 59 |
| Relative improvement | -3.9% | +10.5% | -0.52% | +32.7% | +8.8% | +12.5% | 1x |
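
Two of the table's metrics as a sketch, assuming predictions have already been aligned to the ground truth in scale and shift (MiDaS outputs relative inverse depth) and converted to depth, with pred_depth and gt_depth as NumPy arrays over valid pixels:

    import numpy as np

    def abs_rel(pred_depth, gt_depth):
        # Mean absolute relative error: mean(|pred - gt| / gt); lower is better.
        return np.mean(np.abs(pred_depth - gt_depth) / gt_depth)

    def delta_error(pred_depth, gt_depth, threshold=1.25):
        # Percentage of pixels whose depth ratio max(pred/gt, gt/pred) exceeds
        # the threshold; the table reports this as an error, so lower is better.
        ratio = np.maximum(pred_depth / gt_depth, gt_depth / pred_depth)
        return 100.0 * np.mean(ratio > threshold)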

Citation

Please cite our paper if you use this code or any of the models:

@article{Ranftl2020,
	author    = {Ren\'{e} Ranftl and Katrin Lasinger and David Hafner and Konrad Schindler and Vladlen Koltun},
	title     = {Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer},
	journal   = {IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)},
	year      = {2020},
}

License

MIT License
