SysCV / idisc

iDisc: Internal Discretization for Monocular Depth Estimation [CVPR 2023]

Home Page: https://arxiv.org/abs/2304.06334

Saved depth seems wrong

cnut1648 opened this issue · comments

Hello, thanks for the great work!

I am running your model on my custom dataset. However, the depth saved from the NYUv2 model seems wrong. I think this might be due to my misuse of your model's output. I have a script like this:

import os
import shutil
import torch
import numpy as np
import cv2
from tqdm import tqdm
from pathlib import Path
import sys, json
from PIL import Image
import torchvision.transforms.functional as TF
# I clone your repo and put to the place where I can directly import
sys.path.insert(0, str(Path(__file__).parent.resolve() / "idisc"))
from idisc.models.idisc import IDisc
from idisc.utils import (DICT_METRICS_DEPTH, DICT_METRICS_NORMALS,
                         RunningMetric, validate)
model = IDisc.build(json.load(open('idisc/configs/nyu/nyu_swinl.json')))
model.load_pretrained("idisc/nyu_swinlarge.pt")
model = model.to("cuda")
model.eval()


# read in image
image = np.asarray(Image.open(image_path))
image = TF.normalize(TF.to_tensor(image), **{"mean": [0.5, 0.5, 0.5], "std": [0.5, 0.5, 0.5]})
image = image.unsqueeze(0).to("cuda")

with torch.inference_mode():
    depth, *_ = model(image)

TF.to_pil_image(depth[0].cpu()).save(save_path)

I am using the Swin-Large model. image_path points to this 224x224 image (I uploaded the exact image in case you need it for debugging):
[image: 00001]
DPT can generate a depth map like this:
[image]
However, the output of iDisc Swin-Large is this:
[image]
I believe I made a mistake somewhere. Could you help me debug this?

Thanks!

Thank you for using our model.
I believe the effect comes from the saving step. The output of our model is metric depth, so it has floating-point values in [0.0, +inf), which can result in PIL saving the image in the wrong format.
I suggest you first convert the scalar float values to RGB with a colormap and then save the result. You can look into idisc/utils/visualization.py, specifically the colorize function. It accepts 2D inputs (e.g., an (H, W)-shaped numpy array), min and max values (for NYU these are 0.01 and 10.0 meters), and the colormap name. For instance, "magma" is a good choice since it is perceptually uniform and does not introduce spurious contrasts.
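For readers who want to see the idea without opening the repository, here is a minimal sketch of the colormap step. It does not call the repo's colorize function (its exact signature is not shown in this thread); instead it uses matplotlib directly. The function name colorize_depth and its defaults are assumptions for illustration, with the 0.01 and 10.0 meter bounds taken from the reply above.

```python
import numpy as np
from matplotlib import pyplot as plt


def colorize_depth(depth, vmin=0.01, vmax=10.0, cmap="magma"):
    """Map a 2D metric depth array to an (H, W, 3) uint8 RGB image.

    Hypothetical helper, not the repo's `colorize`: the name and defaults
    are assumptions; 0.01 and 10.0 m are the NYU bounds quoted above.
    """
    depth = np.asarray(depth, dtype=np.float32)
    if depth.ndim != 2:
        raise ValueError(f"expected a 2D (H, W) array, got shape {depth.shape}")
    # Clip to the valid range, then rescale to [0, 1] for the colormap.
    normalized = (np.clip(depth, vmin, vmax) - vmin) / (vmax - vmin)
    # A colormap called on floats in [0, 1] returns RGBA floats in [0, 1].
    rgba = plt.get_cmap(cmap)(normalized)
    # Drop alpha and convert to 8-bit so any image library can save it.
    return (rgba[..., :3] * 255).astype(np.uint8)
```

In the script from the question this could be used roughly as `Image.fromarray(colorize_depth(depth[0, 0].cpu().numpy())).save(save_path)`, assuming the model output has shape (1, 1, H, W).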

One small nitpick: the model was trained with ImageNet normalization statistics, so it would be better to normalize the RGB image with those instead of the values used in your script, i.e., {"mean": [0.5, 0.5, 0.5], "std": [0.5, 0.5, 0.5]}
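As a sketch of what that normalization looks like, the snippet below applies the commonly used ImageNet statistics (mean [0.485, 0.456, 0.406], std [0.229, 0.224, 0.225]) in plain NumPy. The helper name normalize_imagenet is an assumption for illustration. In the original script the equivalent change would presumably be `TF.normalize(TF.to_tensor(image), mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])`.

```python
import numpy as np

# Standard ImageNet per-channel statistics for RGB inputs scaled to [0, 1].
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def normalize_imagenet(image_uint8):
    """Convert an (H, W, 3) uint8 RGB image to a normalized (3, H, W) float32 array.

    Hypothetical helper mirroring TF.to_tensor followed by TF.normalize.
    """
    image = np.asarray(image_uint8, dtype=np.float32) / 255.0
    if image.ndim != 3 or image.shape[-1] != 3:
        raise ValueError(f"expected an (H, W, 3) RGB image, got shape {image.shape}")
    # Broadcasting applies the per-channel mean and std over the last axis.
    image = (image - IMAGENET_MEAN) / IMAGENET_STD
    # Move channels first, matching the layout PyTorch models expect.
    return np.transpose(image, (2, 0, 1))
```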

Hi @lpiccinelli-eth, this solves it!
Thank you so much for your response and detailed explanation!