Saved depth seems wrong

Question

cnut1648 opened this issue a year ago

Hello, thanks for the great work!

I am running your model on my custom dataset. However, the depth saved from the NYUv2 model looks wrong. I think this might be due to my misuse of your model's output. I have a script like this:

import os
import shutil
import torch
import numpy as np
import cv2
from tqdm import tqdm
from pathlib import Path
import sys, json
from PIL import Image
import torchvision.transforms.functional as TF
# I clone your repo and put to the place where I can directly import
sys.path.insert(0, str(Path(__file__).parent.resolve() / "idisc"))
from idisc.models.idisc import IDisc
from idisc.utils import (DICT_METRICS_DEPTH, DICT_METRICS_NORMALS,
                         RunningMetric, validate)
model = IDisc.build(json.load(open('idisc/configs/nyu/nyu_swinl.json')))
model.load_pretrained("idisc/nyu_swinlarge.pt")
model = model.to("cuda")
model.eval()


# read in image
image = np.asarray(Image.open(image_path))
image = TF.normalize(TF.to_tensor(image), **{"mean": [0.5, 0.5, 0.5], "std": [0.5, 0.5, 0.5]})
image = image.unsqueeze(0).to("cuda")

with torch.inference_mode():
    depth, *_ = model(image)

TF.to_pil_image(depth[0].cpu()).save(save_path)

I am using the Swin-Large model. image_path points to this image

of size 224x224 (I uploaded the exact image in case you need it for debugging). DPT generates a depth map like this

whereas the output of idisc swin-large is this

I believe I made some mistakes somewhere. I wonder if you can help me debug this.

Thanks!

Luigi Piccinelli · Answer 1 · Tue Aug 08 2023 19:27:54 GMT+0800 (China Standard Time)

Thank you for using our model.
I believe the effect comes from the saving step. The output of our model is metric depth, so it has floating-point values in [0.0, +inf), which can result in PIL saving the image in the wrong format.
I suggest you first convert the scalar float values to RGB with a colormap and then save the result. You can look into idisc/utils/visualization.py, more specifically the colorize function. It accepts 2D inputs (e.g., an (H, W)-shaped numpy array), min and max values (for NYU, 0.01 and 10.0 meters), and the colormap name. For instance, "magma" is a good choice since it is perceptually uniform, with monotonically increasing lightness, and does not introduce spurious contrasts.
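
For readers who want to see the idea without opening the repository, here is a minimal sketch of colormap-based saving. It is not the repo's colorize function: colorize_depth, its parameter names, and the output file name are hypothetical, and it assumes numpy, matplotlib (3.5 or newer) and Pillow are installed. The 0.01 and 10.0 meter bounds are the NYU values mentioned above.

```python
import numpy as np
import matplotlib
from PIL import Image


def colorize_depth(depth, vmin=0.01, vmax=10.0, cmap="magma"):
    """Map an (H, W) metric depth array to an (H, W, 3) uint8 RGB image.

    Hypothetical helper for illustration; not the idisc implementation.
    """
    depth = np.asarray(depth, dtype=np.float32)
    # Clip to the valid range, then rescale to [0, 1] for the colormap.
    normalized = (np.clip(depth, vmin, vmax) - vmin) / (vmax - vmin)
    # A colormap called on floats in [0, 1] returns RGBA floats in [0, 1].
    rgba = matplotlib.colormaps[cmap](normalized)
    # Drop the alpha channel and convert to 8-bit RGB.
    return (rgba[..., :3] * 255).astype(np.uint8)


# Stand-in for a model prediction: a smooth ramp from 0 m to 12 m.
fake_depth = np.linspace(0.0, 12.0, 224 * 224, dtype=np.float32).reshape(224, 224)
rgb = colorize_depth(fake_depth)
Image.fromarray(rgb).save("depth_magma.png")
```

With the script from the question, something like depth.squeeze().cpu().numpy() should give the (H, W) array to pass in, assuming the model returns a single-image, single-channel prediction.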

One little nitpick: the model was trained with ImageNet normalization statistics, hence it would be better to normalize the RGB image with those, instead of the default ones, i.e., {"mean": [0.5, 0.5, 0.5], "std": [0.5, 0.5, 0.5]}
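
As a rough illustration of that change, the sketch below applies the commonly used ImageNet channel statistics (mean 0.485, 0.456, 0.406 and std 0.229, 0.224, 0.225) in plain numpy. normalize_imagenet is a hypothetical helper, and the statistics are the standard published values rather than something read from the idisc configs, so it is worth checking them against the repo.

```python
import numpy as np

# Standard ImageNet per-channel statistics for RGB images scaled to [0, 1].
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def normalize_imagenet(image_uint8):
    """Convert an (H, W, 3) uint8 RGB image to a (3, H, W) float32 array.

    Hypothetical helper for illustration only.
    """
    scaled = image_uint8.astype(np.float32) / 255.0
    normalized = (scaled - IMAGENET_MEAN) / IMAGENET_STD
    # Channels-first layout, as expected by PyTorch models.
    return np.transpose(normalized, (2, 0, 1))


# Stand-in for a loaded image: uniform mid-gray.
dummy = np.full((224, 224, 3), 128, dtype=np.uint8)
chw = normalize_imagenet(dummy)
```

In the original script, the equivalent would be passing these mean and std lists to TF.normalize in place of the 0.5 values.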

Jiashu Xu · Answer 2 · Wed Aug 09 2023 02:05:40 GMT+0800 (China Standard Time)

Hi @lpiccinelli-eth, this solves it!
Thank you so much for your response and detailed explanation!