SamsungLabs / imvoxelnet

[WACV2022] ImVoxelNet: Image to Voxels Projection for Monocular and Multi-View General-Purpose 3D Object Detection

Train & eval inputs for different benchmarks

DianCh opened this issue · comments

Hi! Thank you for releasing this wonderful work! I am wondering what the inputs look like for different benchmarks, i.e. how many images are used to predict the bounding boxes during training and evaluation? Is it a stereo pair for KITTI and multi-view for SUN RGB-D/ScanNet (and if so, how are the multi-view inputs selected)?

Hi @DianCh ,
We use a single image for KITTI and SUN RGB-D, 6 images for NuScenes and 50 images for ScanNet.

Thank you @filaPro for the reply! Just trying to understand the dataset protocol:

For SUN RGB-D, is it that only visible 3D gt boxes are used for supervision?
For ScanNet, is it that 20/50 images are randomly sampled per scene, and all 3D gt boxes in that scene are used for supervision/evaluation?

For SUN RGB-D all boxes are visible, as a scene is represented by a single RGB-D image. For ScanNet you are right: all boxes are used for supervision and evaluation.
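The ScanNet protocol described above (randomly sampling a fixed number of views per scene) could be sketched as follows. This is a minimal illustration, not the repo's actual data-loading code; the function name, the `n_views=50` default, and the with-replacement fallback for short scenes are assumptions.

```python
import random

def sample_views(scene_images, n_views=50, seed=None):
    """Randomly sample n_views images from a scene's frame list.

    Hypothetical helper: if the scene has fewer frames than n_views,
    sample with replacement so the model always sees a fixed-size input.
    """
    rng = random.Random(seed)
    if len(scene_images) >= n_views:
        return rng.sample(scene_images, n_views)
    return [rng.choice(scene_images) for _ in range(n_views)]

# Example: a scene with 120 frames, pick 50 for one training sample.
frames = [f"frame_{i:04d}.jpg" for i in range(120)]
views = sample_views(frames, n_views=50, seed=0)
print(len(views))  # 50
```

All 3D ground-truth boxes of the scene would then be used as supervision for whichever subset of views was drawn.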