hustvl / HAIS

Hierarchical Aggregation for 3D Instance Segmentation (ICCV 2021)


[Discussion on consistency] Custom data with Realsense L515 + ORB-SLAM3 + Open3D reconstruction

glennliu opened this issue

Hi Shaoyun,

Thanks for your work. I'm interested in how HAIS can help a SLAM system extract persistent semantic landmarks.

I have tested HAIS on my own dataset:

  • Hardware: Intel Realsense L515 RGB-D camera
  • Localization: ORB-SLAM3 with RGB-D input only
  • Reconstruction: Open3D RGB-D integration, which is TSDF-based and extracts a point cloud after the entire scan is finished (see the sketch below).
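
For reference, the integration step looks roughly like this (a minimal sketch against the Open3D >= 0.10 API; the intrinsics, depth scale, and the `frames` list are placeholders for my setup):

```python
import numpy as np
import open3d as o3d

# Placeholder intrinsics; substitute the calibrated values for your L515.
intrinsic = o3d.camera.PinholeCameraIntrinsic(
    width=640, height=480, fx=600.0, fy=600.0, cx=320.0, cy=240.0)

volume = o3d.pipelines.integration.ScalableTSDFVolume(
    voxel_length=4.0 / 512.0,  # ~8 mm voxels
    sdf_trunc=0.04,
    color_type=o3d.pipelines.integration.TSDFVolumeColorType.RGB8)

# Placeholder sequence: (color_path, depth_path, camera-to-world pose)
# triplets, with poses taken from the ORB-SLAM3 trajectory.
frames = []

for color_path, depth_path, pose in frames:
    color = o3d.io.read_image(color_path)
    depth = o3d.io.read_image(depth_path)
    rgbd = o3d.geometry.RGBDImage.create_from_color_and_depth(
        color, depth,
        depth_scale=4000.0,  # the L515 reports depth in 0.25 mm units by default
        depth_trunc=3.0, convert_rgb_to_intensity=False)
    # integrate() expects the world-to-camera extrinsic, i.e. the inverse pose.
    volume.integrate(rgbd, intrinsic, np.linalg.inv(pose))

# The point cloud is only extracted after the whole scan has been integrated.
pcd = volume.extract_point_cloud()
o3d.io.write_point_cloud("scene.ply", pcd)
```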

Here are some results (colored by raw RGB and by semantic segmentation):

a. Living room
[images: living_mesh, living]

b. Dining room
[images: dinning_mesh, dinning]

c. Study room
[images: studyroom_mesh, studyroom]

The qualitative results show quite a lot of over-segmentation. In the ScanNet test results I can also see some over-segmentation, but it is far less frequent than on my custom dataset. Moreover, over-segmentation on ScanNet normally occurs in poorly reconstructed sub-volumes, whereas on the custom data it occurs even in well-reconstructed areas, such as the Living Room scan.

So, what is actually affecting the segmentation performance in the self-collected dataset?

  • The earlier issue #19 also discusses this, and some suggest modifying class_numpoint_mean_dict. But I'm running on indoor scenes similar to ScanNet, so the mean point count and mean radius per class should differ only slightly; that dictionary should not explain the inconsistency across datasets in my case (see the sketch after this list).
  • How robust is HAIS to domain shift? What kinds of issues can affect consistency across different datasets?
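
For context, my rough understanding of that dictionary's role, as an illustrative sketch (not the repo's actual code; the class values and the radius schedule here are made up):

```python
import numpy as np

# Hypothetical per-class mean point counts; the real dict ships with HAIS.
class_numpoint_mean_dict = {"chair": 1200.0, "table": 3500.0}

def absorption_radius(primary_class, base_radius=0.5):
    """Made-up schedule: classes whose instances are expected to be larger
    absorb fragments from farther away (cube root keeps the growth gentle)."""
    expected = class_numpoint_mean_dict[primary_class]
    return base_radius * (expected / 1000.0) ** (1.0 / 3.0)

def should_absorb(primary_class, primary_center, fragment_center):
    """Absorb a small fragment into a primary instance when it falls
    inside the class-dependent radius around the primary's center."""
    dist = np.linalg.norm(np.asarray(fragment_center) - np.asarray(primary_center))
    return dist < absorption_radius(primary_class)

# Example: a fragment 0.4 m from a predicted chair center gets absorbed.
print(should_absorb("chair", (0.0, 0.0, 0.0), (0.4, 0.0, 0.0)))
```

If the per-class statistics of my scenes really do match ScanNet's, these radii barely change, which is why I doubt this dictionary explains the gap.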

Thanks
Chuhao

I also ran it on several SceneNN scenes.
[images: bedroomnn, bedroomnn_semantic]
The scene is also reconstructed with Open3D integration, but using the ground-truth poses.

I can also see some over-segmentation here, such as the broken chair and table.
I understand this is a long-standing challenge in scene segmentation and cannot be perfectly solved, but I want to discuss what affects it and how to improve it across different datasets.

Thanks

Thanks for the interesting question. Over-segmentation and under-segmentation relate to instance segmentation, not semantic segmentation. Your visualization shows that the class labels of some points are wrongly predicted, so the problem lies in the first stage. HAIS is trained on ScanNetV2, whose data amount is far from enough; I think the data amount accounts for the wrong predictions, and adopting more data may alleviate the problem. There do also exist gaps between datasets, e.g., point cloud density, so some parameters may need to be tuned for better results on other datasets; one option is to match ScanNet's point density before inference, as in the sketch below.
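
A minimal sketch, assuming Open3D is available; the 2 cm voxel size is only a starting guess to tune per dataset:

```python
import open3d as o3d

# Downsample a dense custom scan so its point density is closer to the
# ScanNet scans HAIS was trained on, then run inference on the result.
pcd = o3d.io.read_point_cloud("scene.ply")          # hypothetical input path
pcd_down = pcd.voxel_down_sample(voxel_size=0.02)   # ~2 cm voxels, tune this
o3d.io.write_point_cloud("scene_down.ply", pcd_down)
```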
Hope these help you.

@outsidercsy Thanks for your time.
I agree that data amount is one of the reasons. However, my datasets are already close to ScanNet.
The scenes I chose are very similar to scenes in ScanNet (indoor living rooms, etc.), and ScanNet is reconstructed with TSDF mapping, the same as what I and SceneNN used. The RGB-D cameras differ, but that should not make too much of a difference. Besides, HAIS is trained on 1,200 ScanNet scans, which is not a small amount. So I was expecting performance very similar to that on ScanNet.
Overall, I think the data amount does matter, but there should be a more general way to improve the consistency of the segmentation network.

I mean that even ScanNet's data amount is not enough for convergence, and I also observe many cases with poor segmentation results on ScanNet itself. What about pretraining on ScanNet and then fine-tuning on your dataset? Something along the lines of the sketch below could work.
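
A generic PyTorch sketch of the idea (not our actual train script; the checkpoint path, state-dict key, and module names are all assumptions):

```python
import torch
import torch.nn as nn

model = nn.Linear(8, 8)  # placeholder; substitute the instantiated HAIS network

# Load the ScanNet-pretrained weights (hypothetical path and key name).
checkpoint = torch.load("hais_scannet_pretrained.pth", map_location="cpu")
state = checkpoint.get("state_dict", checkpoint)
model.load_state_dict(state, strict=False)

# Optionally freeze the (assumed) backbone so only later layers adapt.
for name, param in model.named_parameters():
    if name.startswith("backbone"):
        param.requires_grad = False

# Fine-tune with a learning rate well below the from-scratch value.
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4)
```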

Some sub-volumes in ScanNet are also reconstructed poorly, so it is reasonable that those cannot be well segmented. I'd like to try fine-tuning on my dataset. Do you have any suggestions on which parameters should be considered during fine-tuning?

Here are more results (raw RGB, plus instance segmentation with semantic labels):

Living room
[images: livingb_mesh, livingb]

Washing room
[images: washing_mesh, washing]

Bedroom
[images: bedroom_mesh, bedroom]

Beyond over-segmentation, there are also a few incorrectly predicted semantic labels.

Thanks for the visualization. As for data amount, ScanNet200 has been released with many more annotated categories and instances. More training data helps with the bad cases: https://rozdavid.github.io/scannet200