dazinovic / neural-rgbd-surface-reconstruction

Official implementation of the CVPR 2022 Paper "Neural RGB-D Surface Reconstruction"

Home Page: https://dazinovic.github.io/neural-rgbd-surface-reconstruction/

How about the results on real-world datasets?

endlesswho opened this issue

@dazinovic How are the results on real-world datasets? I collected a dataset with my own RGB-D camera and estimated the poses with COLMAP, but the results are a mess. Any advice?

commented

I think you'll have to be a bit more specific about the problem. Do you have a problem with COLMAP or does my method give you bad results when you use it with the COLMAP poses? If it's the latter, make sure the cameras are in the correct coordinate system and that you are correctly loading the depth maps. You will have to either save the data in the described format or write your own dataset loader. Since you have depth, you can also try aligning the images with BundleFusion.
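
For illustration, here is a minimal sketch of what such a custom loader could look like. The directory layout, file names, and pose file format below are assumptions for this example only (not the repository's actual loader); the format the code expects is described in its README.

```python
import os
import glob
import numpy as np
import imageio

def load_custom_rgbd(basedir):
    """Hypothetical loader: RGB PNGs, 16-bit depth PNGs in millimeters,
    and one 4x4 camera-to-world matrix (4 rows of 4 floats) per frame in poses.txt."""
    rgb_files = sorted(glob.glob(os.path.join(basedir, 'images', '*.png')))
    depth_files = sorted(glob.glob(os.path.join(basedir, 'depth', '*.png')))

    images = np.stack([imageio.imread(f) for f in rgb_files]).astype(np.float32) / 255.0
    # Depth must end up in metric units (meters here), consistent with the poses.
    depths = np.stack([imageio.imread(f) for f in depth_files]).astype(np.float32) / 1000.0

    # Assumed pose file: camera-to-world matrices in the OpenGL convention.
    poses = np.loadtxt(os.path.join(basedir, 'poses.txt')).reshape(-1, 4, 4)

    assert len(images) == len(depths) == len(poses)
    return images, depths, poses
```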

> […] the depth maps. You will have to either save the data in the described format or write your own dataset loader. Since you have depth, you can also try aligning the images […]

The poses are estimated by COLMAP. The depth images are normalized to 0–1 for training. I wonder whether I need to reconstruct my scene with an RGB-D reconstruction method and get the sc_factor and translation correct for the network?

commented

The depth images need to be in metric space.
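
For example, if your sensor writes 16-bit depth PNGs in millimeters (common for Kinect/RealSense cameras), a conversion along these lines puts them into meters; the shift factor depends on your camera, so this is only a sketch:

```python
import numpy as np
import imageio

# Assumption: the sensor stores 16-bit PNGs in millimeters.
depth_mm = imageio.imread('depth/000000.png').astype(np.float32)
depth_m = depth_mm / 1000.0          # metric depth in meters
depth_m[depth_mm == 0] = 0.0         # keep invalid pixels at 0
```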

@endlesswho Is the problem solved? Could you please post your result here?

> @endlesswho Is the problem solved? Could you please post your result here?

Sadly, the problem still remains. The depth images are in metric space, but the poses output by COLMAP are at an arbitrary scale. I think an RGB-D reconstruction method would work!

commented

You can use some flavor of KinectFusion to obtain camera poses. If you want to use the COLMAP poses with your depth sensor's measurements, you will need to scale the translation vectors of your camera poses.
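
A minimal sketch of that rescaling, assuming you have already estimated a single metric scale factor (for instance by comparing COLMAP's triangulated depths against your sensor's depth readings at a few matched pixels):

```python
import numpy as np

def rescale_poses(c2w_poses, scale):
    """Scale the translation part of camera-to-world matrices so that
    COLMAP's arbitrary units match the sensor's metric depth."""
    poses = np.array(c2w_poses, dtype=np.float32).copy()
    poses[:, :3, 3] *= scale   # rotations are unaffected by a global scale
    return poses

# Hypothetical scale estimate: median ratio of sensor depth to COLMAP depth
# at corresponding sparse feature points.
# scale = np.median(sensor_depths / colmap_depths)
# poses_metric = rescale_poses(poses_colmap, scale)
```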

@rancheng My problem was solved by running an RGB-D reconstruction with ICP matching to get the trajectory. However, the reconstruction results with @dazinovic's method don't seem so good. I also ran it on breakfast_room. With a perturbed trajectory, the result is shown below:
[image: reconstruction result]

commented

It looks like your camera extrinsics are in the wrong coordinate system. My method uses the OpenGL convention (same as NeRF). Maybe one of these issues can help you:
#4
#2
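
For context, COLMAP uses the OpenCV camera convention (x right, y down, z forward), while NeRF-style code expects OpenGL (x right, y up, z backward). A sketch of the usual conversion, assuming camera-to-world matrices (note that COLMAP's images.txt stores world-to-camera rotations and translations, so invert those first):

```python
import numpy as np

def opencv_to_opengl(c2w):
    """Flip the y and z camera axes of a camera-to-world matrix
    to go from the OpenCV to the OpenGL/NeRF convention."""
    c2w = np.array(c2w, dtype=np.float32).copy()
    c2w[:3, 1] *= -1.0
    c2w[:3, 2] *= -1.0
    return c2w
```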


Reasonable! I'll have a try and post my new results.

> It looks like your camera extrinsics are in the wrong coordinate system. My method uses the OpenGL convention (same as NeRF). Maybe one of these issues can help you: #4 #2

My camera extrinsics were in the wrong coordinate system. After transforming them to the OpenGL convention, the results are better. However, what if I don't know the bounds of the scene? Any suggestions for that?

commented

You can approximate it with your camera positions.
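
A rough sketch of that approximation: take the axis-aligned bounds of the camera centers and pad them by a margin that roughly covers the sensor's depth range (the margin value is a per-scene guess, not something prescribed by the code):

```python
import numpy as np

def bounds_from_cameras(c2w_poses, margin=2.0):
    """Approximate the scene bounding box from camera positions.
    `margin` (meters) should roughly cover how far the cameras see."""
    centers = np.array(c2w_poses)[:, :3, 3]
    bb_min = centers.min(axis=0) - margin
    bb_max = centers.max(axis=0) + margin
    return bb_min, bb_max
```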

> You can approximate it with your camera positions.

My result is all right now with the help of your advice. Thanks for your kind reply.

Hello @endlesswho @dazinovic, can you briefly describe what changes you made to get it to work with real-world datasets? Is it something like the following?

  1. Generate poses using COLMAP --> 2. Normalize depth maps to 0–1 --> 3. Transform poses as described in #2 --> 4. Run the training procedure

Also a few more questions @endlesswho:

  1. What dataset did you use?
  2. What is meant by "keeping depth images in metric space"?

commented

> Hello @endlesswho @dazinovic, can you briefly describe what changes you made to get it to work with real-world datasets? Is it something like the following?
>
> 1. Generate poses using COLMAP --> 2. Normalize depth maps to 0–1 --> 3. Transform poses as described in [How to Transform ScanNet Poses? #2](https://github.com/dazinovic/neural-rgbd-surface-reconstruction/issues/2) --> 4. Run the training procedure

I generated poses using BundleFusion (although you can also use COLMAP for this) and then applied the transform described in the linked issue. The depth maps are not normalized; the values need to be in meters. ScanNet depth maps are in millimeters, so you simply need to divide by 1000. The method will work with other scales too, but the depth maps need to be consistent with the camera poses.
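
One way to sanity-check that the depth maps and poses are consistent before training is to back-project a couple of frames into world space and inspect the fused point cloud (e.g. in MeshLab). A minimal sketch, assuming metric depth, camera-to-world poses in the OpenGL convention, and a single focal length with the principal point at the image center:

```python
import numpy as np

def backproject_to_world(depth, c2w, focal):
    """Back-project a metric depth map into world space.
    Assumes OpenGL convention: camera looks down -z, y is up."""
    h, w = depth.shape
    i, j = np.meshgrid(np.arange(w), np.arange(h))
    dirs = np.stack([(i - w * 0.5) / focal,
                     -(j - h * 0.5) / focal,
                     -np.ones_like(depth)], axis=-1)
    pts_cam = dirs * depth[..., None]               # points in camera space
    pts_world = pts_cam @ c2w[:3, :3].T + c2w[:3, 3]
    return pts_world[depth > 0]                     # drop invalid pixels

def save_ply(path, points):
    """Write an ASCII PLY point cloud for quick visual inspection."""
    with open(path, 'w') as f:
        f.write('ply\nformat ascii 1.0\n')
        f.write(f'element vertex {len(points)}\n')
        f.write('property float x\nproperty float y\nproperty float z\n')
        f.write('end_header\n')
        np.savetxt(f, points, fmt='%.4f')

# Usage sketch: fuse two nearby frames and check they overlap where the views overlap.
# pts = np.concatenate([backproject_to_world(d, p, focal) for d, p in zip(depths[:2], poses[:2])])
# save_ply('check.ply', pts)
```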

Hello, I am trying to reproduce Neural RGB-D with the data used by Manhattan-SDF and NICE-SLAM (e.g. Replica).
I find that some views are optimized correctly (the rendered depths and images seem correct) while most of the views are optimized wrongly. I simply take the fx and fy from the intrinsics as the focal length, and I don't know what to do with cx and cy. The extrinsics seem similar to your provided data, and I have also converted the depth to meters.

Do you have any ideas? Thanks! @dazinovic @endlesswho

commented

I have encountered the same issue as @junshengzhou, could you reply to us? @dazinovic @endlesswho
Normally fx, fy, cx, cy are provided, but it seems that you only need a single focal length value. How should I deal with the others? What focal length value should I use given fx, fy, cx, cy?
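
For what it's worth, NeRF-style codebases that take a single focal value typically assume fx ≈ fy and a principal point at the image center, so a common workaround is to pass (fx + fy) / 2 as the focal when fx and fy are close. If cx and cy are far from the image center, the ray generation can be generalized to use the full intrinsics, along these lines (a sketch, not the repository's own code):

```python
import numpy as np

def get_ray_directions(h, w, fx, fy, cx, cy):
    """Per-pixel ray directions in camera space using full intrinsics,
    OpenGL convention (camera looks down -z, y up)."""
    i, j = np.meshgrid(np.arange(w), np.arange(h))
    dirs = np.stack([(i - cx) / fx,
                     -(j - cy) / fy,
                     -np.ones((h, w))], axis=-1)
    return dirs

# Single-focal fallback when fx and fy are close and cx, cy are near (w/2, h/2):
# focal = 0.5 * (fx + fy)
```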