liuyuan-pal / Gen6D

[ECCV2022] Gen6D: Generalizable Model-Free 6-DoF Object Pose Estimation from RGB Images

How to improve the results

JaouadROS opened this issue:

I've tried Gen6D a couple of times, first with the mouse example and then with custom objects, but I still can't figure out how to get everything right at the first attempt. So I decided to use a relatively long reference video and to use the same video as the query video, but I still don't get it right.

In this example, I have a reference video with 6885 frames, of which only 689 were used with COLMAP to build the 3D model. That took 6 hours, by the way. I then estimated the meta info following these instructions and got these values:

0.984891 -0.219880 -0.156923
-0.0717203 0.377376 -0.923279
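
A quick sanity check on values like these can be sketched as follows. It assumes, based on my reading of the Gen6D custom-object instructions, that the first row is the object's forward (x+) direction and the second row its up (z+) direction; the function name `direction_report` is made up for this illustration and is not part of Gen6D.

```python
import numpy as np

# Values copied from the post above. Treating row 0 as forward (x+) and
# row 1 as up (z+) is an assumption, not something the post states.
meta = np.array([
    [0.984891, -0.219880, -0.156923],   # assumed forward (x+) direction
    [-0.0717203, 0.377376, -0.923279],  # assumed up (z+) direction
])

def direction_report(directions):
    """Return the length of each row and the angle between the two rows (degrees)."""
    norms = np.linalg.norm(directions, axis=1)
    unit = directions / norms[:, None]
    cos_angle = np.clip(unit[0] @ unit[1], -1.0, 1.0)
    return norms, float(np.degrees(np.arccos(cos_angle)))

norms, angle_deg = direction_report(meta)
print("row lengths:", norms)
print("angle between rows (deg):", angle_deg)
```

For these values the second row has unit length, the first does not, and the two rows are close to perpendicular, which is what one would expect from a forward and an up direction.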

Here is my 3D model. I expected better results with 689 frames:
[Image: Screenshot from 2023-09-18 17-37-21]

After that, I ran the prediction on the same reference video, and the detection fails:
[Image: 0]

1- So how can detection be guaranteed at the first attempt? Here I use 689 images for reconstruction and 6840 for detection (the reference images are a subset of the query images), but the detection still fails. If the reference and query datasets are the same and detection still fails, does that mean the meta data are incorrect? I don't see any reason why it would fail in that case. It is also strange to me because the scale of the object is nearly the same in the reference and query images.

But after a number of frames, the tracking starts to work correctly. Here are the frames:
Frame 1/6885: [Image: 1]
Frame 624/6885: [Image: 624]
Frame 640/6885: [Image: 640]
Then it works till the end of the video.

2- When I compute the meta data with CloudCompare (CC) several times for the same model (without closing CC), I get slightly different results each time. Is this normal behavior, and does it affect the performance of Gen6D?

  1. Hi, the detection is indeed not very stable, especially when there is a large scale difference. About the scale difference, you may refer to this #29 (comment). We always resize the reference images, so there is still a large scale difference even when the original reference sequence is used as the query. The results suddenly get better because we crop the image according to the current pose: the refiner happens to find a pose near the object, and thus the subsequent steps are correct. You may refer to the intermediate results as described here: https://github.com/liuyuan-pal/Gen6D#qualitative-results.
  2. Slightly changing the meta data will not change the final results. This is normal behavior for Gen6D.
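
On the second point, one rough way to quantify how different two meta info estimates are is to turn each pair of directions into an orthonormal frame and measure the rotation angle between the frames. The sketch below is only an illustration under assumptions: the rows are taken to be forward (x+) and up (z+) directions, the second estimate is simulated with a small hypothetical perturbation rather than measured, and the helpers `frame_from_directions` and `rotation_angle_deg` are made-up names, not Gen6D functions and not necessarily how Gen6D handles the meta info internally.

```python
import numpy as np

def frame_from_directions(forward, up):
    """Build a right-handed orthonormal frame (rows x, y, z) from rough directions."""
    z = up / np.linalg.norm(up)
    y = np.cross(z, forward)          # perpendicular to both up and forward
    y /= np.linalg.norm(y)
    x = np.cross(y, z)                # forward, re-orthogonalized against up
    return np.stack([x, y, z])

def rotation_angle_deg(R_a, R_b):
    """Angle of the relative rotation between two rotation matrices, in degrees."""
    cos_theta = (np.trace(R_a @ R_b.T) - 1.0) / 2.0
    return float(np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0))))

# First estimate: the values posted in the question.
forward = np.array([0.984891, -0.219880, -0.156923])
up = np.array([-0.0717203, 0.377376, -0.923279])

# Hypothetical second estimate: the same values plus small random noise,
# standing in for a repeated manual measurement.
rng = np.random.default_rng(0)
forward_2 = forward + rng.normal(scale=0.01, size=3)
up_2 = up + rng.normal(scale=0.01, size=3)

R_1 = frame_from_directions(forward, up)
R_2 = frame_from_directions(forward_2, up_2)
print("difference between the two frames (deg):", rotation_angle_deg(R_1, R_2))
```

With two real estimates, replace `forward_2` and `up_2` with the second set of values to see how many degrees apart the two object frames are.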