lagadic / visp

Open Source Visual Servoing Platform

Home Page: https://visp.inria.fr/

Megapose tracking tutorial not showing 3D cube nor progress bar

VladStojBlank opened this issue · comments

Hi all,

I got the ViSP code built from source and all of the examples running. However, when I try to run the Megapose tracking tutorial, all it shows is the video, and when I input the top-left and bottom-right points, nothing happens. I am running both the megapose server and the example from the same Ubuntu machine (so the server is running on 127.0.0.1 --port 5555 and the client runs and connects to this). I get no specific errors.

So to be clear, the megapose tutorial runs, but there is no 3D cube (nor green progress bar) visible.

I also ran the client earlier from a Docker environment that connected to the Ubuntu machine, which worked, but now when I want to run everything locally on one single Ubuntu machine I get this issue.

I built everything from the latest git repo.

Any help or hints are appreciated!

Kind Regards,

Vlad

commented

Hi Vlad,

Thank you for opening an issue.

When the tracking fails, do you have a message in the top left corner of the window, asking you to perform detection?

I suspect that this is linked to a "performance" issue:

  • When you "detect" the bounding box by clicking, it is sent to Megapose to perform the initial pose estimation step: this step is slow. On a good GPU, it normally takes around two seconds. However, when you've just started the server, PyTorch performs a "warmup", which can heavily slow down the first inference pass.
  • While the pose estimation is being performed, the video continues to play.
  • If the cube moves between the detection and the answer from megapose, then you would only see a pose estimate for a short time (if megapose was confident in its estimation).

It is also possible that megapose was not able to estimate the pose with the given detection. However, I found in my experiments (with the same object and camera) that it was fairly robust so this seems unlikely.

You could try a couple of things:

  • First, start the server and run the tutorial once: even if it does not work, this ensures that the megapose models are "warmed up". Do not kill the server.
  • Then rerun the tutorial: try to acquire the bounding box as soon as the cube is static to ensure the most reliable result.

If the issue persists (i.e. the answer from megapose is too slow), you can also try to play with the megapose/initialisationNumSamples parameter on the client side.
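
For reference, here is a minimal sketch of how you could change that value in the client configuration. I am assuming the parameter lives under a "megapose" section of the tutorial's data/megapose_cube.json; adjust the path and layout to your actual file (editing the file by hand works just as well).

    import json

    # Minimal sketch (config layout assumed): lower the number of pose
    # hypotheses Megapose samples during initialisation.
    cfg_path = "data/megapose_cube.json"
    with open(cfg_path) as f:
        cfg = json.load(f)

    cfg["megapose"]["initialisationNumSamples"] = 72  # instead of the default 576

    with open(cfg_path, "w") as f:
        json.dump(cfg, f, indent=2)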

I could also send you a YoloV7 model trained to detect the cube, which you could easily plug into the tutorial (no code required, only modifying the configuration file). However, I am away for some time, so this would have to wait.

Cheers,
Sam

Hi Sam,

Thank you for the detailed reply!

I can confirm that Megapose does indeed work, however not always. Using the example tutorial, sometimes the 3D cube appears after setting the initial points for the bounding box, but quite often it does not. I also tried with my own example (with a custom 3D model and specific camera intrinsic matrix parameters), but again nothing is computed.

There seems to be an issue with the amount of time it takes to compute a pose and the time it takes for the client to receive the data and render the result. I tried running the example both with a remote client and the server on our workstation, as well as running the client and the server at the same time on the workstation. Our workstation has 2x A6000 GPUs, and Megapose by itself runs fairly well, so I don't know why there is such a performance issue.

As for the initialisationNumSamples, can you recommend some values? And is there a way to better sync the computation of the results between the client and the server?

Many thanks,

Vlad

Hi again,

I solved the problem by slowing down the frame rate of the video, so that the megapose server has time to compute and send the result.

However, my next problem is that when I try running the code with my own video, 3D model and intrinsic camera matrix parameters, the score value dips down to an average of 0.01, which is far below the threshold value needed for successful pose tracking.

Any pointers about how to use a custom video source and 3D model successfully with this tutorial?

Many thanks,

Vlad

commented

Hi!

I solved the problem by slowing down the frame rate of the video, so that the megapose server has time to compute and send the result.

So it is indeed a performance issue.
When you start the server, what is the value of CUDA_VISIBLE_DEVICES? If you're using multiple GPUs, you could try to restrict PyTorch to use only one. Since I haven't tested with multiple GPUs, it is possible that this causes issues or delays.
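
For example, here is a minimal sketch of restricting the server process to a single GPU. This is standard PyTorch/CUDA behaviour, nothing Megapose-specific, and the variable can just as well be exported in the shell before starting the server.

    import os

    # Make only GPU 0 visible to this process; must be set before torch
    # initialises CUDA (i.e. before the first import of torch).
    os.environ["CUDA_VISIBLE_DEVICES"] = "0"

    import torch

    if torch.cuda.is_available():
        print(torch.cuda.device_count())      # should report 1
        print(torch.cuda.get_device_name(0))  # the GPU that was kept visible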

Additionally, you can uncomment the line

# print(f'Inference took {int((time.time() - t) * 1000.0)}ms')
and reinstall the megapose server with pip install . so that the server prints the actual time it took to run megapose for each frame. This does not take network communication into account, but since you are running on localhost, this should be negligible.
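
To illustrate, that commented-out line belongs to a timing pattern like the one below (run_megapose_inference is a hypothetical stand-in for the server's per-frame call, only there to make the sketch runnable):

    import time

    def run_megapose_inference(frame):
        # Hypothetical placeholder for the actual Megapose call made per frame.
        time.sleep(0.1)
        return None

    frame = None  # placeholder for the input image
    t = time.time()
    result = run_megapose_inference(frame)
    print(f'Inference took {int((time.time() - t) * 1000.0)}ms')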

You may also wish to try the --optimize flag for the server. For now, this simply uses the torch JIT to compile the model. On my setup, the change is really small, but it might have an impact for you.
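
If you are curious, here is a toy sketch of the mechanism behind --optimize (compiling a model with the torch JIT instead of running it eagerly; the real server model is of course different):

    import torch

    # Toy model standing in for the actual network, only to show the JIT call.
    model = torch.nn.Sequential(torch.nn.Linear(8, 8), torch.nn.ReLU()).eval()
    scripted = torch.jit.script(model)  # JIT-compiled version of the model

    x = torch.randn(1, 8)
    with torch.no_grad():
        # Same outputs; the JIT version can be somewhat faster after warmup.
        print(torch.allclose(model(x), scripted(x)))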

Any pointers about how to use a custom video source and 3D model successfully with this tutorial?

  • First, you should export your model to .obj, making sure that the model's dimensions are expressed in meters and that its origin (the 0,0,0 point in Blender) is near the center of the object. For more info, check out https://visp-doc.inria.fr/doxygen/visp-daily/tutorial-megapose-model.html

  • Place the model in the data/models folder of the tutorial. You should follow a structure similar to that of the cube: create a subdirectory myObject containing the .obj and .mtl files (along with the .png)

  • If the megapose server is running, restart it

  • When launching the tutorial, add the argument object myObject so that megapose knows what object it should estimate the pose for. If you want to use a DNN for detection, you should also change the labels and the detectionMethod in the data/megapose_cube.json file.

  • To verify that your model is correct, you can also decrease the confidence threshold (via the reinitThreshold parameter) and view the pose estimation and your model. If your model has no texture or color, you should review the export to .obj step.

  • To use your own video, replace the value of video-device with the path to your video. You can also use a live feed if you have a camera connected to your computer, by supplying an integer value to the video-device parameter (see https://docs.opencv.org/3.4/d8/dfe/classcv_1_1VideoCapture.html#a5d5f5dacb77bbebdcbfb341e3d4355c1 and the short sketch after this list).

  • Since you are using your own camera, you should modify the camera intrinsic values, located in the data/megapose_cube.json file, in the camera field. Megapose only supports perspective projection without distortion.
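
As a reference for the video-device point above, here is a minimal sketch of what that parameter maps to on the OpenCV side (the file name and camera index below are placeholders):

    import cv2

    cap = cv2.VideoCapture("data/my_video.mp4")  # path to a recorded video...
    # cap = cv2.VideoCapture(0)                  # ...or an integer camera index for a live feed

    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        cv2.imshow("frame", frame)
        # A larger waitKey delay effectively slows down playback, which gives
        # the megapose server more time per frame.
        if cv2.waitKey(30) & 0xFF == ord('q'):
            break

    cap.release()
    cv2.destroyAllWindows()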

As for the initialisationNumSamples, can you recommend some values? And is there a way to better sync the computation of the results between the client and the server?

Out of the box, Megapose supports the values 72, 512, 576 and 4608, with 576 being the default. I've had moderate success using 72 samples for initialisation on the cube with a detection network. Initialisation becomes much faster (400ms instead of ~2000ms) but may fail. Combining it with a detection network means that if initialisation fails, it can be retried as soon as possible. For values other than those above, we generate random orientations, as done in https://github.com/thodan/bop_toolkit/blob/master/bop_toolkit_lib/transform.py (a small sketch of this idea follows below).
Concerning the sync/delay issue, the ideal solution would be to combine Megapose with a faster tracker (like the MBT, already present in ViSP) or a filtering solution that can be used to predict intermediate poses. We do have plans to release a tutorial combining MBT and Megapose, so stay tuned ;)
However, for motions that are not too fast, Megapose should be sufficient. In your case, the underlying latency issue (especially with your setup) is something that I need to investigate.
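
For the random-orientation case mentioned above, here is a minimal sketch of the idea, using scipy purely for illustration (the server itself follows the bop_toolkit code linked above):

    from scipy.spatial.transform import Rotation

    # Sample uniformly distributed random orientations for a non-standard
    # number of initialisation samples (e.g. 300).
    num_samples = 300
    orientations = Rotation.random(num_samples).as_matrix()  # shape (300, 3, 3)
    print(orientations.shape)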

If you are willing, you could also send me your 3D model so that I can have a look at it and see if something is wrong.

Sam

Hi Sam,

Thank you for all the recommendations. Just to summarize the status and answer some of your questions:

  1. We have two A6000 GPUs on our workstation, running Ubuntu 20.x. Currently, PyTorch uses the default GPU 0 for this (both GPUs have something like 64 GB I think, so using either is fine).

I tried the --optimize flag, but it did not make much of a difference; I also tried rgb-multiple-hypothesis, and this did not produce any good results either.

The camera intrinsic values match the correct values I used when testing the static version of the model and the image with the default Python implementation of Megapose.

Basically, what happens when I try to load my model with our own video is that the computation gives an incredibly low score compared to reinitThreshold (something like 0.001), and obviously the pose is then not estimated at all. What is more interesting is that if I lower the threshold to match this low score, the 3D model still does not show up.

I made sure to export and align the 3D model we are using correctly (it matches the scale and position of the default cube model included in the example).

What is even more interesting is that when I tried to use my 3D model instead of the 3D cube in the default example, the score also sinks to around 0.001. So could it be that the model is too complex? I will find out if I can send the model and video to you privately for you to verify.

I also played around with different initialisationNumSamples values, but this did not have much effect.

I will try to further work on solving this problem today, and if I get any significant improvements, I will let you know.

Kind Regards,

Vlad