NVlabs / RVT

Official Code for RVT-2 and RVT

Home Page: https://robotic-view-transformer-2.github.io/

Details about the real-world experiments

Nimolty opened this issue

Hello! I am trying to reproduce the real-world results and have one question: the paper reports that only a single third-person-view camera is mounted. Approximately where is the camera located? Is it at the front, as in the simulations? Thank you very much ~

Hello,

Thanks for your interest in our work. The real camera is at a front-top-ish location. It helps to make sure that all the key-points are within the camera frame and that the objects of interest are visible.

Hope this helps!

Best,
Ankit

Thank you for your kind answer.

I am trying to reproduce the "stack blocks" task in the real world. I would appreciate it very much if you could answer the following questions.

  1. During data collection, how were the red / yellow / blue blocks placed? Were they placed completely randomly or within a certain area? What about the order between them? What is the exact proportion of the three variations among the 14 demonstrations?

  2. During testing, how were the red / yellow / blue blocks placed? Were they placed close to their counterparts in the training set, or randomly?

  3. After watching the real-world demo on the website and the "stack blocks" task in simulation, I guess that each real demo has approximately 5 keyframes: (1) the end-effector pauses above the block; (2) the end-effector grasps the block; (3) the end-effector pauses at the same location as in (1); (4) the end-effector pauses above another block; (5) the end-effector releases the block. Is my guessed split correct for the real world? And for (1) and (4), is there any specific distance or relative pose between the paused location and the block, or is the relative distance chosen randomly for each demo?

Thank you very much ~
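The five-keyframe split proposed above resembles the keyframe-discovery heuristic commonly used for RLBench in prior work such as PerAct: a step is treated as a keyframe when the gripper opens or closes, or when the arm comes to rest. Below is a minimal, hypothetical sketch of that heuristic. The `Step` record, its `speed` field, `find_keyframes`, and the `still_thresh` value are assumptions made for illustration and are not taken from the RVT code base; the thread does not say how keyframes were extracted from the real-world human demonstrations.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    """One recorded timestep of a demonstration (hypothetical format)."""
    gripper_open: bool  # True while the gripper is open
    speed: float        # end-effector speed, e.g. in m/s (assumed available)


def find_keyframes(steps: List[Step], still_thresh: float = 1e-3) -> List[int]:
    """Return the indices of keyframes in a demonstration.

    A step becomes a keyframe when the gripper opens or closes
    (grasp / release) or when the arm comes to rest after moving
    (a pause). The final step is always kept as a keyframe.
    """
    keyframes: List[int] = []
    for i in range(1, len(steps)):
        prev, cur = steps[i - 1], steps[i]
        gripper_changed = cur.gripper_open != prev.gripper_open
        # Only the first still step after motion counts, so a long
        # pause yields one keyframe rather than many.
        came_to_rest = cur.speed < still_thresh <= prev.speed
        if gripper_changed or came_to_rest:
            keyframes.append(i)
    if steps and (not keyframes or keyframes[-1] != len(steps) - 1):
        keyframes.append(len(steps) - 1)
    return keyframes


if __name__ == "__main__":
    # Synthetic "stack blocks" demo: move, pause above block, descend and
    # grasp, lift and pause, move and pause above the other block, release.
    gripper = [True, True, True, True, False, False,
               False, False, False, False, False, True]
    speed = [0.1, 0.1, 0.0, 0.1, 0.0, 0.1,
             0.0, 0.1, 0.1, 0.0, 0.1, 0.0]
    demo = [Step(g, s) for g, s in zip(gripper, speed)]
    print(find_keyframes(demo))  # [2, 4, 6, 9, 11]
```

On this synthetic trajectory the heuristic returns one index per phase of the proposed split. On real teleoperated data the speed signal is noisy, so a threshold like this would likely need tuning, or nearby keyframes would need merging.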

Hi,

  1. Yes, the blocks are randomly placed on the "robot table," the black table area visible in the videos. There are three variations in the data, with 5 (red on yellow), 5 (blue on red), and 4 (yellow on blue) examples.

  2. We test 10 examples for the three variations. We use 4 (red on yellow), 3 (blue on red), and 3 (yellow on blue) examples.

  3. Yes, there are five key-frames, as you defined. There are no strict conditions for (1) and (4). We use human demonstrations, so (1) and (2) might be close to one another, but there is no strict enforcement.

I hope this helps!

Best,
Ankit
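The per-variation counts in the reply above can be tallied to check the totals and proportions asked about in the question. This is a small illustrative sketch: the `train` and `test` dictionaries and the `proportions` helper are hypothetical names, and the numbers are only those stated in this thread.

```python
from fractions import Fraction

# Per-variation counts for the real-world "stack blocks" task, as stated in this thread.
train = {"red on yellow": 5, "blue on red": 5, "yellow on blue": 4}
test = {"red on yellow": 4, "blue on red": 3, "yellow on blue": 3}


def proportions(counts):
    """Return each variation's share of the total as an exact fraction."""
    total = sum(counts.values())
    return {name: Fraction(n, total) for name, n in counts.items()}


# Totals reported in the thread: 14 training demos, 10 test examples.
assert sum(train.values()) == 14
assert sum(test.values()) == 10

for split_name, counts in (("train", train), ("test", test)):
    for name, share in proportions(counts).items():
        print(f"{split_name:5s} {name:15s} {share} ({float(share):.1%})")
```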

Closing because of inactivity.