NVlabs / RVT

Official Code for RVT-2 and RVT

Home Page: https://robotic-view-transformer-2.github.io/

paper question

LemonWade opened this issue

Thank you very much for the work you've done. May I ask a question? Can your work be understood as a multi-view approach? I'm curious about the main difference between your approach and plain multi-view methods. If someone tried to reproduce your work using only multi-view input in their experiments, would they outperform you?

Sorry to bother you, and thank you in advance.

Hi, thanks for your interest in our work.

I am not sure I understand the question correctly. Could you please explain what you mean by "Can I interpret your work as being based on multi-view? I'm curious about the primary difference between your approach and multi-view"? Specifically, what do you mean by multi-view?

By multi-view, do you mean a network that takes multi-view images directly as input, with no re-rendering? If so, this has two main disadvantages:

  • Maintaining a setup with five cameras positioned at different angles (shown in Fig. 3) is hard. Our current system, on the other hand, can work with even a single RGBD sensor (as in our real-world experiments) and is hence easier to use.

  • Directly using multi-view images without any re-rendering would prevent us from using orthographic projection, 3D augmentation and point correspondence. All of these significantly boost performance (Table 2, left), but they require re-rendering.
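
To make the re-rendering idea more concrete, here is a minimal NumPy sketch that renders a colored point cloud (for example, one back-projected from a single RGBD frame) into axis-aligned orthographic virtual views using a simple z-buffer. It is not RVT's renderer. The function name `render_orthographic`, the `VIEWS` axis mapping, the workspace bounds `lo`/`hi`, and the image size are all hypothetical choices made for illustration.

```python
import numpy as np

# Hypothetical sketch, not RVT's implementation.
# Maps a view name to (u axis, v axis, depth axis); each virtual camera is
# assumed to look down its depth axis from the positive side.
VIEWS = {"top": (0, 1, 2), "front": (0, 2, 1), "side": (1, 2, 0)}


def render_orthographic(points, colors, view="top", img_size=64, lo=-1.0, hi=1.0):
    """Splat a colored point cloud onto one axis-aligned orthographic image.

    points: (N, 3) xyz in a shared workspace frame; colors: (N, 3) RGB.
    Returns an (img_size, img_size, 3) image and an (img_size, img_size)
    depth map that is NaN wherever no point lands.
    """
    u_ax, v_ax, d_ax = VIEWS[view]
    scale = (img_size - 1) / (hi - lo)
    # Orthographic projection: pixel position depends only on the in-plane
    # coordinates, never on distance from the camera (no perspective divide).
    u = np.round((points[:, u_ax] - lo) * scale).astype(int)
    v = np.round((points[:, v_ax] - lo) * scale).astype(int)
    depth = points[:, d_ax]
    inside = (u >= 0) & (u < img_size) & (v >= 0) & (v < img_size)
    u, v, depth, colors = u[inside], v[inside], depth[inside], colors[inside]

    # Z-buffer: for each pixel keep only the point nearest the virtual camera,
    # i.e. the one with the largest coordinate along the depth axis.
    flat = v * img_size + u
    order = np.lexsort((-depth, flat))  # sort by pixel, then depth descending
    _, first = np.unique(flat[order], return_index=True)
    winners = order[first]  # one point per occupied pixel

    img = np.zeros((img_size * img_size, 3), dtype=colors.dtype)
    zbuf = np.full(img_size * img_size, np.nan)
    img[flat[winners]] = colors[winners]
    zbuf[flat[winners]] = depth[winners]
    return img.reshape(img_size, img_size, 3), zbuf.reshape(img_size, img_size)
```

Because the views are generated from the point cloud rather than captured by physical cameras, the same function can render any number of virtual viewpoints from one sensor. A 3D augmentation (for example, rotating `points` before the call) then applies consistently to every view, and a 3D point always maps to a known pixel in each view.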

Let me know if I understood your question correctly and if this helps.

This is exactly the answer I was looking for. Thank you!

Thank you for your excellent work!
I would like to ask how occlusion is handled when the system has only one RGBD sensor.
Thanks!

Hi @FinnJob,

Thanks for the kind words. We didn't face any occlusion issues on the tasks we tested. With more cluttered scenes this could become an issue, but the framework is flexible enough to allow adding another camera if needed.
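
If an extra camera is added, one common way to combine the sensors is to back-project each depth image into a shared world frame and concatenate the resulting points before re-rendering. The sketch below assumes a pinhole camera model, depth in meters, and known camera-to-world extrinsics from calibration. `depth_to_world` and `fuse_cameras` are hypothetical names and are not part of the RVT codebase.

```python
import numpy as np


def depth_to_world(depth, K, T_world_cam):
    """Back-project a depth image into world-frame 3D points (hypothetical helper).

    depth: (H, W) depth in meters; K: 3x3 pinhole intrinsics;
    T_world_cam: 4x4 camera-to-world pose. Pixels with depth <= 0 are dropped.
    Returns an (M, 3) array of world-frame points.
    """
    h, w = depth.shape
    us, vs = np.meshgrid(np.arange(w), np.arange(h))  # pixel column/row grids
    z = depth.reshape(-1)
    valid = z > 0
    # Homogeneous pixel coordinates (u, v, 1) for the valid pixels only.
    pix = np.stack([us.reshape(-1), vs.reshape(-1), np.ones(h * w)], axis=0)[:, valid]
    rays = np.linalg.inv(K) @ pix  # camera-frame rays with z = 1
    pts_cam = rays * z[valid]  # scale each ray by its measured depth
    pts_h = np.vstack([pts_cam, np.ones(pts_cam.shape[1])])
    return (T_world_cam @ pts_h)[:3].T


def fuse_cameras(frames):
    """Concatenate world-frame points from several (depth, K, T_world_cam) frames."""
    return np.concatenate([depth_to_world(d, K, T) for d, K, T in frames], axis=0)
```

The fused cloud can then be re-rendered into the same virtual views as before, so adding a camera only changes how the point cloud is built, not the rest of the pipeline.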

Thanks, it helps a lot!