OpenGVLab / Instruct2Act

Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model


It seems almost technically impossible to solve novel_adj_and_noun, twist, stack_order, and sweep_without_exceeding

shure-dev opened this issue

Thank you for your great work!

Could you provide the output code for the following tasks? It seems technically impossible to solve these VIMA tasks with Instruct2Act, and I would like to check the actual output.

novel_adj_and_noun

How do you retrieve objects in the scene using CLIP for this task? We have to match a multimodal [text and image] query against an [image], but CLIP only matches text against images. How is this possible with your approach? (A sketch of the retrieval direction I do understand follows below.)
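For reference, this is a minimal sketch of the text-to-crop retrieval I assume the pipeline uses: object crops from an upstream segmenter scored against a text query with CLIP. The crop paths and query string are my own placeholders, not your API. My point is that this only covers the [text] → [image] direction, not a query that itself contains an image.

```python
import torch
import clip
from PIL import Image

# Standard CLIP retrieval: score candidate object crops against a text query.
# Assumption: crops come from an upstream segmenter; the paths are placeholders.
device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

crop_paths = ["crop_0.png", "crop_1.png", "crop_2.png"]  # hypothetical object crops
crops = torch.stack([preprocess(Image.open(p)) for p in crop_paths]).to(device)
text = clip.tokenize(["a dax block"]).to(device)  # novel noun from the instruction

with torch.no_grad():
    image_features = model.encode_image(crops)
    text_features = model.encode_text(text)
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    similarity = (image_features @ text_features.T).squeeze(1)  # one score per crop

best = similarity.argmax().item()
print(f"best-matching crop: {crop_paths[best]}")
```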

sweep_without_exceeding

This task requires stopping the sweep action in front of the constraint object, which means we cannot use the object location directly as the end point. How do you define the end point of the sweep action? And how do you define the rotation of the end effector? (A sketch of what I mean follows below.)
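To make the question concrete, here is a rough sketch of what I imagine is needed: the end point has to be backed off from the boundary object along the sweep direction. The function name, the coordinate convention, and the margin value are all my own assumptions, not your API.

```python
import numpy as np

def sweep_end_point(obj_xy, goal_xy, bound_xy, margin=0.05):
    """Hypothetical helper (my own naming, not the repo's API).

    Sweep the object at obj_xy toward goal_xy, but stop the end effector
    'margin' metres before the boundary object at bound_xy.
    """
    direction = goal_xy - obj_xy
    direction = direction / np.linalg.norm(direction)
    # Distance along the sweep direction at which the boundary is reached.
    reach = np.dot(bound_xy - obj_xy, direction)
    # Clip the travel so we never pass (reach - margin).
    travel = min(np.linalg.norm(goal_xy - obj_xy), max(reach - margin, 0.0))
    return obj_xy + travel * direction

# Example: the boundary sits between the object and the goal.
obj = np.array([0.2, 0.0])
goal = np.array([0.8, 0.0])
bound = np.array([0.6, 0.0])
print(sweep_end_point(obj, goal, bound))  # -> [0.55, 0.0], short of the boundary
```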

manipulate_old_neighbor

For example, how do you get the object that lies on the west side of another object? What kind of API do you use for this? (See the sketch below for what I mean.)
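Something like the following is what I imagine would be needed. The detection format (names mapped to (x, y) centers) and the convention that west means negative x are guesses on my part.

```python
import numpy as np

def object_west_of(anchor_name, detections):
    """Hypothetical helper: return the detected object nearest to the west
    (negative-x) side of the anchor. 'detections' maps names to (x, y)
    centers; this format is my assumption, not the repo's actual API.
    """
    ax, ay = detections[anchor_name]
    candidates = {
        name: (x, y) for name, (x, y) in detections.items()
        if name != anchor_name and x < ax  # strictly west of the anchor
    }
    if not candidates:
        return None
    # Closest candidate on the west side.
    return min(candidates, key=lambda n: np.hypot(candidates[n][0] - ax,
                                                  candidates[n][1] - ay))

detections = {"red block": (0.3, 0.1), "blue block": (0.6, 0.1), "bowl": (0.1, 0.4)}
print(object_west_of("blue block", detections))  # -> "red block"
```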

twist

How do you describe the twist motion from the examples in the prompt? Does your code actually make use of the twist example given in the prompt? (A sketch of the only interpretation I can picture follows below.)
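For concreteness: the only way I can picture expressing a twist with pick-and-place primitives is picking the object and placing it at the same position with a rotated orientation. The rotation axis, the angle, and the quaternion convention below are my assumptions.

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

def twist_orientation(current_quat_xyzw, angle_deg=120.0):
    """Hypothetical sketch: rotate the grasp orientation about the world z-axis.

    A 'twist' then becomes a pick at the object's pose followed by a place at
    the same position with this rotated orientation. The 120-degree angle and
    the (x, y, z, w) quaternion convention are my assumptions.
    """
    twist = R.from_euler("z", angle_deg, degrees=True)
    return (twist * R.from_quat(current_quat_xyzw)).as_quat()

identity = np.array([0.0, 0.0, 0.0, 1.0])
print(twist_orientation(identity))
```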

stack_order

How do you generate code that extracts information about the stacking order of the objects? (The sketch below shows the kind of logic I have in mind.)
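To illustrate what I mean, here is the kind of heuristic I imagine would be needed from a top-down view using segmentation results alone: heavily overlapping boxes suggest a stack, and the partially occluded (smaller) mask is assumed to be underneath. The box format, the threshold, and the heuristic itself are entirely my own assumptions.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x0, y0, x1, y1) boxes."""
    x0 = max(box_a[0], box_b[0]); y0 = max(box_a[1], box_b[1])
    x1 = min(box_a[2], box_b[2]); y1 = min(box_a[3], box_b[3])
    inter = max(0, x1 - x0) * max(0, y1 - y0)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def guess_stacked_pairs(boxes, thresh=0.5):
    """Hypothetical sketch: in a top-down view, objects whose boxes overlap
    heavily are candidates for a stack; the smaller visible box is assumed
    to be the occluded, lower object.
    """
    pairs = []
    names = list(boxes)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if iou(boxes[a], boxes[b]) > thresh:
                area = lambda bx: (bx[2] - bx[0]) * (bx[3] - bx[1])
                top, bottom = (a, b) if area(boxes[a]) >= area(boxes[b]) else (b, a)
                pairs.append((top, bottom))  # (object on top, object below)
    return pairs

boxes = {"red": (10, 10, 50, 50), "green": (15, 15, 45, 45), "blue": (100, 100, 140, 140)}
print(guess_stacked_pairs(boxes))  # -> [('red', 'green')]
```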

I would appreciate it if you could answer these questions. Thank you!