It seems almost technically impossible to solve novel_adj_and_noun, twist, stack_order, sweep_without_exceeding
shure-dev opened this issue · comments
Thank you for your great work!
Could you provide the output code for the following tasks? It seems technically impossible to solve these VIMA tasks with Instruct2Act, so I would like to check the actual output.
novel_adj_and_noun
How do you retrieve objects in the scene using CLIP for this task? Here we have to match a [text and image] query against an [image].
How is this possible with your approach?
sweep_without_exceeding
This task requires stopping the sweep action in front of the object, which means we cannot use the object location directly as the end point. How do you describe the end point of the sweep action?
How do you describe the rotation of the end effector?
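To make the sweep question concrete, here is a minimal sketch of the kind of computation I would expect the generated code to need. It is not taken from Instruct2Act: the function names `sweep_end_point` and `sweep_yaw`, the `margin` parameter, and the use of 2D (x, y) coordinates are all my own assumptions.

```python
import math

# Hypothetical helpers, not part of Instruct2Act or VIMA.

def sweep_end_point(start, target, margin):
    """Return the point on the segment start -> target that stops
    `margin` units short of `target` (assumed 2D coordinates)."""
    dx = target[0] - start[0]
    dy = target[1] - start[1]
    dist = math.hypot(dx, dy)
    if dist <= margin:
        # Already within the margin: do not move at all.
        return (start[0], start[1])
    scale = (dist - margin) / dist
    return (start[0] + dx * scale, start[1] + dy * scale)


def sweep_yaw(start, target):
    """Return the heading of the sweep direction in radians,
    measured from the +x axis (an assumed convention)."""
    return math.atan2(target[1] - start[1], target[0] - start[0])


end = sweep_end_point((0.0, 0.0), (10.0, 0.0), margin=2.0)
yaw = sweep_yaw((0.0, 0.0), (10.0, 0.0))
```

Something like this would need both the object location and a stopping distance, and I do not see how the prompt provides the latter.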
manipulate_old_neighbor
For example, how do you get the object that is on the west side of another object? Which API is used for this?
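As an illustration of what I mean, below is a minimal sketch of a spatial filter over detected objects. Everything in it is hypothetical: the `Detection` class, the `west_neighbor` function, and the assumption that "west" corresponds to a smaller x coordinate of the object center are mine, not an API from Instruct2Act.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Detection:
    """Hypothetical detection record: a label and a center point."""
    name: str
    cx: float
    cy: float


def west_neighbor(reference: Detection,
                  detections: List[Detection]) -> Optional[Detection]:
    """Return the detection closest to `reference` among those lying to
    its west (assumed here to mean a smaller x), or None if there is none."""
    candidates = [
        d for d in detections
        if d is not reference and d.cx < reference.cx
    ]
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda d: (d.cx - reference.cx) ** 2 + (d.cy - reference.cy) ** 2,
    )


bowl = Detection("bowl", 5.0, 5.0)
scene = [
    bowl,
    Detection("block", 3.0, 5.0),
    Detection("pan", 1.0, 5.0),
    Detection("cup", 8.0, 5.0),
]
neighbor = west_neighbor(bowl, scene)
```

Is there an equivalent of this kind of directional query among the APIs exposed to the LLM?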
twist
How do you describe the twist motion based on the examples in the prompt? Does your code actually take the twist examples in the prompt into account?
stack_order
How do you generate code that extracts information about how the objects are stacked?
I would appreciate it if you could answer these questions. Thank you!