vimalabs / VIMABench

Official Task Suite Implementation of ICML'23 Paper "VIMA: General Robot Manipulation with Multimodal Prompts"

About output action's coordinate system frame

kenoharada opened this issue

Thank you for sharing your work!

I have a question about the action space.

Each pose consists of a 2D coordinate and a rotation represented as a quaternion:
>>> {
    "pose0_position": Box(low=[0.25, -0.5], high=[0.75, 0.50], shape=(2,), dtype=np.float32),
    "pose0_rotation": Box(low=-1, high=1, shape=(4,), dtype=np.float32),
    "pose1_position": Box(low=[0.25, -0.5], high=[0.75, 0.50], shape=(2,), dtype=np.float32),
    "pose1_rotation": Box(low=-1, high=1, shape=(4,), dtype=np.float32),
}

Which coordinate frame are these positions and rotations expressed in?

Are they relative to the robot base, or to the top camera frame?

Here is a visualization of the relationship between pose position and pixel position:
[Image: Screenshot 2022-10-20 16 04 32]

Hey @kenoharada, nice to see you again. The position is with respect to the workspace. The rotation denotes the rotation of the end effector when picking or placing. I can share a sample script to annotate actions on RGB frames from the top-down view.

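    # Excerpt from a method body. It presumably relies on these module-level imports:
    #   import cv2
    #   import numpy as np
    #   from einops import rearrange
    # rgb is a channel-first (C, H, W) image from the top-down camera.
    _, h, w = rgb.shape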
    pos0, pos1 = action["pos0"], action["pos1"]
    # normalize to [0, 1] then scale to image size
    pos0 = (pos0 - self._pos_bound_low) / (
        self._pos_bound_high - self._pos_bound_low
    )
    pos1 = (pos1 - self._pos_bound_low) / (
        self._pos_bound_high - self._pos_bound_low
    )
    pos0 = pos0 * np.array([h, w])
    pos1 = pos1 * np.array([h, w])
    # annotate rgb
    rgb = rearrange(rgb.copy(), "c h w -> h w c")
    # RGB -> BGR
    rgb = cv2.cvtColor(rgb, cv2.COLOR_RGB2BGR)
    # draw circles
    rgb = cv2.circle(rgb, tuple(pos0.astype(np.int32)[::-1]), 5, (0, 0, 255), 2)
    rgb = cv2.circle(rgb, tuple(pos1.astype(np.int32)[::-1]), 5, (0, 255, 0), 2)
    # put text
    rgb = cv2.putText(
        rgb,
        " pick",
        org=tuple(pos0.astype(np.int32)[::-1]),
        fontScale=0.5,
        fontFace=cv2.FONT_HERSHEY_SIMPLEX,
        color=(0, 0, 255),
        thickness=1,
        lineType=cv2.LINE_AA,
    )
    rgb = cv2.putText(
        rgb,
        " place",
        org=tuple(pos1.astype(np.int32)[::-1]),
        fontScale=0.5,
        fontFace=cv2.FONT_HERSHEY_SIMPLEX,
        color=(0, 255, 0),
        thickness=1,
        lineType=cv2.LINE_AA,
    )
    # BGR -> RGB
    rgb = cv2.cvtColor(rgb, cv2.COLOR_BGR2RGB)
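
For readers who only need the coordinate mapping, the normalize-then-scale step in the script above can be sketched as a standalone function. This is a minimal sketch, not code from VIMABench: the helper names `workspace_to_pixel` and `pixel_to_xy` are hypothetical, the bounds are copied from the action space quoted in the question, and the 128 x 256 image size in the usage lines is an assumption for illustration only.

```python
import numpy as np

# Bounds copied from the action space quoted in the question.
POS_BOUND_LOW = np.array([0.25, -0.5], dtype=np.float32)
POS_BOUND_HIGH = np.array([0.75, 0.5], dtype=np.float32)


def workspace_to_pixel(pos, image_hw, low=POS_BOUND_LOW, high=POS_BOUND_HIGH):
    """Map a 2D workspace position to an integer (row, col) pixel index.

    Hypothetical helper that mirrors the normalize-then-scale step
    of the sample script; it is not part of the VIMABench API.
    """
    pos = np.asarray(pos, dtype=np.float32)
    # Normalize each axis to [0, 1] using the workspace bounds.
    normalized = (pos - low) / (high - low)
    # Scale the first axis by image height and the second by image width.
    row_col = normalized * np.array(image_hw)
    return row_col.astype(np.int32)


def pixel_to_xy(row_col):
    """Reverse (row, col) into (x, y), the point order OpenCV drawing calls expect."""
    return tuple(int(v) for v in row_col[::-1])


# Assumed image size (height, width), for illustration only.
center = workspace_to_pixel([0.5, 0.0], image_hw=(128, 256))
corner = workspace_to_pixel([0.25, -0.5], image_hw=(128, 256))
```

The centre of the workspace maps to the centre of the image and the lower bound maps to pixel (0, 0). A position exactly on the upper bound maps to (h, w), one past the last valid index, so clip the result before using it to index into an array.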

Thank you very much for the sample code! It really helps a lot. I really appreciate your generous and quick response!