About the tp1 in replay

Question

TeleeMa opened this issue a year ago

Thank you for your great work. I have one question about the "tp1" keys in the replay_samples, such as 'front_rgb_tp1' and 'gripper_pose_tp1'. From the replay buffer implementation, I think the tp1 keys store the next key frame's observations and actions. So in RVT training, are the tp1 observations and actions unused? That would mean we only need the current transition's observations and actions for feature extraction and supervision.

Is my understanding correct? If there are any mistakes, please correct me.

Thank you very much.

Ankit Goyal · Answer 1 · Sun Nov 19 2023 09:18:11 GMT+0800 (China Standard Time)

Hi @TeleeMa,

Thanks for your kind words.

I am not sure if I understand your question correctly.

tp1 (defined here: https://github.com/NVlabs/RVT/blob/master/rvt/utils/dataset.py#L242) stores the gripper location in the next key frame. It is used to extract the action for supervision (https://github.com/NVlabs/RVT/blob/master/rvt/utils/dataset.py#L169-L184).

Hope it helps.

Best,
Ankit

Teli Ma · Answer 2 · Wed Nov 22 2023 14:04:20 GMT+0800 (China Standard Time)

Hi @imankgoyal,

Thank you for your reply. I mean that during training, each batch from the dataloader contains the "tp1" keys, as the image shows.

The tp1 keys and values are added to the batch when sampling here: https://github.com/NVlabs/YARR/blob/72c37e6d31ff1b1d7ce131be5c8f80f66ee271d8/yarr/replay_buffer/uniform_replay_buffer.py#L821-L832
But they are not used in model training. So what are they for?

Ankit Goyal · Answer 3 · Thu Nov 23 2023 08:13:28 GMT+0800 (China Standard Time)

You can safely ignore them. These are legacy attributes coming from PerAct -- Arm, as our dataloader is exactly the same.
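
For readers who would rather not carry the unused entries around, here is a minimal sketch of dropping them from a sampled batch. It assumes the batch is a plain dict keyed by strings and that the legacy entries all end in `_tp1`, as in the key names quoted in the question. The helper name `drop_tp1_keys` and the toy values are hypothetical; this is not part of RVT or YARR.

```python
def drop_tp1_keys(batch, suffix="_tp1"):
    """Return a shallow copy of `batch` without keys ending in `suffix`.

    Hypothetical helper: assumes `batch` is a dict with string keys.
    The original dict is left unmodified.
    """
    return {key: value for key, value in batch.items() if not key.endswith(suffix)}


# Toy batch standing in for a real replay sample (values are placeholders).
batch = {
    "front_rgb": [0.1, 0.2],
    "front_rgb_tp1": [0.3, 0.4],
    "gripper_pose": [0.0] * 7,
    "gripper_pose_tp1": [1.0] * 7,
}

filtered = drop_tp1_keys(batch)
print(sorted(filtered))
```

Filtering is optional: leaving the `_tp1` entries in the batch should be harmless if nothing reads them, as the answer above says.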