real-stanford / scalingup

[CoRL 2023] This repository contains data generation and training code for Scaling Up & Distilling Down

Home Page: https://www.cs.columbia.edu/~huy/scalingup/

evaluation on UR10

yellow07200 opened this issue

Hi @huy-ha ,

Thank you for sharing this amazing work!

I got several doubts and problems when trying to evaluate trained model using UR10:

  1. Could you point me to where the camera parameters are set? Sorry, I am a beginner with MuJoCo.
  2. In simulation, one camera is used to get observations of the environment, and another camera mounted on the robot is used to help with fine-grained manipulation. Can I ignore the mounted camera in real experiments?
  3. In your shared file scaling_up_eval_real_robot.py, can we apply it directly to a UR10 by simply changing parameters in RTDEInterpolationController?

Best regards,

Hi @yellow07200,

Thanks for the kind words 🐧

  1. The cameras belonging to the scene are defined here (in this case, front and top_down). I also attach RealSense cameras to the gripper camera mount point of the robot while setting up the scene here. To make sure they get used during data generation or training, you can set this list accordingly.
  2. Yes, you can train a policy using only the top or front camera (or whichever camera you define in the scene's .xml). For instance, if you wanted to train a policy which uses only the top camera, you can run the training command with obs_cameras=[top].
  3. I'm not personally sure, since we've never tested it. It's a good idea to run the RTDE hello-world examples first, then replay the simulation actions on the real robot to make sure they actually match (see the sketch after this list). Finally, you can use the other scripts to slowly integrate the learned policy.
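To make point 3 concrete, below is a minimal sketch (not code from this repo) of replaying recorded simulation end-effector poses on a real UR arm with the ur_rtde library. The robot IP, the trajectory file sim_traj.npy, and its format (6-DoF poses [x, y, z, rx, ry, rz] in the robot base frame) are all assumptions for illustration.

```python
# Minimal sketch: replay simulation end-effector poses on a real UR arm.
# Assumes the ur_rtde Python bindings are installed; sim_traj.npy is a
# hypothetical recording of TCP poses in the robot base frame.
import numpy as np
from rtde_control import RTDEControlInterface
from rtde_receive import RTDEReceiveInterface

ROBOT_IP = "192.168.0.10"  # replace with your UR10's IP

rtde_c = RTDEControlInterface(ROBOT_IP)
rtde_r = RTDEReceiveInterface(ROBOT_IP)

poses = np.load("sim_traj.npy")  # shape (T, 6): [x, y, z, rx, ry, rz]
for pose in poses:
    # moveL blocks until the linear move completes; keep speeds low at first.
    rtde_c.moveL(pose.tolist(), 0.05, 0.2)
    print("reached:", rtde_r.getActualTCPPose())

rtde_c.stopScript()
```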

Hope this helps!

Hi @huy-ha

Thanks for your prompt response.

I want to further extend the questions asked by @yellow07200 and this thread might be a guide for evaluation on real robots.

  1. Where exactly in the code do we define the placement of the cameras? In reality, if I have 2 cameras, after doing the extrinsic calibration I will end up with the exact pose (transformation matrix) of each camera (actually the camera sensor) with respect to its parent frame. So we need to know where this info is consumed in the code, so that we can replace it with our own camera calibration.

  2. Can you add comments on the parameters in RTDEInterpolationController? I personally think it might just be a matter of changing parameters to integrate it with a UR10.

Thanks!

Hey!

Where exactly in the code do we define the placement of the cameras?

If I understood your question correctly, the camera parameters are defined in the xml. The xml defines the world model, which gets turned into MuJoCo data and model instances wrapped inside a dm_control.Physics object. Then, calling get_obs() eventually calls render(), which accesses the camera position and orientation defined in the xml file.
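For illustration, here is a small, self-contained dm_control sketch (not taken from this repo) showing both where a camera's pose lives in the xml and where render() consumes it; the camera name, position, and orientation are placeholders you would replace with your calibration.

```python
# Sketch: a MuJoCo world with one camera, rendered through dm_control.
# The camera name/pos/quat are placeholders; replace them with the pose of
# your real camera with respect to its parent frame (from calibration).
from dm_control import mujoco

WORLD_XML = """
<mujoco>
  <worldbody>
    <light pos="0 0 3"/>
    <geom type="plane" size="1 1 0.1"/>
    <camera name="front" pos="1.0 0.0 0.5" quat="0.5 0.5 0.5 0.5"/>
  </worldbody>
</mujoco>
"""

physics = mujoco.Physics.from_xml_string(WORLD_XML)

# render() is where the camera pose defined in the xml actually gets used.
rgb = physics.render(height=240, width=320, camera_id="front")

# The same pose is also readable (and writable) at runtime:
print(physics.named.model.cam_pos["front"])
print(physics.named.model.cam_quat["front"])
```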

Note that on the real robot, I never carefully calibrated the camera. Instead, I made the policy less sensitive to camera parameters through domain randomization. In my default domain randomization setup, I vary both the camera extrinsics and intrinsics.
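As a rough illustration (not the repo's actual randomization code), perturbing extrinsics and intrinsics before each render could look like the sketch below; the noise ranges are arbitrary placeholders.

```python
# Sketch: per-episode camera domain randomization (extrinsics + intrinsics).
# Ranges are arbitrary placeholders, not the values used in this repo.
import numpy as np

def randomize_camera(physics, cam_name, rng):
    # Extrinsics: jitter the camera position by a few centimeters.
    physics.named.model.cam_pos[cam_name] += rng.uniform(-0.03, 0.03, size=3)
    # Intrinsics: jitter the vertical field of view by a few degrees.
    physics.named.model.cam_fovy[cam_name] += rng.uniform(-5.0, 5.0)

rng = np.random.default_rng(0)
# randomize_camera(physics, "front", rng)  # physics from the snippet above
```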

Can you add comments on the parameters in RTDEInterpolationController

I would refer to the RTDE docs to make sure the RTDE settings are correct, and run the hello-world scripts from RTDE. For instance, one thing you want to make sure to set correctly is the real-time control frequency of the arm. Diffusion Policy's codebase used 125 Hz, which is compatible with the UR5 CB3, while my UR5e setup called for 500 Hz. Depending on whether you have a UR10 or a UR10e, you should set the control rate accordingly.
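For example, if you drive the arm through ur_rtde directly, the control frequency can be passed when constructing the control interface (whether RTDEInterpolationController exposes it the same way is worth checking in its constructor). The IP below is a placeholder.

```python
# Sketch: picking the real-time control rate for the arm (placeholder IP).
# 125 Hz matches CB-series arms (e.g. UR5 CB3); e-Series arms (UR5e/UR10e)
# run their real-time interface at 500 Hz.
from rtde_control import RTDEControlInterface

ROBOT_IP = "192.168.0.10"   # replace with your robot's IP
CONTROL_HZ = 125.0          # use 500.0 for an e-Series arm such as the UR10e

# Recent ur_rtde releases accept the frequency as the second constructor
# argument; check your installed version if this raises an error.
rtde_c = RTDEControlInterface(ROBOT_IP, CONTROL_HZ)
```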

Hi @huy-ha,

Thanks for your kind reply.

I'm stuck on the problem of how to map object positions observed by the camera to the desired end-effector position. After revisiting Diffusion Policy in detail, I believe the desired end-effector pose is output in the base coordinate frame, and the object poses obtained from the camera as well as the robot pose are also expressed in the base frame rather than the camera frame before going through the diffusion model. Please correct me if I misunderstand anything ~ thanks!

And can you please also upload real_env_robo_summ.py?

Best regards,

Yep, the end effector commands are in the robot's frame! Also, here's the script.
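Since the commands are in the base frame, any object position detected in the camera frame has to be mapped through your calibrated extrinsics before it can be compared with (or turned into) an end-effector target. A minimal numpy sketch, where base_T_cam stands for the 4x4 transform produced by your extrinsic calibration (the values below are placeholders):

```python
# Sketch: map a point from the camera frame into the robot base frame.
# base_T_cam is the camera pose in the base frame from extrinsic calibration;
# the numbers below are placeholders, not a real calibration result.
import numpy as np

base_T_cam = np.array([
    [ 0.0, -1.0,  0.0, 0.50],
    [-1.0,  0.0,  0.0, 0.00],
    [ 0.0,  0.0, -1.0, 0.80],
    [ 0.0,  0.0,  0.0, 1.00],
])

point_cam = np.array([0.1, -0.05, 0.6, 1.0])  # homogeneous point, camera frame
point_base = base_T_cam @ point_cam           # same point, robot base frame
print(point_base[:3])
```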