LeCAR-Lab / CoVO-MPC

Official implementation for the paper "CoVO-MPC: Theoretical Analysis of Sampling-based MPC and Optimal Covariance Design" accepted by L4DC 2024. CoVO-MPC is an optimal sampling-based MPC algorithm.

Home Page: https://lecar-lab.github.io/CoVO-MPC/


πŸ› Large tracking error with PPO learned policy

jc-bao opened this issue

Performance

The ~30 cm tracking error is relatively large.

(Attachments: plot "Copy of ppo"; video meshcat_1694284071493.tar.mp4)

Next steps

  • Slow down the trajectory and try again.
  • Implement MPPI/MPC to track the trajectory (a minimal MPPI sketch follows this list).
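
For the MPPI item above, here is a minimal sampling-based MPC sketch in JAX. The interfaces (`dynamics(x, u)`, `cost(x, u)`, `mppi_step`) and all hyperparameters are illustrative assumptions, not the API used in this repo:

    import jax
    import jax.numpy as jnp

    def mppi_step(key, dynamics, cost, x0, u_nominal,
                  n_samples=256, sigma=0.2, temperature=0.05):
        # One MPPI update: sample Gaussian perturbations around the nominal
        # control sequence, roll out the (assumed) dynamics, and reweight the
        # samples by a softmax of the negative total cost.
        H, act_dim = u_nominal.shape
        eps = sigma * jax.random.normal(key, (n_samples, H, act_dim))
        u_samples = u_nominal[None] + eps

        def rollout(us):
            def step(x, u):
                x_next = dynamics(x, u)
                return x_next, cost(x_next, u)
            _, stage_costs = jax.lax.scan(step, x0, us)
            return stage_costs.sum()

        total_costs = jax.vmap(rollout)(u_samples)            # (n_samples,)
        weights = jax.nn.softmax(-total_costs / temperature)  # low cost -> high weight
        return (weights[:, None, None] * u_samples).sum(axis=0)

Executing the first action of the returned sequence and re-planning at the next state turns this into a receding-horizon (MPC) controller.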

Slow down the trajectory

initial values:
A1 = 0.8, w1 = 1.5, a1_max = 1.8 m/s^2
A2 = 0.8, w2 = 3.0, a2_max = 7.2 m/s^2

now:
a1_max = 0.45 m/s^2
a2_max = 1.8 m/s^2
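
For context, the peak accelerations above are consistent with sinusoidal reference components of the form A*sin(w*t), whose peak acceleration is A*w^2; this parameterization is an assumption for illustration, not read from the repo:

    # Peak acceleration of A * sin(w * t) is A * w**2 (assumed parameterization).
    A1, w1 = 0.8, 1.5
    A2, w2 = 0.8, 3.0
    print(A1 * w1**2, A2 * w2**2)      # 1.8, 7.2 m/s^2 (initial values)
    # Under the same reading, the slowed-down limits correspond to amplitudes ~0.2:
    print(0.45 / w1**2, 1.8 / w2**2)   # 0.2, 0.2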

Result
(Attachments: plot "ppo"; video meshcat_1694285145189.tar.mp4)

After training more steps:

(Attachments: video meshcat_1694285495059.tar.mp4; plot "Copy of Copy of ppo")

Conclusion

  • The tracking error is still relatively large (~10 cm).
  • Need to check possible dynamics issues (#12).

Others' PPO performance

This is the result reported in the APG paper:

(Image: PPO tracking result reported in the APG paper)

This helps account for the PPO performance degradation observed here.

Simple reward engineering


    # Shaped tracking reward: a small velocity penalty, a linear position penalty,
    # and four clipped log terms that saturate at progressively smaller errors,
    # keeping a strong gradient close to the reference.
    reward = 0.9 - \
        0.05 * err_vel - \
        err_pos * 0.4 - \
        jnp.clip(jnp.log(err_pos + 1) * 4, 0, 1) * 0.4 - \
        jnp.clip(jnp.log(err_pos + 1) * 8, 0, 1) * 0.2 - \
        jnp.clip(jnp.log(err_pos + 1) * 16, 0, 1) * 0.1 - \
        jnp.clip(jnp.log(err_pos + 1) * 32, 0, 1) * 0.1
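
As an illustrative check (not from the repo), each term clip(log(err_pos + 1) * k, 0, 1) saturates once err_pos reaches exp(1/k) - 1, so the larger multipliers keep contributing gradient only at small errors:

    import jax.numpy as jnp

    # Position error at which each clipped log term hits its cap of 1.
    for k in [4, 8, 16, 32]:
        print(k, float(jnp.exp(1.0 / k) - 1.0))   # ~0.284, 0.133, 0.065, 0.032 m

In other words, the shaping keeps a meaningful reward gradient even when the error is only a few centimeters, which is where the ~10 cm policy was plateauing.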

(Attachments: training/tracking plots; video meshcat_1694298987491.tar.mp4)

Conclusion

  • The problem is resolved by reward engineering.