LeCAR-Lab / CoVO-MPC

Official implementation for the paper "CoVO-MPC: Theoretical Analysis of Sampling-based MPC and Optimal Covariance Design" accepted by L4DC 2024. CoVO-MPC is an optimal sampling-based MPC algorithm.

Home Page: https://lecar-lab.github.io/CoVO-MPC/


πŸ› Large tracking error with PPO learned policy

jc-bao opened this issue

Performance

The ~30 cm tracking error is relatively large.

(Attachments: plot "Copy of ppo"; video meshcat_1694284071493.tar.mp4)

Next steps

  • Slow down the trajectory and try again.
  • Implement MPPI/MPC to track the trajectory (a minimal MPPI sketch follows this list).
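
For the MPPI item above, here is a minimal sampling-based MPC sketch in JAX. The interfaces (`dynamics(x, u)`, `cost(x, u)`, `mppi_step`) and all hyperparameters are illustrative assumptions, not the API used in this repo:

    import jax
    import jax.numpy as jnp

    def mppi_step(key, dynamics, cost, x0, u_nominal,
                  n_samples=256, sigma=0.2, temperature=0.05):
        # One MPPI update: sample Gaussian perturbations around the nominal
        # control sequence, roll out the (assumed) dynamics, and reweight the
        # samples by a softmax of the negative total cost.
        H, act_dim = u_nominal.shape
        eps = sigma * jax.random.normal(key, (n_samples, H, act_dim))
        u_samples = u_nominal[None] + eps

        def rollout(us):
            def step(x, u):
                x_next = dynamics(x, u)
                return x_next, cost(x_next, u)
            _, stage_costs = jax.lax.scan(step, x0, us)
            return stage_costs.sum()

        total_costs = jax.vmap(rollout)(u_samples)            # (n_samples,)
        weights = jax.nn.softmax(-total_costs / temperature)  # low cost -> high weight
        return (weights[:, None, None] * u_samples).sum(axis=0)

Executing the first action of the returned sequence and re-planning at the next state turns this into a receding-horizon (MPC) controller.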

Slow down the trajectory

initial values:
A1 = 0.8, w1 = 1.5, a1_max = 1.8 m/s^2
A2 = 0.8, w2 = 3.0, a2_max = 7.2 m/s^2

now:
a1_max = 0.45 m/s^2
a2_max = 1.8 m/s^2
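
For context, the peak accelerations above are consistent with sinusoidal reference components of the form A*sin(w*t), whose peak acceleration is A*w^2; this parameterization is an assumption for illustration, not read from the repo:

    # Peak acceleration of A * sin(w * t) is A * w**2 (assumed parameterization).
    A1, w1 = 0.8, 1.5
    A2, w2 = 0.8, 3.0
    print(A1 * w1**2, A2 * w2**2)      # 1.8, 7.2 m/s^2 (initial values)
    # Under the same reading, the slowed-down limits correspond to amplitudes ~0.2:
    print(0.45 / w1**2, 1.8 / w2**2)   # 0.2, 0.2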

Result
(Attachments: plot "ppo"; video meshcat_1694285145189.tar.mp4)

After training more steps:

(Attachments: video meshcat_1694285495059.tar.mp4; plot "Copy of Copy of ppo")

Conclusion

  • The tracking error is still relatively large (~10 cm).
  • Need to check possible dynamics issues (#12).

Others' PPO performance

This is the result reported in the APG paper:

(Image: PPO tracking result reported in the APG paper)

This helps account for the PPO performance degradation observed here.

Simple reward engineering


    # Shaped tracking reward: a small velocity penalty, a linear position penalty,
    # and four clipped log terms that saturate at progressively smaller errors,
    # keeping a strong gradient close to the reference.
    reward = 0.9 - \
        0.05 * err_vel - \
        err_pos * 0.4 - \
        jnp.clip(jnp.log(err_pos + 1) * 4, 0, 1) * 0.4 - \
        jnp.clip(jnp.log(err_pos + 1) * 8, 0, 1) * 0.2 - \
        jnp.clip(jnp.log(err_pos + 1) * 16, 0, 1) * 0.1 - \
        jnp.clip(jnp.log(err_pos + 1) * 32, 0, 1) * 0.1
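
As an illustrative check (not from the repo), each term clip(log(err_pos + 1) * k, 0, 1) saturates once err_pos reaches exp(1/k) - 1, so the larger multipliers keep contributing gradient only at small errors:

    import jax.numpy as jnp

    # Position error at which each clipped log term hits its cap of 1.
    for k in [4, 8, 16, 32]:
        print(k, float(jnp.exp(1.0 / k) - 1.0))   # ~0.284, 0.133, 0.065, 0.032 m

In other words, the shaping keeps a meaningful reward gradient even when the error is only a few centimeters, which is where the ~10 cm policy was plateauing.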

(Attachments: training/tracking plots; video meshcat_1694298987491.tar.mp4)

Conclusion

  • The problem is resolved by reward engineering.