SimplySAC: A Minimal Soft-Actor-Critic PyTorch Implementation

SimplySAC replicates Soft Actor-Critic (SAC) in roughly 200 lines of clean, readable PyTorch, using as few additional tricks and hyper-parameters as possible.

Implementation details:

  • The actor's log_std output is clamped to [-20, 2], following the authors' source code.
  • Before learning starts, the replay buffer is warmed up with 1e4 transitions collected by a uniformly random policy.
  • The Q-value in the actor's loss is the average of the two critics' outputs (see the sketch below).

That's it! Everything else follows the original paper and pseudocode.
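To make these details concrete, here is a minimal sketch of how they might look in PyTorch. This is illustrative only, not the repo's actual code: the Actor module and the q1/q2/alpha names are assumptions.

    import math
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    LOG_STD_MIN, LOG_STD_MAX = -20, 2  # clamp range from the authors' source code

    class Actor(nn.Module):  # hypothetical module, for illustration only
        def __init__(self, obs_dim, act_dim, hidden=256):
            super().__init__()
            self.body = nn.Sequential(
                nn.Linear(obs_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
            )
            self.mu = nn.Linear(hidden, act_dim)
            self.log_std = nn.Linear(hidden, act_dim)

        def forward(self, obs):
            h = self.body(obs)
            log_std = self.log_std(h).clamp(LOG_STD_MIN, LOG_STD_MAX)  # detail 1
            dist = torch.distributions.Normal(self.mu(h), log_std.exp())
            u = dist.rsample()   # reparameterized sample
            a = torch.tanh(u)    # squash the action to [-1, 1]
            # tanh change-of-variables correction, in a numerically stable form
            log_prob = dist.log_prob(u).sum(-1)
            log_prob -= (2 * (math.log(2) - u - F.softplus(-2 * u))).sum(-1)
            return a, log_prob

    # Detail 2: before learning, collect 1e4 transitions with a uniformly
    # random policy, e.g. env.action_space.sample(), to warm up the buffer.

    # Detail 3: the actor's loss averages the two critics instead of taking
    # their minimum (q1, q2, and alpha are assumed to exist):
    #   a, log_prob = actor(obs)
    #   q = 0.5 * (q1(obs, a) + q2(obs, a))
    #   actor_loss = (alpha * log_prob - q).mean()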

Also check out SimplyPPO, a minimal Proximal-Policy-Optimization PyTorch implementation.

MuJoCo benchmarks:

(Learning-curve figures: hopper, walker, cheetah, ant, humanoid)

As with the figures in the original paper, these are produced with:

  • One evaluation episode every 1e3 steps.
  • 5 random seeds, where the solid line is the mean return over seeds and the shaded area spans the min/max returns.
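A sketch of how such curves can be aggregated and plotted (the file name and array layout are assumptions, not the repo's logging format):

    import numpy as np
    import matplotlib.pyplot as plt

    # hypothetical layout: returns[s, i] = return of the evaluation episode
    # after (i + 1) * 1e3 steps for seed s, shape (5, n_evals)
    returns = np.load("hopper_eval_returns.npy")
    steps = np.arange(1, returns.shape[1] + 1) * 1_000

    plt.plot(steps, returns.mean(axis=0))              # solid line: mean over seeds
    plt.fill_between(steps, returns.min(axis=0),
                     returns.max(axis=0), alpha=0.3)   # shaded area: min/max
    plt.xlabel("environment steps")
    plt.ylabel("return")
    plt.show()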

To execute a single run:

python learn.py -g [gpu_id] -e [env_id] -l [log_id]
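For example, a run on GPU 0 with log id 0 might look like the following (the value expected for -e depends on learn.py's argument parsing; a Gym environment name is assumed here, but it could instead be an integer index):

    python learn.py -g 0 -e Hopper-v2 -l 0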

PyBullet benchmarks:

(Learning-curve figures for the PyBullet versions: hopper, walker, cheetah, ant, humanoid)

Note that the PyBullet versions of the locomotion environments are harder than the MuJoCo versions.
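For reference, the Bullet counterparts are registered by importing pybullet_envs; a sketch of creating both versions of the same task (environment names assume the standard Gym and PyBullet registries of that era):

    import gym
    import pybullet_envs  # noqa: F401 -- registers the *BulletEnv-v0 environments

    mujoco_env = gym.make("Hopper-v2")           # MuJoCo version (needs mujoco-py)
    bullet_env = gym.make("HopperBulletEnv-v0")  # PyBullet version, notably harder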

Experiments use torch==1.7.1, mujoco-py==1.50.1.68, pybullet==3.0.8.

License: MIT