A PyTorch implementation of Soft Actor-Critic [1, 2] with n-step rewards and prioritized experience replay [3].
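At the core of Soft Actor-Critic is the soft Bellman backup, which augments the usual Q-target with an entropy bonus and uses the minimum of two target critics (clipped double-Q). The sketch below illustrates that target computation with plain numpy; the function name and arguments are illustrative, not this repository's actual API, and `alpha` is treated as a fixed temperature rather than the learned one from [2].

```python
import numpy as np

def soft_q_target(reward, done, next_q1, next_q2, next_log_pi,
                  gamma=0.99, alpha=0.2):
    # Soft state value: clipped double-Q minus the entropy term alpha * log pi(a'|s').
    next_v = np.minimum(next_q1, next_q2) - alpha * next_log_pi
    # Standard Bellman backup, masked at episode termination.
    return reward + (1.0 - done) * gamma * next_v

# Example: a single transition with hypothetical critic outputs.
t = soft_q_target(reward=1.0, done=0.0, next_q1=10.0, next_q2=9.0,
                  next_log_pi=-1.0)
```

In the full algorithm this target is regressed against by both critics, while the actor maximizes the same entropy-regularized value under the reparameterization trick.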
NOTE: I re-implemented Soft Actor-Critic, along with the DisCor algorithm, in the discor.pytorch repository, which is better organized and faster. Please check it out!
You can install the required libraries using `pip install -r requirements.txt`, except for `mujoco_py`. Note that you need a license to install `mujoco_py`; for installation, please follow the instructions here.
You can train a Soft Actor-Critic agent as in the example below.

```
python code/main.py \
    [--env_id str (default: HalfCheetah-v2)] \
    [--cuda (optional)] \
    [--seed int (default: 0)]
```
If you want to use n-step rewards and prioritized experience replay, set `multi_step=5` and `per=True` in the configs.
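For intuition, an n-step reward replaces the one-step reward with a discounted sum over the next n transitions, and prioritized replay [3] samples transitions in proportion to their TD error. The snippet below is a minimal numpy sketch of both ideas, assuming proportional prioritization; the helper names and hyperparameters (`alpha=0.6`, `eps`) are illustrative and may differ from this repository's implementation.

```python
import numpy as np

def n_step_return(rewards, gamma=0.99):
    """Discounted sum of the first n rewards: sum_k gamma^k * r_k."""
    return sum((gamma ** k) * r for k, r in enumerate(rewards))

def per_probabilities(td_errors, alpha=0.6, eps=1e-6):
    """Proportional prioritization [3]: P(i) ~ (|delta_i| + eps)^alpha."""
    p = (np.abs(td_errors) + eps) ** alpha
    return p / p.sum()

# n-step reward over 5 transitions, as with multi_step=5.
r5 = n_step_return([1.0, 1.0, 1.0, 1.0, 1.0], gamma=0.99)

# Sampling probabilities for a tiny buffer of three TD errors.
probs = per_probabilities(np.array([0.5, 1.0, 2.0]))
```

In practice the n-step bootstrap also requires raising the discount on the bootstrapped Q-value to `gamma ** n`, and PER additionally uses importance-sampling weights to correct the sampling bias; both details are omitted here for brevity.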
Results of the above example (without n-step rewards or prioritized experience replay) are shown below, and are comparable to (or better than) the results reported in the paper.
[1] Haarnoja, Tuomas, et al. "Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor." arXiv preprint arXiv:1801.01290 (2018).
[2] Haarnoja, Tuomas, et al. "Soft actor-critic algorithms and applications." arXiv preprint arXiv:1812.05905 (2018).
[3] Schaul, Tom, et al. "Prioritized experience replay." arXiv preprint arXiv:1511.05952 (2015).