This is a clean and robust PyTorch implementation of Soft Actor-Critic (SAC) for discrete action spaces. Here is the training curve:
All experiments are trained with the same hyperparameters. Other RL algorithms implemented in PyTorch can be found here. A quick render:
gym==0.19.0
numpy==1.21.2
pytorch==1.8.1
tensorboard==2.5.0
Run 'python main.py' to train from scratch. The default environment is CartPole-v1.
To render a trained model, run 'python main.py --write False --render True --Loadmodel True --ModelIdex 50'
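The flags above pass booleans as the literal strings 'True' and 'False'. With Python's argparse, 'type=bool' would treat any non-empty string (including 'False') as true, so a small converter is usually needed. Below is a minimal sketch of one way to parse such flags. The names 'str2bool' and 'build_parser' and all default values are assumptions made for illustration; the actual handling in 'main.py' may differ.

```python
import argparse


def str2bool(value):
    """Convert a command-line string such as 'True' or 'False' to a bool."""
    if isinstance(value, bool):
        return value
    text = value.strip().lower()
    if text in ("true", "t", "yes", "y", "1"):
        return True
    if text in ("false", "f", "no", "n", "0"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser():
    """Hypothetical parser mirroring the flags shown in this README."""
    parser = argparse.ArgumentParser()
    # Defaults are assumptions for illustration only, not the repo's values.
    parser.add_argument("--write", type=str2bool, default=True)
    parser.add_argument("--render", type=str2bool, default=False)
    parser.add_argument("--Loadmodel", type=str2bool, default=False)
    parser.add_argument("--ModelIdex", type=int, default=0)
    return parser


# Parse the same arguments as the render command above.
args = build_parser().parse_args(
    ["--write", "False", "--render", "True", "--Loadmodel", "True", "--ModelIdex", "50"]
)
print(args.write, args.render, args.Loadmodel, args.ModelIdex)
```

Passing an explicit list to 'parse_args' makes the sketch runnable without a command line; in a script, calling 'parse_args()' with no arguments reads 'sys.argv' instead.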
To train on a different environment, run 'python main.py --EnvIdex 1'.
'--EnvIdex' can be set to 0 or 1, where
'--EnvIdex 0' for 'CartPole-v1'
'--EnvIdex 1' for 'LunarLander-v2'
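The index-to-environment mapping above can be sketched as a simple lookup. This is an illustrative example only: 'ENV_NAMES' and 'env_name_from_index' are hypothetical names, the default of 0 is inferred from CartPole-v1 being the default environment, and the real selection logic lives in 'main.py'.

```python
import argparse

# Index-to-name table copied from the list above.
ENV_NAMES = ["CartPole-v1", "LunarLander-v2"]


def env_name_from_index(index):
    """Return the environment name for a given --EnvIdex value."""
    if not 0 <= index < len(ENV_NAMES):
        raise ValueError(
            f"--EnvIdex must be between 0 and {len(ENV_NAMES) - 1}, got {index}"
        )
    return ENV_NAMES[index]


parser = argparse.ArgumentParser()
# Default 0 is an assumption based on CartPole-v1 being the default environment.
parser.add_argument("--EnvIdex", type=int, default=0)
opt = parser.parse_args(["--EnvIdex", "1"])

env_name = env_name_from_index(opt.EnvIdex)
print(env_name)
# With gym installed, the environment would then be created via gym.make(env_name).
```

Rejecting out-of-range indices early gives a clearer error than letting the lookup fail later in training.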
You can use TensorBoard to visualize the training curves, for example with 'tensorboard --logdir runs'. Training histories are saved in '\runs'
For more details on the hyperparameter settings, please check 'main.py'
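One hyperparameter specific to SAC is the entropy temperature. As a reminder of where it enters: in the discrete variant (Christodoulou, 2019) the soft state value is an exact expectation over the action probabilities, V(s) = sum_a pi(a|s) * (Q(s,a) - alpha * log pi(a|s)), rather than a sampled estimate. The following plain-Python sketch evaluates that expression for a single state. It is not code from this repository, and the names 'soft_state_value' and 'alpha' are chosen for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of action logits."""
    largest = max(logits)
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_state_value(logits, q_values, alpha):
    """Exact expectation over actions of Q(s, a) - alpha * log pi(a | s)."""
    probs = softmax(logits)
    return sum(
        p * (q - alpha * math.log(p))
        for p, q in zip(probs, q_values)
        if p > 0.0  # skip zero-probability actions to avoid log(0)
    )


# Two actions with equal logits give a uniform policy.
value = soft_state_value(logits=[0.0, 0.0], q_values=[1.0, 3.0], alpha=0.5)
print(value)
```

With a uniform policy over two actions, the result is the mean Q-value plus alpha times the policy entropy (log 2), so a larger temperature rewards more exploratory policies.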
Christodoulou P. Soft actor-critic for discrete action settings[J]. arXiv preprint arXiv:1910.07207, 2019.
Haarnoja T, Zhou A, Abbeel P, et al. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor[C]//International conference on machine learning. PMLR, 2018: 1861-1870.
Haarnoja T, Zhou A, Hartikainen K, et al. Soft actor-critic algorithms and applications[J]. arXiv preprint arXiv:1812.05905, 2018.