Demo video: https://drive.google.com/file/d/1_SYBj9hkbrSHbGtJosryvTB7PRyBfeEh/view?usp=sharing
PPO:
Proximal Policy Optimization (PPO), introduced by OpenAI in 2017, is widely regarded as one of the state-of-the-art algorithms in Reinforcement Learning. It strikes a good balance between performance and ease of implementation. Empirically, it is competitive with strong baseline algorithms on standard benchmarks and clearly outperforms them on some tasks. At the same time, it is simple enough for a wide range of practitioners to adopt, which cannot be said of every RL algorithm.
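Much of PPO's simplicity comes from its clipped surrogate objective. For each sample it compares the probability ratio r = pi_new(a|s) / pi_old(a|s) times the advantage A with the same term computed using r clipped to [1 - epsilon, 1 + epsilon], and it keeps the smaller of the two. The sketch below is a minimal, dependency-free illustration of that objective over a batch. It is not taken from this repository or from stable_baselines3. The function name `clipped_surrogate` is hypothetical, and the default epsilon = 0.2 is an assumption: it matches the clip range used in the PPO paper's experiments and stable_baselines3's default `clip_range`.

```python
import math


def clipped_surrogate(ratios, advantages, epsilon=0.2):
    """Hypothetical helper (not a stable_baselines3 API): mean PPO clipped
    surrogate objective over a batch of samples.

    ratios:     pi_new(a|s) / pi_old(a|s) for each sampled (s, a)
    advantages: advantage estimates A for the same samples
    epsilon:    clip range; 0.2 is an assumed default
    """
    if not ratios or len(ratios) != len(advantages):
        raise ValueError("ratios and advantages must be non-empty and of equal length")
    total = 0.0
    for r, a in zip(ratios, advantages):
        # Clip the ratio into [1 - epsilon, 1 + epsilon]
        clipped_r = min(max(r, 1.0 - epsilon), 1.0 + epsilon)
        # Keep the smaller (pessimistic) of the unclipped and clipped terms
        total += min(r * a, clipped_r * a)
    return total / len(ratios)


if __name__ == "__main__":
    # The ratio is usually computed from log-probabilities under both policies
    ratio = math.exp(-0.1 - (-0.3))  # about 1.22, above 1 + epsilon
    # Positive advantage: the gain is capped at (1 + epsilon) * A
    print(clipped_surrogate([ratio], [1.0]))
```

Taking the minimum makes the objective a pessimistic bound: once the ratio leaves [1 - epsilon, 1 + epsilon] in the direction that would increase the objective, the gain stops growing, so a single update has no incentive to push the new policy far from the old one. In a real implementation the same computation would run on tensors and be maximized by gradient ascent.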
Environment used:
Python: 3.8.10
gym: 0.26.2
stable_baselines3: 2.1.0