yapbenzet / stewart_platform_learning

Parametric simulation of Stewart Platform in ROS and Gazebo with Deep Reinforcement Learning control

Stewart Platform Learning

A set of tools and environments for implementing Deep Reinforcement Learning (DRL) algorithms on a Stewart platform through parametric simulation in Gazebo and ROS.
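
The openai_ros environments expose the Gazebo simulation through the standard Gym interface. As a rough illustration, the training scripts build on an interaction loop like the sketch below; the environment id StewartPlatform-v0 and the node name are hypothetical placeholders, and a running ROS master with the simulation already launched is assumed.

import gym
import rospy

# Requires a running ROS master and the Gazebo simulation launched beforehand.
rospy.init_node("stewart_drl_training")   # node name is illustrative
env = gym.make("StewartPlatform-v0")      # hypothetical environment id

obs = env.reset()
for _ in range(100):
    action = env.action_space.sample()          # random policy as a stand-in
    obs, reward, done, info = env.step(action)  # classic 4-tuple Gym API
    if done:
        obs = env.reset()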

Cloning the project as a workspace

git clone https://github.com/HadiYd/stewart_platform_learning.git

Build the controller plugin, which drives the joints and lets you change the PID values.

Plugin credit: ros_sdf, modified to add a PID section to the code.

cd src/stewart_platform/plugin
mkdir build
cd build
cmake ../
make 

Installing openai_ros and building the project

cd stewart_platform_learning/src
git clone https://bitbucket.org/theconstructcore/openai_ros.git
cd ..
catkin build
source devel/setup.bash
rosdep install openai_ros

Spawn the Stewart platform in Gazebo using the launch file:

roslaunch stewart_platform stewart.launch 

If a subsequent launch fails, first kill any previously running Gazebo server:

killall -9 gzserver

Running the deep reinforcement learning training scripts

I use wandb to log all rewards and performance metrics. First install it with pip, then create a free account and log in:

pip install wandb

wandb login
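
For reference, the metric logging in the training scripts amounts to something like the sketch below; the project name and the dummy episode loop are illustrative only, not the repository's actual code.

import random
import wandb

def run_one_episode():
    # Placeholder rollout; the real scripts step the Gazebo environment here.
    return random.uniform(0.0, 1.0)

wandb.init(project="stewart_platform_learning")  # project name is illustrative

for episode in range(10):
    reward = run_one_episode()
    wandb.log({"episode": episode, "episode_reward": reward})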

DRL algorithm implementations adapted from: Deep Reinforcement Learning in TensorFlow2.

DDPG

Paper: Continuous control with deep reinforcement learning
Authors: Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, Daan Wierstra
Method: Off-Policy / Temporal-Difference / Model-Free
Action space: Continuous
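
As an off-policy, temporal-difference method, DDPG bootstraps its critic targets from slowly updated target networks. A minimal sketch of the soft (Polyak) target update in TensorFlow 2, with illustrative layer sizes and hyperparameters rather than the repository's actual code:

import tensorflow as tf

TAU = 0.005  # soft-update rate (illustrative)

def soft_update(target_net, source_net, tau=TAU):
    # Polyak averaging: target weights slowly track the learned weights,
    # which stabilises the TD targets the critic bootstraps from.
    for t, s in zip(target_net.weights, source_net.weights):
        t.assign(tau * s + (1.0 - tau) * t)

# Tiny demo: a two-layer actor and its target copy.
actor = tf.keras.Sequential([tf.keras.layers.Dense(16, activation="relu"),
                             tf.keras.layers.Dense(6, activation="tanh")])
actor.build(input_shape=(None, 12))
target_actor = tf.keras.models.clone_model(actor)
target_actor.build(input_shape=(None, 12))
target_actor.set_weights(actor.get_weights())

soft_update(target_actor, actor)  # called after each learning step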

Running the DDPG algorithm:

rosrun stewart_platform DDPG_Continuous.py 

A3C

Paper: Asynchronous Methods for Deep Reinforcement Learning
Authors: Volodymyr Mnih, Adrià Puigdomènech Badia, Mehdi Mirza, Alex Graves, Timothy P. Lillicrap, Tim Harley, David Silver, Koray Kavukcuoglu
Method: On-Policy / Temporal-Difference / Model-Free
Action space: Discrete, Continuous
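
A3C's workers collect short on-policy rollouts, form n-step discounted returns bootstrapped from the critic's value estimate, and use the resulting advantages in both the policy and value losses. A minimal sketch with illustrative numbers:

import numpy as np

GAMMA = 0.99  # discount factor (illustrative)

def n_step_returns(rewards, bootstrap_value, done):
    # Accumulate discounted returns backwards through the rollout,
    # bootstrapping from V(s_last) if the episode has not terminated.
    R = 0.0 if done else bootstrap_value
    returns = []
    for r in reversed(rewards):
        R = r + GAMMA * R
        returns.append(R)
    return list(reversed(returns))

rewards = [0.1, 0.0, 0.5]                 # rewards from a 3-step rollout
values = np.array([0.4, 0.3, 0.6])        # critic estimates V(s_t)
returns = np.array(n_step_returns(rewards, bootstrap_value=0.2, done=False))
advantages = returns - values             # drives the actor update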

Running the A3C algorithm:

rosrun stewart_platform A3_algorithm_training.py 

PPO

Paper: Proximal Policy Optimization Algorithms
Authors: John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, Oleg Klimov
Method: On-Policy / Temporal-Difference / Model-Free
Action space: Discrete, Continuous
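
PPO's central ingredient is the clipped surrogate objective, which bounds how far a gradient step can move the new policy away from the policy that collected the data. A minimal sketch (the clipping range and sample values are illustrative):

import tensorflow as tf

CLIP_EPS = 0.2  # PPO clipping range (illustrative)

def ppo_clip_loss(old_log_probs, new_log_probs, advantages):
    # Probability ratio between new and old policies for the sampled actions.
    ratio = tf.exp(new_log_probs - old_log_probs)
    unclipped = ratio * advantages
    clipped = tf.clip_by_value(ratio, 1.0 - CLIP_EPS, 1.0 + CLIP_EPS) * advantages
    # Maximise the pessimistic (clipped) surrogate; negate to get a loss.
    return -tf.reduce_mean(tf.minimum(unclipped, clipped))

old_lp = tf.constant([-1.0, -0.5])   # log-probs under the data-collecting policy
new_lp = tf.constant([-0.9, -0.7])   # log-probs under the current policy
adv = tf.constant([1.0, -0.5])       # advantage estimates
loss = ppo_clip_loss(old_lp, new_lp, adv)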

Running the PPO algorithm:

rosrun stewart_platform PPO_Continuous.py 


License: MIT License

