legged-robots locomotion python reinforcement-learning robotics wheeled-biped

PPO balancer

The PPO balancer is a feedforward neural network policy trained by reinforcement learning with a sim-to-real pipeline. Like the MPC balancer and PID balancer, it balances Upkie with straight legs. Training uses the UpkieGroundVelocity gym environment and the PPO implementation from Stable Baselines3.

An overview of the training pipeline is given in this video: Sim-to-real RL pipeline for Upkie wheeled bipeds.

Installation

conda env create -f environment.yaml
conda activate ppo_balancer

Running a policy

On your machine

To run the default policy:

make test_policy

This assumes the spine is already up and running, for instance after running ./start_simulation.sh from upkie on your machine, or after starting a pi3hat spine on the robot.

To run a policy saved to a custom path, use for instance:

python ppo_balancer/run.py --policy ppo_balancer/training/2023-11-15/final.zip
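
The example path above suggests that training runs are saved under dated directories. As a minimal sketch, assuming each run directory is named YYYY-MM-DD and contains a final.zip, the most recent policy could be located as follows. The helper name find_latest_policy is hypothetical and not part of ppo_balancer.

```python
from pathlib import Path
from typing import Optional


def find_latest_policy(training_dir: str) -> Optional[Path]:
    """Return the final.zip of the most recent dated run, or None.

    Assumes run directories are named YYYY-MM-DD, so that sorting
    their names lexicographically also sorts them chronologically.
    """
    candidates = sorted(Path(training_dir).glob("*/final.zip"))
    return candidates[-1] if candidates else None


if __name__ == "__main__":
    policy = find_latest_policy("ppo_balancer/training")
    if policy is None:
        print("No trained policy found")
    else:
        # Print the command to run rather than running it
        print(f"python ppo_balancer/run.py --policy {policy}")
```

Sorting by directory name only works with zero-padded ISO dates; a different naming scheme would need a different sort key.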

On a real robot

To build and upload your policy to the robot:

$ make build
$ make upload

Then, SSH into the robot and run the following target:

$ ssh your-upkie
user@your-upkie:~$ make run_ppo_balancer

This runs the policy saved at the default path. To run a custom policy, save its ZIP file to ppo_balancer/policy/params.zip, save its operative config as well, and follow the same steps.

Training a new policy

First, check that training progresses one rollout at a time:

make train_and_show

Once this works you can train for real, with more environments and no GUI:

make train

Check out the time/fps plots in the command line or in TensorBoard to adjust the number of parallel environments:

make tensorboard

Increase the number of environments from its default value (NB_TRAINING_ENVS in the Makefile) for as long as FPS keeps going up.
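
The rule of thumb above can be sketched in code. This is only an illustration, assuming you have noted the time/fps value for a few settings of NB_TRAINING_ENVS by hand. The function pick_nb_envs and its min_gain threshold are hypothetical, and the measurements below are made up.

```python
from typing import Dict


def pick_nb_envs(fps_by_nb_envs: Dict[int, float], min_gain: float = 0.05) -> int:
    """Pick the largest number of environments for which FPS still improved.

    fps_by_nb_envs maps a number of parallel environments to the time/fps
    value observed for it. min_gain is a hypothetical threshold: a relative
    FPS gain below it counts as "FPS stopped going up".
    """
    counts = sorted(fps_by_nb_envs)
    best = counts[0]
    for nb_envs in counts[1:]:
        if fps_by_nb_envs[nb_envs] < (1.0 + min_gain) * fps_by_nb_envs[best]:
            break  # no meaningful gain: keep the previous setting
        best = nb_envs
    return best


if __name__ == "__main__":
    # Made-up measurements, for illustration only
    measured = {1: 400.0, 2: 780.0, 4: 1450.0, 8: 1500.0, 16: 1300.0}
    print(pick_nb_envs(measured))
```

With these made-up numbers, going from 4 to 8 environments gains less than 5% FPS, so the sketch settles on 4.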

Troubleshooting

Shared object file not found

Symptom: you are getting errors related to PyTorch not finding shared object files, with a call to _preload_cuda_deps() somewhere in the traceback:

  File ".../torch/__init__.py", line 178, in _load_global_deps
    _preload_cuda_deps()
  File ".../torch/__init__.py", line 158, in _preload_cuda_deps
    ctypes.CDLL(cublas_path)
  File "/usr/lib/python3.10/ctypes/__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: .../nvidia/cublas/lib/libcublas.so.11: cannot open shared object file: No such file or directory

Workaround: run pip install torch in your local pip environment. The locally installed PyTorch will override the one provided by Bazel and allow you to train and run normally.
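
To see which PyTorch installation Python would pick up, a small check like the following may help. It is a sketch using only the standard library, and it does not import torch itself, so it should not trigger the error above. The helper name module_origin is hypothetical.

```python
import importlib.util
from typing import Optional


def module_origin(name: str) -> Optional[str]:
    """Return the file a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


if __name__ == "__main__":
    origin = module_origin("torch")
    if origin is None:
        print("torch is not installed in this environment")
    else:
        print(f"torch would be imported from: {origin}")
```

A path inside your local pip environment, rather than one managed by Bazel, would suggest that the locally installed version is the one being used.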

About

Train a balancing policy for Upkie by reinforcement learning

Apache License 2.0