SAC_uav

A SAC implementation to use on top of larocs/Drone_RL


Using Soft Actor-Critic for Low-Level UAV Control

This repository is the official implementation of "Using Soft Actor-Critic for Low-Level UAV Control". This work will be presented at the IROS 2020 workshop "Perception, Learning, and Control for Autonomous Agile Vehicles".

We train a policy with Soft Actor-Critic to control a UAV. The agent is dropped in the air at a sampled distance and inclination from the target (the green sphere at position [0,0,0]) and has to get as close as possible to it. In our experiments the target always has position [0,0,0] and angular velocity [0,0,0].
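
For intuition, the snippet below is a minimal sketch (not the repository's code) of how an initial pose at a sampled distance and inclination from the target at [0,0,0] could be drawn; the distance range and angle bounds are assumptions.

import numpy as np

# Illustrative only: sample a UAV start position at a random distance and
# inclination from the target at [0, 0, 0]. The ranges below are assumptions.
TARGET = np.zeros(3)

def sample_initial_position(rng, d_min=1.0, d_max=5.0):
    distance = rng.uniform(d_min, d_max)        # radial distance to the target
    inclination = rng.uniform(0.0, np.pi / 2)   # polar angle measured from the vertical
    azimuth = rng.uniform(0.0, 2 * np.pi)       # heading around the target
    return TARGET + distance * np.array([
        np.sin(inclination) * np.cos(azimuth),
        np.sin(inclination) * np.sin(azimuth),
        np.cos(inclination),                    # keeps the UAV above the target
    ])

rng = np.random.default_rng(0)
print(sample_initial_position(rng))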

Watch the video


Framework

It is a traditional RL environment that accesses the PyRep plugin, which in turn accesses the CoppeliaSim (Coppelia Simulator) API. This is much faster than using the CoppeliaSim Remote API, and it also gives you a simpler API for manipulating and creating objects inside the running simulation.

Initial positions for the UAV agent
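
As a rough illustration of how such an environment is typically driven, here is a hypothetical gym-style interaction loop; the agent class, the action dimension, and the reset()/step() signature are assumptions, not the actual Drone_RL API.

import numpy as np

class RandomAgent:
    """Stand-in for the SAC policy: samples uniform actions in [-1, 1]."""
    def __init__(self, action_dim=4):            # 4 rotor commands assumed for a quadrotor
        self.action_dim = action_dim

    def act(self, state):
        return np.random.uniform(-1.0, 1.0, size=self.action_dim)

def run_episode(env, agent, max_steps=500):
    state = env.reset()                           # PyRep restarts the CoppeliaSim scene
    total_reward = 0.0
    for _ in range(max_steps):
        action = agent.act(state)
        state, reward, done, _ = env.step(action)  # one simulation step through PyRep
        total_reward += reward
        if done:
            break
    return total_reward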

Requirements/Installing

Docker

One of the safest ways to reproduce our environment is to use a Docker container. This approach is well suited for training on a cluster with a stable environment, although forwarding the display server from Docker is always tricky (we leave that to the reader).

Change the container's variables and then use the Makefile, which makes working with our Docker container easier. The targets are self-explanatory.

create-image

make create-image

create-container

make create-container

training

make training

evaluate-container

make evaluate-container

Without Docker

  1. Install CoppeliaSim (Coppelia Simulator)
  2. Install PyRep
  3. Install Drone_RL
  4. Install the requirements:
pip install -r requirements.txt
  5. Install this repo:
python setup.py install
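
As a quick sanity check after installing (a sketch; your import paths may differ), you can verify that PyRep is importable:

from pyrep import PyRep   # PyRep wraps the CoppeliaSim API
print("PyRep imported successfully")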

Training

To train the model(s) in the paper, run this command:

./training.sh

It is somewhat tricky to reproduce an exact policy, because of the variability inherent to off-policy methods and to the reward shaping needed to achieve optimal control policies in robotics.

One hack that alleviates this problem is to save something like a moving window of the last 5-10 policies and pick the best one (qualitatively) once the reward stabilizes (see the sketch below). More research is needed to remove the need for a qualitative assessment of the trained policies.
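
Below is a minimal sketch of that checkpoint window, assuming a PyTorch policy; the directory, file names, and save_policy helper are illustrative, not the repository's API.

import os
from collections import deque
import torch

CKPT_DIR = "checkpoints"       # hypothetical output directory
WINDOW = 10                    # keep only the last 10 saved policies
os.makedirs(CKPT_DIR, exist_ok=True)
recent = deque()               # paths of the most recent checkpoints

def save_policy(policy, episode):
    """Save the current policy and drop checkpoints that fall outside the window."""
    path = os.path.join(CKPT_DIR, f"policy_ep{episode}.pt")
    torch.save(policy.state_dict(), path)
    recent.append(path)
    if len(recent) > WINDOW:
        os.remove(recent.popleft())   # evict the oldest checkpoint

# After the reward stabilizes, inspect the remaining checkpoints and
# qualitatively pick the best-behaving policy.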

Evaluation

To evaluate the model with the best policy, run:

./evaluate.sh
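
Conceptually, evaluation rolls out a saved policy without exploration noise; the sketch below shows that idea with placeholder names (the env, the policy object, its deterministic_action method, and the checkpoint path are all assumptions).

import torch

def evaluate(env, policy, episodes=10, max_steps=500):
    """Average return of the deterministic policy over a few episodes."""
    returns = []
    for _ in range(episodes):
        state = env.reset()
        total = 0.0
        for _ in range(max_steps):
            with torch.no_grad():
                obs = torch.as_tensor(state, dtype=torch.float32)
                action = policy.deterministic_action(obs)   # mean action, no sampling
            state, reward, done, _ = env.step(action.numpy())
            total += reward
            if done:
                break
        returns.append(total)
    return sum(returns) / len(returns)

# policy.load_state_dict(torch.load("checkpoints/best_policy.pt"))  # path is illustrative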

Pre-trained Models

You can check the saved trained policies in:

Results

Run the notebooks in notebooks/ to reproduce the figures presented in the paper.


Credits

Code heavily based on RL-Adventure-2.

The environment is a continuation of the work in:

G. Lopes, M. Ferreira, A. Simões, and E. Colombini, "Intelligent Control of a Quadrotor with Proximal Policy Optimization," Latin American Robotic Symposium, pp. 503–508, Nov. 2018.

License

MIT-LICENSE

Cite us

Barros, Gabriel M.; Colombini, Esther L., "Using Soft Actor-Critic for Low-Level UAV Control", IROS Workshop on Perception, Learning, and Control for Autonomous Agile Vehicles, 2020.

@misc{barros2020using,
  title={Using Soft Actor-Critic for Low-Level UAV Control},
  author={Gabriel Moraes Barros and Esther Luna Colombini},
  year={2020},
  eprint={2010.02293},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  journal={IROS - workshop "Perception, Learning, and Control for Autonomous Agile Vehicles"},
}
