Learning to Drive Smoothly in Minutes

Learning to drive smoothly in minutes, using a reinforcement learning algorithm -- Soft Actor-Critic (SAC) -- and a Variational AutoEncoder (VAE) in the Donkey Car simulator.

Blog post on Medium: link

Video: https://www.youtube.com/watch?v=iiuKh0yDyKE

Level-0	Level-1

Download VAE	Download VAE
Download pretrained agent	Download pretrained agent

Note: the pretrained agents must be saved in the logs/sac/ folder. To use a pretrained agent, pass --exp-id 6, where 6 is the index of the folder.
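The --exp-id value refers to the numeric index of an experiment folder inside logs/sac/. As a rough sketch, assuming folder names end in `_<index>` (a naming convention this README does not spell out), a hypothetical helper `latest_exp_id` could look up the highest index present. The folder names in the demo are made up for illustration.

```python
import os
import re
import tempfile


def latest_exp_id(log_dir):
    """Return the highest numeric suffix among sub-folders of log_dir, or 0 if none.

    Assumption: experiment folders are named '<anything>_<index>'.
    """
    pattern = re.compile(r"_(\d+)$")
    ids = []
    for name in os.listdir(log_dir):
        match = pattern.search(name)
        # Only count directories whose name ends in an underscore and digits.
        if match and os.path.isdir(os.path.join(log_dir, name)):
            ids.append(int(match.group(1)))
    return max(ids, default=0)


if __name__ == "__main__":
    # Demo on a throwaway directory with hypothetical folder names.
    with tempfile.TemporaryDirectory() as log_dir:
        for name in ("example-env_1", "example-env_6"):
            os.mkdir(os.path.join(log_dir, name))
        print(latest_exp_id(log_dir))
```

If the helper reports an index other than the one you expect, check that the pretrained agent was unpacked directly under logs/sac/ and not one level deeper.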

Quick Start

1. Download the simulator here or build it from source.
2. Install dependencies (see requirements.txt).
3. (Optional but recommended) Download a pre-trained VAE: VAE Level 0, VAE Level 1.
4. Train a control policy for 5000 steps using Soft Actor-Critic (SAC):

python train.py --algo sac -vae path-to-vae.pkl -n 5000

5. Run the trained agent for 2000 steps:

python enjoy.py --algo sac -vae path-to-vae.pkl --exp-id 0 -n 2000
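The train and enjoy commands above can be chained from a small script. The sketch below only assembles the argument lists, using the flags shown in this section. The helper names `train_command` and `enjoy_command` are hypothetical, and path-to-vae.pkl is a placeholder to replace with a real file.

```python
import shlex
import sys


def train_command(vae_path, n_steps=5000, algo="sac"):
    """Build the argv list for train.py with the flags used in the Quick Start."""
    return [sys.executable, "train.py", "--algo", algo,
            "-vae", vae_path, "-n", str(n_steps)]


def enjoy_command(vae_path, exp_id=0, n_steps=2000, algo="sac"):
    """Build the argv list for enjoy.py with the flags used in the Quick Start."""
    return [sys.executable, "enjoy.py", "--algo", algo,
            "-vae", vae_path, "--exp-id", str(exp_id), "-n", str(n_steps)]


if __name__ == "__main__":
    for cmd in (train_command("path-to-vae.pkl"), enjoy_command("path-to-vae.pkl")):
        # Print a shell-safe version of each command.
        print(" ".join(shlex.quote(part) for part in cmd))
        # To actually launch it from the repository root:
        # subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems when the VAE path contains spaces.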

To train on a different level, change LEVEL = 0 to LEVEL = 1 in config.py.

Train the Variational AutoEncoder (VAE)

Collect images using the teleoperation mode:

python -m teleop.teleop_client --record-folder path-to-record/folder/

Train a VAE:

python -m vae.train --n-epochs 50 --verbose 0 --z-size 64 -f path-to-record/folder/
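For intuition about what --z-size controls, here is a minimal sketch of the two VAE ingredients that depend on the latent size: the reparameterized sample and the KL term for a diagonal Gaussian posterior. It is a generic, standard-library illustration and not the code in the vae module; the function names are hypothetical.

```python
import math
import random


def sample_latent(mu, log_var, rng):
    """Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_divergence(mu, log_var):
    """KL(N(mu, sigma^2) || N(0, I)), summed over the latent dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, log_var))


z_size = 64  # matches --z-size 64 in the command above
rng = random.Random(0)
mu = [0.0] * z_size
log_var = [0.0] * z_size
z = sample_latent(mu, log_var, rng)  # one latent vector of length z_size
```

A larger latent size lets the encoder keep more detail about each image, at the cost of a larger input for the control policy.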

Train in Teleoperation Mode

python train.py --algo sac -vae logs/vae.pkl -n 5000 --teleop

Test in Teleoperation Mode

python -m teleop.teleop_client --algo sac -vae logs/vae.pkl --exp-id 0

Explore Latent Space

python -m vae.enjoy_latent -vae logs/level-0/vae-8.pkl

Reproducing Results

To reproduce the results shown in the video, set the values in config.py as described below for each level.

Level 0

config.py:

MAX_STEERING_DIFF = 0.15 # 0.1 for very smooth control, but it requires more steps
MAX_THROTTLE = 0.6 # MAX_THROTTLE = 0.5 is fine, but we can go faster
MAX_CTE_ERROR = 2.0 # only used in normal mode, set it to 10.0 when using teleoperation mode
LEVEL = 0
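To illustrate what MAX_STEERING_DIFF constrains, the sketch below limits how far the steering command may move from one step to the next. It assumes steering commands lie in [-1, 1] and that the limit is applied as a plain clip on the difference. The function name `smooth_steering` is hypothetical, and the project's environment code may scale the limit differently.

```python
MAX_STEERING_DIFF = 0.15  # value from config.py above


def smooth_steering(prev_steering, new_steering, max_diff=MAX_STEERING_DIFF):
    """Move from prev_steering towards new_steering by at most max_diff."""
    diff = new_steering - prev_steering
    # Clip the requested change to the interval [-max_diff, max_diff].
    diff = max(-max_diff, min(max_diff, diff))
    return prev_steering + diff


# A request to jump from straight ahead (0.0) to full right (1.0)
# is applied gradually, one limited step at a time.
steering = 0.0
history = []
for _ in range(3):
    steering = smooth_steering(steering, 1.0)
    history.append(steering)
```

A smaller limit gives smoother trajectories but restricts how quickly the agent can correct its course, which is consistent with the comment that 0.1 requires more training steps.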

Train in normal mode (smooth control); this takes about 5-10 minutes:

python train.py --algo sac -n 8000 -vae logs/vae-level-0-dim-32.pkl

Train in normal mode (very smooth control with MAX_STEERING_DIFF = 0.1); this takes about 20 minutes:

python train.py --algo sac -n 20000 -vae logs/vae-level-0-dim-32.pkl

Train in teleoperation mode (MAX_CTE_ERROR = 10.0); this takes about 5-10 minutes:

python train.py --algo sac -n 8000 -vae logs/vae-level-0-dim-32.pkl --teleop

Level 1

Note: only teleoperation mode is available for level 1

config.py:

MAX_STEERING_DIFF = 0.15
MAX_THROTTLE = 0.5 # MAX_THROTTLE = 0.6 can work but it's harder to train due to the sharpest turn
LEVEL = 1
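The MAX_THROTTLE setting caps the speed the agent can command. One common way to apply such a cap, sketched here as an assumption and not as the project's exact code, is to rescale a policy output in [-1, 1] linearly onto [MIN_THROTTLE, MAX_THROTTLE]. MIN_THROTTLE and its value of 0.4 are hypothetical placeholders; only MAX_THROTTLE appears in the config above.

```python
MIN_THROTTLE = 0.4  # hypothetical lower bound, not taken from this README
MAX_THROTTLE = 0.5  # value from config.py above


def scale_throttle(action, min_throttle=MIN_THROTTLE, max_throttle=MAX_THROTTLE):
    """Map a policy output in [-1, 1] linearly onto [min_throttle, max_throttle]."""
    # Guard against outputs slightly outside the expected range.
    action = max(-1.0, min(1.0, action))
    t = (action + 1.0) / 2.0  # rescale to [0, 1]
    return (1.0 - t) * min_throttle + t * max_throttle
```

With this mapping, raising MAX_THROTTLE to 0.6 widens the range of speeds the agent must learn to handle, which fits the comment that the higher value makes the sharpest turn harder.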

Train in teleoperation mode; this takes about 10 minutes:

python train.py --algo sac -n 15000 -vae logs/vae-level-1-dim-64.pkl --teleop

Note: the VAE latent size differs between level 0 and level 1 (32 and 64, going by the file names above), but this is not an important factor.

Record a Video of the On-Board Camera

You need a trained model. For instance, to record 1000 steps with the last trained SAC agent:

python -m utils.record_video --algo sac --vae-path logs/level-0/vae-32-2.pkl -n 1000

Citing the Project

To cite this repository in publications:

@misc{drive-smoothly-in-minutes,
  author = {Raffin, Antonin and Sokolkov, Roma},
  title = {Learning to Drive Smoothly in Minutes},
  year = {2019},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/araffin/learning-to-drive-in-5-minutes/}},
}

Credits

Related Paper: "Learning to Drive in a Day".

r7vme, author of the original implementation.
Wayve.ai for the idea and inspiration.
Tawn Kramer for the Donkey simulator and Donkey Gym.
Stable-Baselines for the DDPG/SAC and PPO implementations.
RL Baselines Zoo for the training/enjoy scripts.
S-RL Toolbox for the data loader.
Racing robot for the teleoperation.
World Models Experiments for the VAE implementation.
