Reinforcement learning for operations research problems with OpenAI Gym and CleanRL

Home page: https://cpwan.github.io/RLOR/

RLOR: A Flexible Framework of Deep Reinforcement Learning for Operation Research

1️⃣ First work to incorporate an end-to-end vehicle routing model into a modern RL platform (CleanRL)

⚡ Speeds up training of the Attention Model by 8 times (25 hours $\to$ 3 hours)

🔎 A flexible framework for developing models, algorithms, environments, and search strategies for operations research

News

  • 13/04/2023: We release a web demo on Hugging Face 🤗!
  • 24/03/2023: We release our paper on arXiv!
  • 20/03/2023: We release a Jupyter Lab demo and pretrained checkpoints!
  • 10/03/2023: We release our codebase!

Demo

We provide inference demos as Colab notebooks:

Environment | Search       | Demo
----------- | ------------ | -------------
TSP         | Greedy       | Open In Colab
CVRP        | Multi-Greedy | Open In Colab

Installation

Conda

conda env create -n <env name> -f environment.yml
# The environment.yml was generated from
# conda env export --no-builds > environment.yml
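# After creation, activate the environment:
conda activate <env name>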

Creating the environment can take a few minutes.

Optional dependency

wandb

Refer to the WandB quick start guide for installation.

File structure

All the major implementations are under the ./rlor folder.

./rlor
├── envs
│   ├── tsp_data.py # load pre-generated data for evaluation
│   ├── tsp_vector_env.py # define the (vectorized) gym environment
│   ├── cvrp_data.py
│   └── cvrp_vector_env.py
├── models
│   ├── attention_model_wrapper.py # wrap the refactored attention model for CleanRL
│   └── nets # contains the refactored attention model
└── ppo_or.py # implementation of PPO with the attention model for operations research problems

ppo_or.py was modified from cleanrl/ppo.py. To see what changed, use diff:

# If diff is missing: apt install diffutils
diff --color ppo.py ppo_or.py

Training OR models with PPO

TSP

python ppo_or.py --num-steps 51 --env-id tsp-v0 --env-entry-point envs.tsp_vector_env:TSPVectorEnv --problem tsp

CVRP

python ppo_or.py --num-steps 60 --env-id cvrp-v0 --env-entry-point envs.cvrp_vector_env:CVRPVectorEnv --problem cvrp
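
Under the hood, these flags are used to register a custom environment with Gym before training starts. A minimal sketch of that pattern in Python (it assumes the environment is default-constructible and that you run from the repo root so envs is importable):

import gym
from gym.envs.registration import register

# The id and entry-point string mirror the --env-id / --env-entry-point flags above.
register(id="tsp-v0", entry_point="envs.tsp_vector_env:TSPVectorEnv")

env = gym.make("tsp-v0")  # any constructor kwargs would be passed here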

Enable WandB

python ppo_or.py ... --track

Add the --track argument to enable experiment tracking with WandB.
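
For reference, the CleanRL-style tracking hook looks roughly like the sketch below; the flag names are illustrative, so check ppo_or.py for the actual ones:

import argparse

import wandb

parser = argparse.ArgumentParser()
parser.add_argument("--track", action="store_true")
parser.add_argument("--wandb-project-name", default="rlor")  # illustrative default
args = parser.parse_args()

if args.track:
    wandb.init(
        project=args.wandb_project_name,
        sync_tensorboard=True,  # mirror TensorBoard scalars into WandB
        config=vars(args),      # record hyperparameters with the run
    )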

Where is the TSP data?

It can be generated with the official repo of the attention-learn-to-route paper. You may then modify ./envs/tsp_data.py to update the path to the data accordingly.
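
For reference, a minimal loading sketch, assuming the pickle layout produced by that repo's generate_data.py script (a list of instances, each a list of (x, y) node coordinates in the unit square); the path below is hypothetical:

import pickle

import numpy as np

DATA_PATH = "data/tsp/tsp50_test_seed1234.pkl"  # hypothetical; point to your generated file

with open(DATA_PATH, "rb") as f:
    instances = pickle.load(f)

coords = np.array(instances[0])  # one instance, shape (num_nodes, 2)
print(len(instances), coords.shape)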

Acknowledgements

The neural network model is refactored and developed from Attention, Learn to Solve Routing Problems!.

The idea of multi-trajectory training/inference is from POMO: Policy Optimization with Multiple Optima for Reinforcement Learning.

The RL environments are defined with OpenAI Gym.

The PPO algorithm implementation is based on CleanRL.
