ChenYiXuan96/PCBRoute

Solving combinatorial optimisation problems using RL

This is the code for my dissertation at the University of Bath. The dissertation focuses on solving the printed circuit board (PCB) routing problem, which is classified as a combinatorial optimisation problem, using reinforcement learning (RL). In this project, an attention model is built to produce routes, and policy gradient is used to train it instead of supervised learning. In addition, a basic imitation learning method is implemented to speed up training and improve the model. Successive halving is used to tune the hyperparameters.

Dependencies

Python>=3.6
NumPy
SciPy
PyTorch>=1.1
tqdm
tensorboard_logger
Matplotlib

File arrangement

The code is built on top of an attention model that was originally used to solve other problems. As a result, some of the code is no longer used but is still kept. Only the files that are used in this project are introduced below.

In the base directory, the most important file is run.py, which is called from the terminal with different arguments to run all experiments. It is used for both training and validation. train.py contains the functions that train all the models, including the attention model with REINFORCE and the imitation learning models; it is called primarily by run.py.

copt.so is the Python module provided by Zuken, which is called from other files.

reinforce_baselines.py defines the baseline classes, including the exponential baseline and the rollout baseline.
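For orientation, an exponential baseline in REINFORCE is commonly a moving average of past costs that is subtracted from the current cost to reduce gradient variance. The following is a minimal sketch under that assumption and is not the class defined in reinforce_baselines.py; the class name `ExponentialMovingBaseline` and the parameter `beta` are hypothetical.

```python
class ExponentialMovingBaseline:
    """Hypothetical sketch: exponential moving average of batch mean costs."""

    def __init__(self, beta=0.8):
        self.beta = beta   # weight on the previous average (assumed value)
        self.value = None  # no estimate until the first batch is seen

    def update(self, costs):
        """Fold the mean cost of a batch into the running average and return it."""
        batch_mean = sum(costs) / len(costs)
        if self.value is None:
            self.value = batch_mean
        else:
            self.value = self.beta * self.value + (1.0 - self.beta) * batch_mean
        return self.value

    def advantages(self, costs):
        """Cost minus baseline; REINFORCE weights log-probabilities by this."""
        baseline = self.update(costs)
        return [c - baseline for c in costs]
```

A rollout baseline differs in that the reference cost comes from evaluating a frozen copy of the policy on the same instances rather than from a running average.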

options.py processes the arguments passed on the command line. It is called by run.py.
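As a rough illustration of this kind of argument handling, the sketch below parses a few of the flags that appear in the commands later in this README. It is not the contents of options.py: the function name `build_parser`, the argument types, and the default values are all assumptions.

```python
import argparse


def build_parser():
    """Hypothetical parser covering a few flags that appear in this README."""
    parser = argparse.ArgumentParser(description="PCB routing with RL (sketch)")
    parser.add_argument("--problem", default="PcbRoute")
    parser.add_argument("--graph_size", type=int, default=5)    # assumed default
    parser.add_argument("--baseline", default=None)
    parser.add_argument("--n_epochs", type=int, default=100)    # assumed default
    parser.add_argument("--lr_decay", type=float, default=1.0)  # assumed default
    parser.add_argument("--use_BC", type=int, default=0)        # 1 enables BC
    parser.add_argument("--eval_only", action="store_true")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
opts = build_parser().parse_args(
    ["--problem", "PcbRoute", "--graph_size", "5", "--baseline", "rollout",
     "--n_epochs", "150", "--lr_decay", "0.9573", "--use_BC", "1"]
)
print(opts.graph_size, opts.baseline, opts.use_BC)
```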

successive_halving.py is run directly rather than through run.py. It tunes the hyperparameters using successive halving.

gen_pcb_offline_train_data.py contains the code that produces the demos for imitation learning and the validation dataset that is shared across many experiments to compare performance.

The nets package contains the code that builds the structure of the DNNs. attention_model.py contains the structure of the decoder, and all classes in the file inherit from PyTorch's nn.Module. The structure of the encoder is defined in graph_encoder.py.

The problems package contains the code for the problem itself (i.e. the multiple-routing problem). Only the subpackage pcb_route is used in this project. The code in problem_pcb_route.py produces datasets and evaluates solutions. state_pcb_route.py contains the code that tracks the information about the partial solution during training.

The outputs package contains no code, only the trained models in the form of .pt files. It holds the trained models for all the experiments, so the results can be reproduced without spending a long time on training.

The pre_gen_data package contains the demo data used for imitation learning and the data used for validation.

Instructions on Running the Code

To do hyperparameter tuning:

    python3 successive_halving.py --n_para_sets 128

Run this command in a terminal to tune the hyperparameters with 128 randomly selected hyperparameter sets.
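The general idea of successive halving can be sketched as follows: evaluate every candidate on a small budget, keep the better half, double the budget, and repeat until one candidate remains. This is a generic illustration, not the code in successive_halving.py; the function `successive_halving`, its parameters, and the toy cost function are all hypothetical.

```python
import random


def successive_halving(configs, evaluate, min_budget=1, eta=2):
    """Keep the best 1/eta of the configurations each round, raising the budget.

    configs  : non-empty list of hyperparameter sets (any objects)
    evaluate : callable(config, budget) -> cost, lower is better
    """
    survivors = list(configs)
    budget = min_budget
    while len(survivors) > 1:
        # Rank the survivors by their cost at the current budget.
        scored = sorted(survivors, key=lambda cfg: evaluate(cfg, budget))
        keep = max(1, len(scored) // eta)
        survivors = scored[:keep]
        budget *= eta
    return survivors[0]


# Toy usage: 128 random learning rates; the stand-in cost prefers values near 1e-3.
rng = random.Random(0)
candidates = [{"lr": 10 ** rng.uniform(-5, -1)} for _ in range(128)]
best = successive_halving(candidates, lambda cfg, budget: abs(cfg["lr"] - 1e-3))
print(best)
```

With 128 candidates and eta=2, the loop runs seven rounds (128, 64, 32, 16, 8, 4, 2). In a real run, `evaluate` would train a model for `budget` epochs and return its validation cost.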

To run the vanilla attention model on the five-pair problem:

    python run.py --problem PcbRoute --graph_size 5 --baseline rollout \
    --val_size 64 --embedding_dim 128 --n_epochs 150 \
    --eval_batch_size 1 --lr_decay 0.9573 \
    --n_encode_layers 4 --penalty_per_node 23111.415 \
    --run_name 'PcbRoute5_optimized'

where --problem PcbRoute selects the multiple-routing problem of this project, graph_size specifies the number of terminal pairs to connect, and n_epochs sets the total number of training epochs.

To run the attention model with BC on the five-pair problem:

    python run.py --problem PcbRoute --graph_size 5 --baseline rollout \
    --val_size 64 --embedding_dim 128 \
    --hidden_dim 128 --n_epochs 150 \
    --eval_batch_size 1 --lr_decay 0.9573 \
    --n_encode_layers 4 --penalty_per_node 23111.415 \
    --run_name 'PcbRoute5_optimized_bc' --use_BC 1 \
    --BC_demos_path 'pre_gen_data/pcb_5_5k_bruteforce_data.json' \
    --BC_n_epochs 10 --lr_model_BC 0.001

where use_BC indicates that BC is used in this run, and BC_n_epochs specifies the number of epochs used to pretrain the model with BC. This must be less than or equal to ten, because only 50k demos are generated.

To run the attention model with DAR on the five-pair problem:

    python run.py --problem PcbRoute --graph_size 5 \
    --baseline rollout --batch_size 64 --epoch_size 4096 \
    --val_size 64 --embedding_dim 128 --hidden_dim 128 --n_epochs 150 \
    --eval_batch_size 1 --lr_decay 0.9573 --n_encode_layers 4 \
    --penalty_per_node 23111.415 --run_name 'PcbRoute5_optimized_dapg' \
    --use_BC_DAPG 1 --BC_demos_path 'pre_gen_data/pcb_5_5k_bruteforce_data.json' \
    --BC_n_epochs 10 --lr_model_BC 0.001 --DAPG_actor_ratio 0.8

where use_BC_DAPG selects the attention model with DAR.

To run experiments using normalised input and reward, append the argument --normalize_input_reward 1 to the corresponding command. To run experiments for the eight-pair problem, change graph_size, run_name, and BC_demos_path accordingly. An example is:

    python run.py --problem PcbRoute --graph_size 8 --baseline rollout \
    --batch_size 64 --epoch_size 4096 --val_size 64 --embedding_dim 128 \
    --hidden_dim 128 --n_epochs 110 --eval_batch_size 1 \
    --lr_decay 0.9573 --n_encode_layers 4 --penalty_per_node 23111.415 \
    --run_name 'PcbRoute8_optimized_normalize_DAR' --use_BC_DAPG 1 \
    --BC_demos_path 'pre_gen_data/pcb_8_50k_bruteforce_data_9100.json' \
    --BC_n_epochs 10 --lr_model_BC 0.001 --normalize_input_reward 1

This command runs an experiment using the attention model with DAR and normalisation on the eight-pair problem. To evaluate a model from a saved checkpoint, eval_only, load_path, and val_dataset need to be specified:

    python run.py --problem PcbRoute --graph_size 8 --baseline rollout \
    --batch_size 64 --epoch_size 4096 --val_size 2048 \
    --embedding_dim 128 --hidden_dim 128 --n_epochs 110 \
    --eval_batch_size 1 --lr_decay 0.9573 --n_encode_layers 4 \
    --penalty_per_node 23111.415 --run_name 'eval' --use_BC_DAPG 1 \
    --BC_demos_path 'pre_gen_data/pcb_8_50k_bruteforce_data_9100.json' \
    --BC_n_epochs 10 --lr_model_BC 0.001 --normalize_input_reward 1 \
    --eval_only \
    --load_path 'outputs/PcbRoute_8/PcbRoute8_optimized_normalize_DAR_20200914T221819/epoch-109.pt' \
    --val_dataset 'pre_gen_data/pcb_8_3k_validate.json'
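When choosing a value for load_path, it can help to locate the most recent checkpoint in a run directory. The sketch below assumes checkpoints are named epoch-<n>.pt, as in the path above; the helper `latest_checkpoint` is hypothetical and is not part of this repository.

```python
import re
import tempfile
from pathlib import Path


def latest_checkpoint(run_dir):
    """Return the epoch-<n>.pt file with the largest n in run_dir, or None."""
    pattern = re.compile(r"^epoch-(\d+)\.pt$")
    best_epoch, best_path = -1, None
    for path in Path(run_dir).iterdir():
        match = pattern.match(path.name)
        # Compare epochs numerically so that epoch-109 beats epoch-20.
        if match and int(match.group(1)) > best_epoch:
            best_epoch, best_path = int(match.group(1)), path
    return best_path


# Demonstration on a temporary directory with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for n in (9, 109, 20):
        (Path(tmp) / f"epoch-{n}.pt").touch()
    found = latest_checkpoint(tmp).name
print(found)  # epoch-109.pt
```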
