
sdc-gym

Differentiable Programming

The main script is ./dp_playground.py. For the available command-line arguments, see its parse_args function.

Modify the build_model function according to the desired architecture. Adjusting the learning rate schedule in the build_opt function may also prove worthwhile.

Training

Standard Training

While the best models are obtained after multiple learning rate waves, training may also be stopped after 30000 steps. An example training command looks like this:

python dp_playground.py --M 5 --steps 200000 --batch_size 32 \
       --lambda_real_interval -100 0 --lambda_imag_interval -10 0
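
The "learning rate waves" mentioned above refer to a schedule that periodically restarts at a high learning rate and decays again. The repository's build_opt function is not reproduced here; as a purely illustrative, standalone sketch (none of these names are part of the repository's API), such a wave-style schedule could look like this:

import math

def wave_lr(step, base_lr=1e-3, wave_length=30_000, min_lr=1e-5):
    """Cosine-annealed learning rate that restarts ("waves") every
    wave_length steps. Purely illustrative, not the repository's build_opt."""
    phase = (step % wave_length) / wave_length
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * phase))

# The rate decays over one wave and jumps back up at each restart.
for step in (0, 15_000, 29_999, 30_000):
    print(step, round(wave_lr(step), 6))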

Continuing Training

Training can be continued using the --model_path argument: simply load a model checkpoint and continue as desired. Note that the optimizer state is not included in saved checkpoints, so it is recommended to schedule the learning rate such that training only starts after the optimizer has adapted a bit (when using an adaptive optimizer). Example usage:

python dp_playground.py --M 5 --steps 200000 --batch_size 32 \
       --lambda_real_interval -100 0 --lambda_imag_interval -10 0 \
       --model_path best_dp_model_diag_M_5_re_-100.0_0.0_im_-10.0_0.0_loss_[...].npy

Preconditioner, Input, and Loss Types

The structure of the preconditioner, the model's input, and the loss function used for training may each be changed via their corresponding arguments. We describe each of them in detail here.

Preconditioners

The --prec_type argument specifies which preconditioner to use. All nonzero values of the resulting matrix will be optimized. The default is diag.

prec_type            Description
diag                 Diagonal matrix
lower_diag           Diagonal matrix with the diagonal lowered by an offset of 1
lower_tri            Lower triangular matrix
strictly_lower_tri   Strictly lower triangular matrix
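
As an illustration of the structures in the table above, the following standalone NumPy sketch (not taken from the repository) builds the corresponding sparsity patterns for M = 3; only the nonzero entries of the chosen pattern are optimized:

import numpy as np

M = 3
diag               = np.diag(np.ones(M))               # main diagonal only
lower_diag         = np.diag(np.ones(M - 1), k=-1)     # diagonal lowered by an offset of 1
lower_tri          = np.tril(np.ones((M, M)))          # lower triangular (diagonal included)
strictly_lower_tri = np.tril(np.ones((M, M)), k=-1)    # strictly lower triangular

print(lower_diag)
# [[0. 0. 0.]
#  [1. 0. 0.]
#  [0. 1. 0.]]
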
Inputs

The --input_type argument specifies which inputs to give the model. The default is lambda.

input_type   Description
lambda       Only λ
residual     Initial residual (of the initial guess in relation to u0)
lambda_u     λ and the initial guess
f            f(u) = λu
num_iters    Number of iteration steps already taken

Losses

The --loss_type argument specifies which loss function to use for training the model. The default is spectral_radius.

loss_type         Description
spectral_radius   Minimize the spectral radius of the iteration matrix
residual          Minimize the residual after a fixed number of iteration steps
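
To make the two loss types concrete, here is a rough, standalone sketch for the Dahlquist problem u' = λu. It assumes the iteration matrix has the common preconditioned-SDC form (I - λ Q_Δ)^-1 λ (Q - Q_Δ), with the collocation matrix Q, the preconditioning matrix Q_Δ, and the time step absorbed into λ; this is not the repository's implementation:

import numpy as np

def spectral_radius_loss(Q, Q_delta, lam):
    """Spectral radius of the assumed iteration matrix
    (I - lam * Q_delta)^{-1} @ (lam * (Q - Q_delta))."""
    M = Q.shape[0]
    K = np.linalg.solve(np.eye(M) - lam * Q_delta, lam * (Q - Q_delta))
    return np.max(np.abs(np.linalg.eigvals(K)))

def residual_loss(Q, Q_delta, lam, u0, u_init, num_iters=10):
    """Norm of the collocation residual after a fixed number of
    preconditioned sweeps, starting from u_init."""
    M = Q.shape[0]
    ones = np.ones(M)
    u = u_init.copy()
    P = np.eye(M) - lam * Q_delta
    for _ in range(num_iters):
        u = u + np.linalg.solve(P, u0 * ones + lam * Q @ u - u)
    return np.linalg.norm(u0 * ones + lam * Q @ u - u)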

To optimize the residual after 10 steps, handling multiple u_init:

python dp_playground.py --steps 200000 --batch_size 32 \
       --lambda_real_interval -100 -100 --u_real_interval -1 1 \
       --input_type f --loss_type residual --num_iters 10

Optimizing Parameters Directly

To obtain a model that optimizes the preconditioner's parameters directly, pass the argument --optimize_directly True.

An example training run that optimizes for a single λ value with M = 5 is started as follows. Note that we also set the batch size to 1 to avoid redundant work:

python dp_playground.py --M 5 --steps 200000 --optimize_directly True \
       --batch_size 1 --lambda_real_interval -1 -1 --lambda_imag_interval 0 0

To optimize a strictly lower triangular preconditioner, testing on some additional preconditioners:

python dp_playground.py --steps 200000 --optimize_directly True \
       --batch_size 1 --prec_type strictly_lower_tri --extensive_tests True \
       --lambda_real_interval -1 -1 --lambda_imag_interval 0 0

We can also optimize one diagonal for each iteration step, handling multiple u_init:

python dp_playground.py --steps 200000 --optimize_directly True \
       --batch_size 32 --lambda_real_interval -100 -100 \
       --u_real_interval -1 1 --num_iters 10 --input_type num_iters

Evaluation

Using Training Script for Evaluation Only

Simply set the number of training steps to 0 (--steps 0) – the (possibly loaded) model will be used as is.

Sharing Models

To share a model for continued training, share the files ending in .npy, .structure, and .steps. If only inference is of interest, the file ending in .steps does not need to be shared.

Reinforcement Learning

The main script is ./rl_playground.py. For the available command-line arguments, see ./utils/arguments.py. Some recommended defaults to set are given below.

The given command-line arguments are automatically saved upon script start. Most saved files include the script's starting time as a timestamp, so all files belonging to one experiment should be immediately recognizable by sharing the same timestamp (except for TensorBoard logs, for now).

Recommended arguments

python rl_playground.py --envname sdc-v1 --num_envs 8 \
       --model_class PPG --activation_fn ReLU \
       --collect_states True --reward_iteration_only False --norm_obs True

Another recommendation

To accelerate learning, increase the batch size if possible. Here is an example for PPG:

PPG has a default batch size of 64 (given as the keyword argument batch_size), so we could use a batch size of 512 like this:

python rl_playground.py --model_class PPG \
       --model_kwargs '{"batch_size": 512}'

This alone will, however, most likely harm training success: we process much more data (and thus more environment timesteps) in each training step, so fewer training steps are executed overall. A good heuristic against this problem is to scale the learning rate proportionally to the batch size. The default learning rate we give is 25e-5, so scaling it to the increased batch size yields 25e-5 * 512 / 64 = 0.002. Our new command for starting the script becomes the following:

python rl_playground.py --model_class PPG --learning_rate 0.002 \
       --model_kwargs '{"batch_size": 512}'
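
As a quick sanity check of the scaling arithmetic above:

# Linear learning-rate scaling for a larger batch size.
base_lr = 25e-5         # default learning rate
base_batch_size = 64    # PPG's default batch size
new_batch_size = 512
scaled_lr = base_lr * new_batch_size / base_batch_size
print(scaled_lr)        # 0.002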

For PPG, keep in mind that it also uses an auxiliary batch size (aux_batch_size)! Half of the normal batch size is a good starting value for this. The final command is:

python rl_playground.py --model_class PPG --learning_rate 0.002 \
       --model_kwargs '{"batch_size": 512, "aux_batch_size": 256}'
