pnnl/deps_arXiv2020

Differentiable predictive control (DPC) policy optimization examples.

Differentiable Predictive Control

Examples of the differentiable predictive control (DPC) policy optimization algorithm presented in the paper "Learning Constrained Adaptive Differentiable Predictive Control Policies With Guarantees" https://arxiv.org/abs/2004.11184

DPC combines the principles of model predictive control, reinforcement learning, and differentiable programming to offer a systematic approach to offline, unsupervised, model-based policy optimization with goal-parametrized, domain-aware intrinsic rewards.
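
The core idea can be sketched in a few lines of plain PyTorch: roll out a neural policy through a differentiable model of the closed-loop system over the prediction horizon, accumulate an MPC-like loss, and backpropagate through the rollout to obtain direct policy gradients. The toy double-integrator model, network sizes, and penalty weights below are illustrative placeholders, not the Neuromancer-based implementations used in the examples.

```python
# Minimal, self-contained illustration of the DPC idea in plain PyTorch
# (toy double-integrator dynamics; not the Neuromancer-based code in this repo).
import torch
import torch.nn as nn

# Known (or identified) linear system model x_{k+1} = A x_k + B u_k
A = torch.tensor([[1.0, 0.1], [0.0, 1.0]])
B = torch.tensor([[0.0], [0.1]])

# Neural control policy u_k = pi(x_k)
policy = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

N = 20          # prediction horizon
batch = 256     # sampled initial conditions per gradient step

for step in range(1000):
    x = 2.0 * torch.rand(batch, 2) - 1.0          # sample initial states
    loss = 0.0
    for k in range(N):                             # forward pass: differentiable closed-loop rollout
        u = policy(x)
        x = x @ A.T + u @ B.T
        loss = loss + (x ** 2).sum(dim=1).mean()               # regulation objective
        loss = loss + 0.1 * (u ** 2).sum(dim=1).mean()         # control effort
        loss = loss + 10.0 * torch.relu(u.abs() - 1.0).mean()  # soft input constraint |u| <= 1
    opt.zero_grad()
    loss.backward()                                # backward pass: direct policy gradients
    opt.step()
```

Unlike approximate MPC, no expert controller or labeled data is needed; the policy is optimized directly against the differentiable closed-loop objective.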

Method and Examples

*Conceptual methodology: simulation of the differentiable closed-loop system dynamics in the forward pass, followed by a backward pass computing direct policy gradients for policy optimization.*

Structural equivalence of the DPC architecture with MPC constraints.
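
In rough notation, the DPC training objective mirrors the parametric MPC problem, with state and input constraints enforced as soft penalty terms on sampled closed-loop rollouts. The simplified formulation below is only meant to convey this correspondence; see the paper for the exact loss and guarantees.

```latex
\min_{W} \;\; \mathbb{E}_{\xi \sim P(\xi)} \Bigg[ \sum_{k=0}^{N-1}
      \underbrace{\|x_k - r\|_Q^2 + \|u_k\|_R^2}_{\text{MPC-like objective}}
    + \underbrace{\lambda_x \,\|\max(0,\, g(x_k))\|^2 + \lambda_u \,\|\max(0,\, h(u_k))\|^2}_{\text{constraint penalties}} \Bigg]
\quad \text{s.t.} \;\; x_{k+1} = f(x_k, u_k), \quad u_k = \pi_W(x_k, \xi)
```

Here $\xi$ collects the sampled problem parameters (initial conditions, references), $f$ is the known or learned system model, and $\pi_W$ is the neural policy with weights $W$.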

Example 1: Closed-loop trajectories of a learned stabilizing neural control policy obtained via DPC policy optimization.

Example 1: Evolution of the closed-loop trajectories and the DPC neural policy during training.

Example 1: Landscapes of the neural policy learned via the DPC policy optimization algorithm (right) and the explicit MPC policy computed using a parametric programming solver (left).

Example 2: Reference tracking of a nonlinear ODE system controlled by a DPC neural policy.

Example 3: Closed-loop reference tracking control trajectories for the quadcopter model controlled by a DPC neural policy.

Example 4: Obstacle avoidance with nonlinear constraints via a learned DPC neural policy, compared to an online IPOPT solution.

Example 5: Adaptive DPC of an unknown linear system subject to disturbances.

Example 6: Closed-loop control trajectories for the PVTOL aircraft model controlled by a DPC neural policy.

Example 6: Closed-loop control trajectories for the PVTOL aircraft model controlled by an approximate MPC neural policy.

Dependencies

Examples 1, 2, 3, and 4 have been implemented using our newly developed Neuromancer library for learning-based constrained optimization in PyTorch: neuromancer.

See environment.yml to reproduce the Conda environment for running example 5.

Files for Running the Examples

Control Example 1

  • double_integrator_DPC.py - DPC double integrator example using the Neuromancer package
  • double_integrator_eMPC.m - Explicit MPC benchmark using the MPT3 toolbox

Control Example 2

  • ref_tracking_ODE.py - Reference tracking for a nonlinear ODE
  • ref_tracking_ODE.ipynb - Jupyter notebook version of the above example

Control Example 3

  • quad_3D_linearDPC.py - Reference tracking for a quadcopter model via DPC using the Neuromancer package
  • CVXPY_linearMPC_quadcopter.py - Reference tracking for a quadcopter model with online MPC using CVXPY and the OSQP solver (a generic sketch of this kind of formulation is shown below)
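
For orientation, a generic linear MPC problem of the kind solved online with CVXPY and OSQP looks roughly as follows. The dynamics matrices, horizon, weights, and bounds are placeholders, not the actual quadcopter model from CVXPY_linearMPC_quadcopter.py.

```python
# Generic sketch of a linear MPC problem solved with CVXPY + OSQP
# (placeholder dynamics and weights; see CVXPY_linearMPC_quadcopter.py for the actual model).
import cvxpy as cp
import numpy as np

nx, nu, N = 4, 2, 20
A = np.eye(nx) + 0.1 * np.eye(nx, k=1)   # placeholder discrete-time model
B = 0.1 * np.ones((nx, nu))
x0 = np.ones(nx)                          # current state
r = np.zeros(nx)                          # reference

x = cp.Variable((nx, N + 1))
u = cp.Variable((nu, N))
cost = 0
constraints = [x[:, 0] == x0]
for k in range(N):
    cost += cp.sum_squares(x[:, k + 1] - r) + 0.1 * cp.sum_squares(u[:, k])
    constraints += [x[:, k + 1] == A @ x[:, k] + B @ u[:, k],
                    cp.norm(u[:, k], "inf") <= 1.0]
prob = cp.Problem(cp.Minimize(cost), constraints)
prob.solve(solver=cp.OSQP)
print(u.value[:, 0])   # only the first input is applied; the QP is re-solved at each step
```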

Control Example 4

  • 2D_obstacle_avoidance_DPC.py - Parametric obstacle avoidance with nonlinear constraints via DPC using the Neuromancer package
  • 2D_obstacle_avoidance_csadi.py - Online obstacle avoidance using CasADi and the IPOPT solver (see the sketch below)
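
As a rough illustration of the online baseline, a nonlinear obstacle-avoidance program can be set up with CasADi's Opti stack and solved with IPOPT as sketched below. The single-integrator dynamics, circular obstacle, and weights are placeholders, not the formulation used in the repo script.

```python
# Generic sketch of online obstacle avoidance with CasADi's Opti stack and IPOPT
# (placeholder single-integrator dynamics and a circular obstacle).
import casadi as ca

N, dt = 30, 0.1
opti = ca.Opti()
x = opti.variable(2, N + 1)             # 2D position trajectory
u = opti.variable(2, N)                 # 2D velocity inputs
x0 = ca.DM([0.0, 0.0])                  # start
xf = ca.DM([2.0, 2.0])                  # goal
c, rad = [1.0, 1.0], 0.5                # circular obstacle center and radius

opti.subject_to(x[:, 0] == x0)
cost = 0
for k in range(N):
    opti.subject_to(x[:, k + 1] == x[:, k] + dt * u[:, k])                      # dynamics
    opti.subject_to((x[0, k] - c[0])**2 + (x[1, k] - c[1])**2 >= rad**2)        # nonlinear obstacle constraint
    opti.subject_to(opti.bounded(-1.0, u[:, k], 1.0))                           # input bounds
    cost += ca.sumsqr(x[:, k] - xf) + 0.1 * ca.sumsqr(u[:, k])
opti.minimize(cost)
opti.solver("ipopt")
sol = opti.solve()
print(sol.value(u[:, 0]))   # first control action
```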

Control Example 5

  • DeepMPC_sysID_ctrl_sec_2_4.py - Policy optimization with the ground-truth system model
  • DeepMPC_sysID_ctrl_sec_2_5.py - Adaptive policy optimization via online simultaneous system identification and policy updates (see the sketch below)
  • DeepMPC_sysID_ctrl_sec_3_7 - Computational aspects and scalability analysis
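
The adaptive variant can be pictured as alternating two steps: re-identify a linear model from recently observed transitions, then take DPC policy-gradient steps through the identified model. The sketch below is a notional illustration of that loop in plain PyTorch; the function names, dimensions, and data are hypothetical and do not mirror the DeepMPC scripts.

```python
# Notional sketch of adaptive DPC: alternate online system identification
# (least-squares fit of a linear model from observed transitions) with
# DPC policy updates through the identified model.
import torch
import torch.nn as nn

nx, nu, N = 2, 1, 20
policy = nn.Sequential(nn.Linear(nx, 32), nn.ReLU(), nn.Linear(32, nu))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

def identify(X, U, X_next):
    """Least-squares fit of x+ = A x + B u from a batch of observed transitions."""
    Z = torch.cat([X, U], dim=1)                       # (T, nx+nu)
    theta = torch.linalg.lstsq(Z, X_next).solution     # (nx+nu, nx)
    return theta[:nx].T, theta[nx:].T                  # A_hat, B_hat

def policy_update(A_hat, B_hat, steps=50):
    """DPC policy-gradient steps through the identified model."""
    for _ in range(steps):
        x = 2.0 * torch.rand(256, nx) - 1.0
        loss = 0.0
        for _ in range(N):
            u = policy(x)
            x = x @ A_hat.T + u @ B_hat.T
            loss = loss + (x ** 2).sum(dim=1).mean() + 0.1 * (u ** 2).sum(dim=1).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()

# online loop (placeholder data): collect transitions, re-identify, update policy
X, U, X_next = torch.randn(100, nx), torch.randn(100, nu), torch.randn(100, nx)
A_hat, B_hat = identify(X, U, X_next)
policy_update(A_hat.detach(), B_hat.detach())
```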

Control Example 6

  • vtol_aircraft_DPC_stabilize.py - Unsupervised DPC policy optimization for the VTOL aircraft model using the Neuromancer package
  • vtol_aircraft_aMPC.py - Approximate MPC policy supervised by an online MPC solver (see the imitation-learning sketch below)
  • pvtol_aircraft_iMPC.m - Online MPC solved in Matlab using the YALMIP toolbox and the quadprog solver
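
For contrast with DPC, approximate MPC fits the neural policy by supervised regression on state-action pairs produced by an online MPC solver. The sketch below illustrates that imitation step; solve_mpc is a hypothetical stand-in for the expert controller (in this repo the expert solutions come from the Matlab/YALMIP implementation), and the toy control law it returns is for illustration only.

```python
# Notional sketch of approximate MPC (aMPC): supervised imitation of an online
# MPC solver. `solve_mpc` is a hypothetical placeholder for the expert solver.
import torch
import torch.nn as nn

def solve_mpc(x0):
    """Placeholder expert: returns the MPC-optimal input for initial state x0."""
    return -0.5 * x0[:1]   # stand-in control law for illustration only

# 1) generate a supervised dataset of (state, MPC action) pairs
states = [2.0 * torch.rand(2) - 1.0 for _ in range(2000)]
X = torch.stack(states)
U = torch.stack([solve_mpc(x) for x in states])

# 2) fit a neural policy by regression on the expert actions
policy = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
for epoch in range(200):
    loss = nn.functional.mse_loss(policy(X), U)
    opt.zero_grad()
    loss.backward()
    opt.step()
```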

Cite as

@misc{drgona2022learning,
      title={Learning Constrained Adaptive Differentiable Predictive Control Policies With Guarantees}, 
      author={Jan Drgona and Aaron Tuor and Draguna Vrabie},
      year={2020},
      eprint={2004.11184},
      archivePrefix={arXiv},
      primaryClass={eess.SY}
}

About

License: BSD 2-Clause "Simplified" License


Languages

MATLAB 98.9%, Jupyter Notebook 0.5%, Python 0.5%