pnnl/deps_arXiv2020

Differentiable predictive control (DPC) policy optimization examples.

Differentiable Predictive Control

Examples of the differentiable predictive control (DPC) policy optimization algorithm presented in the paper "Learning Constrained Adaptive Differentiable Predictive Control Policies With Guarantees" https://arxiv.org/abs/2004.11184

DPC combines the principles of model predictive control, reinforcement learning, and differentiable programming to offer a systematic approach to offline, unsupervised, model-based policy optimization with goal-parametrized, domain-aware intrinsic rewards.
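
The core idea can be sketched in a few lines of plain PyTorch: roll out a neural policy through a differentiable model of the closed-loop system over the prediction horizon, accumulate an MPC-like loss, and backpropagate through the rollout to obtain direct policy gradients. The toy double-integrator model, network sizes, and penalty weights below are illustrative placeholders, not the Neuromancer-based implementations used in the examples.

```python
# Minimal, self-contained illustration of the DPC idea in plain PyTorch
# (toy double-integrator dynamics; not the Neuromancer-based code in this repo).
import torch
import torch.nn as nn

# Known (or identified) linear system model x_{k+1} = A x_k + B u_k
A = torch.tensor([[1.0, 0.1], [0.0, 1.0]])
B = torch.tensor([[0.0], [0.1]])

# Neural control policy u_k = pi(x_k)
policy = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

N = 20          # prediction horizon
batch = 256     # sampled initial conditions per gradient step

for step in range(1000):
    x = 2.0 * torch.rand(batch, 2) - 1.0          # sample initial states
    loss = 0.0
    for k in range(N):                             # forward pass: differentiable closed-loop rollout
        u = policy(x)
        x = x @ A.T + u @ B.T
        loss = loss + (x ** 2).sum(dim=1).mean()               # regulation objective
        loss = loss + 0.1 * (u ** 2).sum(dim=1).mean()         # control effort
        loss = loss + 10.0 * torch.relu(u.abs() - 1.0).mean()  # soft input constraint |u| <= 1
    opt.zero_grad()
    loss.backward()                                # backward pass: direct policy gradients
    opt.step()
```

Unlike approximate MPC, no expert controller or labeled data is needed; the policy is optimized directly against the differentiable closed-loop objective.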

Method and Examples

*Conceptual methodology: simulation of the differentiable closed-loop system dynamics in the forward pass, followed by a backward pass computing direct policy gradients for policy optimization.*

Structural equivalence of the DPC architecture with MPC constraints.
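
In rough notation, the DPC training objective mirrors the parametric MPC problem, with state and input constraints enforced as soft penalty terms on sampled closed-loop rollouts. The simplified formulation below is only meant to convey this correspondence; see the paper for the exact loss and guarantees.

```latex
\min_{W} \;\; \mathbb{E}_{\xi \sim P(\xi)} \Bigg[ \sum_{k=0}^{N-1}
      \underbrace{\|x_k - r\|_Q^2 + \|u_k\|_R^2}_{\text{MPC-like objective}}
    + \underbrace{\lambda_x \,\|\max(0,\, g(x_k))\|^2 + \lambda_u \,\|\max(0,\, h(u_k))\|^2}_{\text{constraint penalties}} \Bigg]
\quad \text{s.t.} \;\; x_{k+1} = f(x_k, u_k), \quad u_k = \pi_W(x_k, \xi)
```

Here $\xi$ collects the sampled problem parameters (initial conditions, references), $f$ is the known or learned system model, and $\pi_W$ is the neural policy with weights $W$.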

Example 1: Closed-loop trajectories of a learned stabilizing neural control policy obtained via DPC policy optimization.

Example 1: Evolution of the closed-loop trajectories and the DPC neural policy during training.

Example 1: Landscapes of the neural policy learned via the DPC policy optimization algorithm (right) and the explicit MPC policy computed using a parametric programming solver (left).

Example 2: Reference tracking of a nonlinear ODE system controlled by a DPC neural policy.

Example 3: Closed-loop reference tracking control trajectories for the quadcopter model controlled by a DPC neural policy.

Example 4: Obstacle avoidance with nonlinear constraints via a learned DPC neural policy, compared to an online IPOPT solution.

Example 5: Adaptive DPC of an unknown linear system subject to disturbances.

Example 6: Closed-loop control trajectories for the PVTOL aircraft model controlled by a DPC neural policy.

Example 6: Closed-loop control trajectories for the PVTOL aircraft model controlled by an approximate MPC neural policy.

Dependencies

Examples 1, 2, 3, and 4 have been implemented using our newly developed Neuromancer library for learning-based constrained optimization in PyTorch: neuromancer.

See environment.yml to reproduce the Conda environment for running example 5.

Files for Running the Examples

Control Example 1

  • double_integrator_DPC.py - DPC double integrator example using the Neuromancer package
  • double_integrator_eMPC.m - Explicit MPC benchmark using the MPT3 toolbox

Control Example 2

  • ref_tracking_ODE.py - Reference tracking for a nonlinear ODE
  • ref_tracking_ODE.ipynb - Jupyter notebook version of the above example

Control Example 3

  • quad_3D_linearDPC.py - Reference tracking for a quadcopter model via DPC using the Neuromancer package
  • CVXPY_linearMPC_quadcopter.py - Reference tracking for a quadcopter model with online MPC using CVXPY and the OSQP solver (a generic sketch of this kind of formulation is shown below)
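
For orientation, a generic linear MPC problem of the kind solved online with CVXPY and OSQP looks roughly as follows. The dynamics matrices, horizon, weights, and bounds are placeholders, not the actual quadcopter model from CVXPY_linearMPC_quadcopter.py.

```python
# Generic sketch of a linear MPC problem solved with CVXPY + OSQP
# (placeholder dynamics and weights; see CVXPY_linearMPC_quadcopter.py for the actual model).
import cvxpy as cp
import numpy as np

nx, nu, N = 4, 2, 20
A = np.eye(nx) + 0.1 * np.eye(nx, k=1)   # placeholder discrete-time model
B = 0.1 * np.ones((nx, nu))
x0 = np.ones(nx)                          # current state
r = np.zeros(nx)                          # reference

x = cp.Variable((nx, N + 1))
u = cp.Variable((nu, N))
cost = 0
constraints = [x[:, 0] == x0]
for k in range(N):
    cost += cp.sum_squares(x[:, k + 1] - r) + 0.1 * cp.sum_squares(u[:, k])
    constraints += [x[:, k + 1] == A @ x[:, k] + B @ u[:, k],
                    cp.norm(u[:, k], "inf") <= 1.0]
prob = cp.Problem(cp.Minimize(cost), constraints)
prob.solve(solver=cp.OSQP)
print(u.value[:, 0])   # only the first input is applied; the QP is re-solved at each step
```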

Control Example 4

  • 2D_obstacle_avoidance_DPC.py - Parametric obstacle avoidance with nonlinear constraints via DPC using the Neuromancer package
  • 2D_obstacle_avoidance_csadi.py - Online obstacle avoidance using CasADi and the IPOPT solver (see the sketch below)
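
As a rough illustration of the online baseline, a nonlinear obstacle-avoidance program can be set up with CasADi's Opti stack and solved with IPOPT as sketched below. The single-integrator dynamics, circular obstacle, and weights are placeholders, not the formulation used in the repo script.

```python
# Generic sketch of online obstacle avoidance with CasADi's Opti stack and IPOPT
# (placeholder single-integrator dynamics and a circular obstacle).
import casadi as ca

N, dt = 30, 0.1
opti = ca.Opti()
x = opti.variable(2, N + 1)             # 2D position trajectory
u = opti.variable(2, N)                 # 2D velocity inputs
x0 = ca.DM([0.0, 0.0])                  # start
xf = ca.DM([2.0, 2.0])                  # goal
c, rad = [1.0, 1.0], 0.5                # circular obstacle center and radius

opti.subject_to(x[:, 0] == x0)
cost = 0
for k in range(N):
    opti.subject_to(x[:, k + 1] == x[:, k] + dt * u[:, k])                      # dynamics
    opti.subject_to((x[0, k] - c[0])**2 + (x[1, k] - c[1])**2 >= rad**2)        # nonlinear obstacle constraint
    opti.subject_to(opti.bounded(-1.0, u[:, k], 1.0))                           # input bounds
    cost += ca.sumsqr(x[:, k] - xf) + 0.1 * ca.sumsqr(u[:, k])
opti.minimize(cost)
opti.solver("ipopt")
sol = opti.solve()
print(sol.value(u[:, 0]))   # first control action
```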

Control Example 5

  • DeepMPC_sysID_ctrl_sec_2_4.py - Policy optimization with the ground-truth system model
  • DeepMPC_sysID_ctrl_sec_2_5.py - Adaptive policy optimization via online simultaneous system identification and policy updates (see the sketch below)
  • DeepMPC_sysID_ctrl_sec_3_7 - Computational aspects and scalability analysis
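
The adaptive variant can be pictured as alternating two steps: re-identify a linear model from recently observed transitions, then take DPC policy-gradient steps through the identified model. The sketch below is a notional illustration of that loop in plain PyTorch; the function names, dimensions, and data are hypothetical and do not mirror the DeepMPC scripts.

```python
# Notional sketch of adaptive DPC: alternate online system identification
# (least-squares fit of a linear model from observed transitions) with
# DPC policy updates through the identified model.
import torch
import torch.nn as nn

nx, nu, N = 2, 1, 20
policy = nn.Sequential(nn.Linear(nx, 32), nn.ReLU(), nn.Linear(32, nu))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

def identify(X, U, X_next):
    """Least-squares fit of x+ = A x + B u from a batch of observed transitions."""
    Z = torch.cat([X, U], dim=1)                       # (T, nx+nu)
    theta = torch.linalg.lstsq(Z, X_next).solution     # (nx+nu, nx)
    return theta[:nx].T, theta[nx:].T                  # A_hat, B_hat

def policy_update(A_hat, B_hat, steps=50):
    """DPC policy-gradient steps through the identified model."""
    for _ in range(steps):
        x = 2.0 * torch.rand(256, nx) - 1.0
        loss = 0.0
        for _ in range(N):
            u = policy(x)
            x = x @ A_hat.T + u @ B_hat.T
            loss = loss + (x ** 2).sum(dim=1).mean() + 0.1 * (u ** 2).sum(dim=1).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()

# online loop (placeholder data): collect transitions, re-identify, update policy
X, U, X_next = torch.randn(100, nx), torch.randn(100, nu), torch.randn(100, nx)
A_hat, B_hat = identify(X, U, X_next)
policy_update(A_hat.detach(), B_hat.detach())
```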

Control Example 6

  • vtol_aircraft_DPC_stabilize.py - Unsupervised DPC policy optimization for the VTOL aircraft model using the Neuromancer package
  • vtol_aircraft_aMPC.py - Approximate MPC policy supervised by an online MPC solver (see the imitation-learning sketch below)
  • pvtol_aircraft_iMPC.m - Online MPC solved in Matlab using the YALMIP toolbox and the quadprog solver
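
For contrast with DPC, approximate MPC fits the neural policy by supervised regression on state-action pairs produced by an online MPC solver. The sketch below illustrates that imitation step; solve_mpc is a hypothetical stand-in for the expert controller (in this repo the expert solutions come from the Matlab/YALMIP implementation), and the toy control law it returns is for illustration only.

```python
# Notional sketch of approximate MPC (aMPC): supervised imitation of an online
# MPC solver. `solve_mpc` is a hypothetical placeholder for the expert solver.
import torch
import torch.nn as nn

def solve_mpc(x0):
    """Placeholder expert: returns the MPC-optimal input for initial state x0."""
    return -0.5 * x0[:1]   # stand-in control law for illustration only

# 1) generate a supervised dataset of (state, MPC action) pairs
states = [2.0 * torch.rand(2) - 1.0 for _ in range(2000)]
X = torch.stack(states)
U = torch.stack([solve_mpc(x) for x in states])

# 2) fit a neural policy by regression on the expert actions
policy = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
for epoch in range(200):
    loss = nn.functional.mse_loss(policy(X), U)
    opt.zero_grad()
    loss.backward()
    opt.step()
```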

Cite as

@misc{drgona2022learning,
      title={Learning Constrained Adaptive Differentiable Predictive Control Policies With Guarantees}, 
      author={Jan Drgona and Aaron Tuor and Draguna Vrabie},
      year={2020},
      eprint={2004.11184},
      archivePrefix={arXiv},
      primaryClass={eess.SY}
}

About

License: BSD 2-Clause "Simplified" License


Languages

MATLAB 98.9%, Jupyter Notebook 0.5%, Python 0.5%