Goal: to use Stackelberg implicit learning dynamics for reinforcement learning.
Possible algorithms:
- Single-player environment: treat the actor (or the critic) as the leader in an actor-critic algorithm.
- Two-player environment: a leader and a follower (a minimal update sketch follows this list).
- $n$-player environment: a leader and multiple followers.
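For the two-player case, below is a minimal sketch of the Stackelberg update on a quadratic game: the follower runs plain gradient descent on its own cost, while the leader differentiates through the follower's implicit best response $r(x_1)$, using $Dr = -(D_{22} f_2)^{-1} D_{21} f_2$. The cost matrices, step sizes, and iteration count are illustrative assumptions, not values from any of the repositories listed below.

```python
import numpy as np

# Two-player quadratic game with scalar players, f_i(z) = 0.5 * z^T Q_i z,
# z = (x1, x2). The Q_i are illustrative, chosen so that D22 f2 > 0.
Q1 = np.array([[2.0, 1.0], [1.0, 1.0]])   # leader cost
Q2 = np.array([[1.0, -1.0], [-1.0, 3.0]]) # follower cost

def partials(Q, x1, x2):
    """Return (df/dx1, df/dx2) for f(z) = 0.5 * z^T Q z."""
    g = Q @ np.array([x1, x2])
    return g[0], g[1]

x1, x2 = 1.0, -1.0
lr_leader, lr_follower = 0.01, 0.05  # step sizes (assumed)

for _ in range(2000):
    d1f1, d2f1 = partials(Q1, x1, x2)
    _, d2f2 = partials(Q2, x1, x2)
    D22f2, D21f2 = Q2[1, 1], Q2[1, 0]
    # Leader: total derivative through the follower's implicit best
    # response, i.e. D1 f1 - (D21 f2)^T (D22 f2)^{-1} D2 f1.
    stackelberg_grad = d1f1 - D21f2 / D22f2 * d2f1
    x1 -= lr_leader * stackelberg_grad
    # Follower: plain gradient descent on its own cost.
    x2 -= lr_follower * d2f2

# Second-order check at the limit point: D22 f2 > 0 together with positive
# curvature of the leader's cost along the best response certifies a
# differential Stackelberg equilibrium.
Dr = -Q2[1, 0] / Q2[1, 1]
leader_curvature = Q1[0, 0] + 2 * Q1[0, 1] * Dr + Q1[1, 1] * Dr**2
print(f"x = ({x1:.4f}, {x2:.4f}), D22f2 = {Q2[1, 1]:.2f}, "
      f"leader curvature = {leader_curvature:.4f}")
```

The printed second-order quantities are the same eigenvalue checks listed under the experiment bench outputs below, and correspond to the equilibrium conditions used in fiezt/ICML-2020-Implicit-Stackelberg-Learning.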
Possible two-player environments:
- Linear quadratic games (a minimal environment sketch follows this list)
- Bimatrix games
- Gridworld games (e.g., Markov soccer)
- Bubbleworld games (e.g., particle collision avoidance)
- Multi-player modified Gym environments (e.g., two lunar landers or two cartpoles)
- New environment designs with game-theoretic aspects
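To make the first option concrete, here is a minimal gym-style two-player linear quadratic game. This is a sketch only: the dynamics and cost matrices, and the step/reset interface, are illustrative assumptions, not taken from zanedma/reinforcement_lqgame.

```python
import numpy as np

class TwoPlayerLQGame:
    """Two-player linear quadratic game (illustrative matrices).
    Dynamics: x' = A x + B1 u1 + B2 u2; player i pays x^T Qi x + ui^T Ri ui.
    """
    def __init__(self, horizon=100):
        self.A = np.array([[0.9, 0.1], [0.0, 0.9]])
        self.B1 = np.array([[1.0], [0.0]])
        self.B2 = np.array([[0.0], [1.0]])
        self.Q1 = np.eye(2)
        self.Q2 = np.diag([0.5, 2.0])
        self.R1 = self.R2 = np.eye(1)
        self.horizon = horizon

    def reset(self):
        self.t = 0
        self.x = np.random.randn(2)
        return self.x

    def step(self, u1, u2):
        # Per-player stage costs, then shared linear dynamics.
        c1 = self.x @ self.Q1 @ self.x + u1 @ self.R1 @ u1
        c2 = self.x @ self.Q2 @ self.x + u2 @ self.R2 @ u2
        self.x = self.A @ self.x + self.B1 @ u1 + self.B2 @ u2
        self.t += 1
        return self.x, (c1, c2), self.t >= self.horizon

# Usage: roll out one episode with random policies.
env = TwoPlayerLQGame()
x, done = env.reset(), False
while not done:
    x, (c1, c2), done = env.step(np.random.randn(1), np.random.randn(1))
```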
To run and record experiments systematically, we will use ExperimentGrid to perform grid searches with the following inputs and outputs (a wiring sketch follows the two lists below).
Experiment bench input:
- Type of algorithm (simgrad, stackgrad)
- Hyperparameters (learning rates, regularization, initialization, network architecture)
Experiment bench output:
- Convergence plots of costs and first-order derivatives
- Eigenvalues of second-order derivatives
- Checkpoints
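A sketch of how the bench could be wired up, assuming ExperimentGrid refers to the grid-search utility in OpenAI Spinning Up (`spinup.utils.run_utils.ExperimentGrid`). The `train` thunk and its hyperparameter names are placeholders for our own training entry point, not an existing API.

```python
from spinup.utils.run_utils import ExperimentGrid

def train(algo='stackgrad', lr_leader=1e-3, lr_follower=1e-2, reg=0.0,
          hidden_sizes=(64, 64), seed=0, logger_kwargs=None):
    # Placeholder thunk (hypothetical): one grid cell would train here and
    # write out convergence curves, Hessian eigenvalues, and checkpoints.
    print(algo, lr_leader, lr_follower, reg, hidden_sizes, seed)

eg = ExperimentGrid(name='stackgrad-vs-simgrad')
eg.add('algo', ['simgrad', 'stackgrad'], 'alg', True)  # in_name=True
eg.add('lr_leader', [1e-3, 1e-4], 'lrl')
eg.add('lr_follower', [1e-2, 1e-3], 'lrf')
eg.add('reg', [0.0, 1e-2], 'reg')
eg.add('hidden_sizes', [(64, 64), (128, 128)], 'hid')
eg.add('seed', [0, 10, 20])
eg.run(train, num_cpu=4)  # launches one run per grid cell
```

Each `add` call extends the grid along one hyperparameter axis, so the outputs above (convergence plots, eigenvalues, checkpoints) can be compared cell by cell across simgrad and stackgrad.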
GitHub repositories:
- bchasnov/stackgrad: implements the Stackelberg update
- zanedma/reinforcement_lqgame: linear quadratic game environment
- JWongDude/FruitLoops: forked from manish-pra/copg; compares stackgrad with competitive gradient descent in RL environments
- singerGUO/gym_multiagent_control: gym environments for games
- ratlifflj/2020sumuwreadgrp: summer reading group codebase
- fiezt/ICML-2020-Implicit-Stackelberg-Learning: reference code for the Stackelberg learning update