Goal: to use Stackelberg implicit learning dynamics for reinforcement learning.
Possible algorithms:
- Single-player environment: treat the actor (or the critic) as the leader in an actor-critic algorithm.
- Two-player environment: a leader and a follower (a minimal update sketch follows this list).
- $n$-player environment: a leader and multiple followers.
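For the two-player case, below is a minimal sketch of the Stackelberg update on a quadratic game: the follower runs plain gradient descent on its own cost, while the leader differentiates through the follower's implicit best response $r(x_1)$, using $Dr = -(D_{22} f_2)^{-1} D_{21} f_2$. The cost matrices, step sizes, and iteration count are illustrative assumptions, not values from any of the repositories listed below.

```python
import numpy as np

# Two-player quadratic game with scalar players, f_i(z) = 0.5 * z^T Q_i z,
# z = (x1, x2). The Q_i are illustrative, chosen so that D22 f2 > 0.
Q1 = np.array([[2.0, 1.0], [1.0, 1.0]])   # leader cost
Q2 = np.array([[1.0, -1.0], [-1.0, 3.0]]) # follower cost

def partials(Q, x1, x2):
    """Return (df/dx1, df/dx2) for f(z) = 0.5 * z^T Q z."""
    g = Q @ np.array([x1, x2])
    return g[0], g[1]

x1, x2 = 1.0, -1.0
lr_leader, lr_follower = 0.01, 0.05  # step sizes (assumed)

for _ in range(2000):
    d1f1, d2f1 = partials(Q1, x1, x2)
    _, d2f2 = partials(Q2, x1, x2)
    D22f2, D21f2 = Q2[1, 1], Q2[1, 0]
    # Leader: total derivative through the follower's implicit best
    # response, i.e. D1 f1 - (D21 f2)^T (D22 f2)^{-1} D2 f1.
    stackelberg_grad = d1f1 - D21f2 / D22f2 * d2f1
    x1 -= lr_leader * stackelberg_grad
    # Follower: plain gradient descent on its own cost.
    x2 -= lr_follower * d2f2

# Second-order check at the limit point: D22 f2 > 0 together with positive
# curvature of the leader's cost along the best response certifies a
# differential Stackelberg equilibrium.
Dr = -Q2[1, 0] / Q2[1, 1]
leader_curvature = Q1[0, 0] + 2 * Q1[0, 1] * Dr + Q1[1, 1] * Dr**2
print(f"x = ({x1:.4f}, {x2:.4f}), D22f2 = {Q2[1, 1]:.2f}, "
      f"leader curvature = {leader_curvature:.4f}")
```

The printed second-order quantities are the same eigenvalue checks listed under the experiment bench outputs below, and correspond to the equilibrium conditions used in fiezt/ICML-2020-Implicit-Stackelberg-Learning.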
Possible two-player environments:
- Linear quadratic games (a minimal environment sketch follows this list)
- Bimatrix games
- Gridworld games (e.g., Markov soccer)
- Bubbleworld games (e.g., particle collision avoidance)
- Multi-player modified Gym environments (e.g., two lunar landers or two cartpoles)
- New environment designs with game-theoretic aspects
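To make the first option concrete, here is a minimal gym-style two-player linear quadratic game. This is a sketch only: the dynamics and cost matrices, and the step/reset interface, are illustrative assumptions, not taken from zanedma/reinforcement_lqgame.

```python
import numpy as np

class TwoPlayerLQGame:
    """Two-player linear quadratic game (illustrative matrices).
    Dynamics: x' = A x + B1 u1 + B2 u2; player i pays x^T Qi x + ui^T Ri ui.
    """
    def __init__(self, horizon=100):
        self.A = np.array([[0.9, 0.1], [0.0, 0.9]])
        self.B1 = np.array([[1.0], [0.0]])
        self.B2 = np.array([[0.0], [1.0]])
        self.Q1 = np.eye(2)
        self.Q2 = np.diag([0.5, 2.0])
        self.R1 = self.R2 = np.eye(1)
        self.horizon = horizon

    def reset(self):
        self.t = 0
        self.x = np.random.randn(2)
        return self.x

    def step(self, u1, u2):
        # Per-player stage costs, then shared linear dynamics.
        c1 = self.x @ self.Q1 @ self.x + u1 @ self.R1 @ u1
        c2 = self.x @ self.Q2 @ self.x + u2 @ self.R2 @ u2
        self.x = self.A @ self.x + self.B1 @ u1 + self.B2 @ u2
        self.t += 1
        return self.x, (c1, c2), self.t >= self.horizon

# Usage: roll out one episode with random policies.
env = TwoPlayerLQGame()
x, done = env.reset(), False
while not done:
    x, (c1, c2), done = env.step(np.random.randn(1), np.random.randn(1))
```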
To run and record experiments systematically, we will use ExperimentGrid to perform grid searches with the following inputs and outputs (a wiring sketch follows the two lists below).
Experiment bench input:
- Type of algorithm (simgrad, stackgrad)
- Hyperparameters (learning rates, regularization, initialization, network architecture)
Experiment bench output:
- Convergence plots of costs and first-order derivatives
- Eigenvalues of second-order derivatives
- Checkpoints
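A sketch of how the bench could be wired up, assuming ExperimentGrid refers to the grid-search utility in OpenAI Spinning Up (`spinup.utils.run_utils.ExperimentGrid`). The `train` thunk and its hyperparameter names are placeholders for our own training entry point, not an existing API.

```python
from spinup.utils.run_utils import ExperimentGrid

def train(algo='stackgrad', lr_leader=1e-3, lr_follower=1e-2, reg=0.0,
          hidden_sizes=(64, 64), seed=0, logger_kwargs=None):
    # Placeholder thunk (hypothetical): one grid cell would train here and
    # write out convergence curves, Hessian eigenvalues, and checkpoints.
    print(algo, lr_leader, lr_follower, reg, hidden_sizes, seed)

eg = ExperimentGrid(name='stackgrad-vs-simgrad')
eg.add('algo', ['simgrad', 'stackgrad'], 'alg', True)  # in_name=True
eg.add('lr_leader', [1e-3, 1e-4], 'lrl')
eg.add('lr_follower', [1e-2, 1e-3], 'lrf')
eg.add('reg', [0.0, 1e-2], 'reg')
eg.add('hidden_sizes', [(64, 64), (128, 128)], 'hid')
eg.add('seed', [0, 10, 20])
eg.run(train, num_cpu=4)  # launches one run per grid cell
```

Each `add` call extends the grid along one hyperparameter axis, so the outputs above (convergence plots, eigenvalues, checkpoints) can be compared cell by cell across simgrad and stackgrad.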
GitHub repositories:
- bchasnov/stackgrad: implements the Stackelberg update
- zanedma/reinforcement_lqgame: linear quadratic game environment
- JWongDude/FruitLoops: forked from manish-pra/copg; compares stackgrad with competitive gradient descent in RL environments
- singerGUO/gym_multiagent_control: gym environments for games
- ratlifflj/2020sumuwreadgrp: summer reading group codebase
- fiezt/ICML-2020-Implicit-Stackelberg-Learning: reference code for the Stackelberg learning update