bchasnov / rl-game

summer 2020 rl game

Stackelberg RL (Summer 2020)

Goal: to use Stackelberg implicit learning dynamics for reinforcement learning.

Possible algorithms:

  1. Single-player environment. Treat the actor (or critic) as the leader in an actor-critic algorithm.
  2. Two-player environment with a leader and a follower.
  3. $n$-player environment with a leader and multiple followers.
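
For the two-player case (item 2), one common way to write a leader-follower gradient update is sketched below. The notation is assumed here and does not come from this README: $x$ is the leader's variable, $y$ the follower's, $f_1$ and $f_2$ are their costs, $\gamma_1$ and $\gamma_2$ are step sizes, and the follower's Hessian $D_2^2 f_2$ is assumed invertible.

$$
\begin{aligned}
x_{k+1} &= x_k - \gamma_1 \left( D_1 f_1(x_k, y_k) - \big(D_{21} f_2(x_k, y_k)\big)^\top \big(D_2^2 f_2(x_k, y_k)\big)^{-1} D_2 f_1(x_k, y_k) \right), \\
y_{k+1} &= y_k - \gamma_2 \, D_2 f_2(x_k, y_k).
\end{aligned}
$$

The bracketed term is the total derivative of $f_1(x, r(x))$, where the follower's best response $r$ is defined implicitly by $D_2 f_2(x, r(x)) = 0$. The implicit function theorem gives $Dr(x) = -\big(D_2^2 f_2\big)^{-1} D_{21} f_2$, which is where the "implicit" correction to the leader's gradient comes from. In the update, $y_k$ stands in for $r(x_k)$.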

Possible two-player environments:

  1. Linear-quadratic games
  2. Bimatrix games
  3. Gridworld games (e.g. Markov soccer)
  4. Bubbleworld games (e.g. particle collision avoidance)
  5. Multi-player modified gym environments (e.g. two lunar landers or two cartpoles)
  6. New environments designed with game-theoretic aspects
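
As a minimal sketch of the simplest kind of test environment, the code below sets up a static scalar quadratic game. It is a much-simplified stand-in for item 1, not an environment from this repository. The cost functions, coefficients, step size, and function names are all hypothetical. It compares a simultaneous gradient step with a leader-follower step in which the leader differentiates through the follower's best response.

```python
# Hypothetical scalar quadratic game (all coefficients are illustrative):
#   leader cost   f1(x, y) = 0.5*A*x**2 + B*x*y
#   follower cost f2(x, y) = 0.5*D*y**2 + C*x*y
A, B, C, D = 1.0, 0.5, 0.5, 1.0


def simgrad_step(x, y, lr):
    """Simultaneous gradient step: each player descends its own partial derivative."""
    gx = A * x + B * y          # D_1 f1
    gy = D * y + C * x          # D_2 f2
    return x - lr * gx, y - lr * gy


def stackgrad_step(x, y, lr):
    """Leader descends the total derivative of f1 through the follower's best response."""
    d1_f1 = A * x + B * y       # D_1 f1
    d2_f1 = B * x               # D_2 f1
    d21_f2 = C                  # D_1 (D_2 f2)
    d22_f2 = D                  # D_2^2 f2, assumed nonzero
    gx = d1_f1 - d21_f2 / d22_f2 * d2_f1
    gy = D * y + C * x          # the follower still uses D_2 f2
    return x - lr * gx, y - lr * gy


def run(step, x0=1.0, y0=-1.0, lr=0.1, n_steps=500):
    """Iterate a step function from (x0, y0) and return the final point."""
    x, y = x0, y0
    for _ in range(n_steps):
        x, y = step(x, y, lr)
    return x, y


if __name__ == "__main__":
    print("simgrad  :", run(simgrad_step))
    print("stackgrad:", run(stackgrad_step))
```

With these particular coefficients both updates contract toward the origin, so the example only illustrates the mechanics of the two updates. It makes no claim about which one performs better.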

To run and record experiments systematically, we will use ExperimentGrid to run grid searches over the inputs below and to collect the outputs below.

Experiment bench input:

  1. Type of algorithm (simgrad, stackgrad)
  2. Hyperparameters (learning rates, regularization, initialization, network architecture)
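
The inputs above can be enumerated as a Cartesian product. The sketch below is a plain-Python stand-in that uses only `itertools` and does not call the ExperimentGrid API. The key names, values, and the `run_name` helper are hypothetical.

```python
import itertools

# Hypothetical search space; keys and values are illustrative, not from the repo.
GRID = {
    "algo": ["simgrad", "stackgrad"],
    "lr_leader": [1e-3, 1e-2],
    "lr_follower": [1e-2, 1e-1],
    "regularization": [0.0, 1e-2],
    "seed": [0, 1],
}


def expand_grid(grid):
    """Return one config dict per point in the Cartesian product of the grid."""
    keys = list(grid)
    value_lists = [grid[k] for k in keys]
    return [dict(zip(keys, values)) for values in itertools.product(*value_lists)]


def run_name(config):
    """Build a deterministic run name so outputs can be matched back to inputs."""
    return "_".join(f"{k}-{config[k]}" for k in sorted(config))


if __name__ == "__main__":
    configs = expand_grid(GRID)
    print(len(configs), "runs")
    print(run_name(configs[0]))
```

Naming each run after its full configuration keeps plots and checkpoints traceable to the hyperparameters that produced them.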

Experiment bench output:

  1. Convergence plots of costs and first-order derivatives
  2. Eigenvalues of second-order derivatives
  3. Checkpoints
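
To illustrate output item 2, the sketch below computes the eigenvalues of the Jacobian of the update direction for a hypothetical scalar quadratic game with costs `f1 = 0.5*a*x**2 + b*x*y` and `f2 = 0.5*d*y**2 + c*x*y`. The game, the coefficient names, and the helper functions are assumptions made for illustration. A closed-form 2x2 formula is used so the example needs only the standard library; larger problems would use a numerical eigenvalue routine.

```python
import cmath


def eig2x2(m):
    """Eigenvalues of a 2x2 matrix [[p, q], [r, s]] from its characteristic polynomial."""
    (p, q), (r, s) = m
    tr, det = p + s, p * s - q * r
    root = cmath.sqrt(tr * tr - 4 * det)
    return (tr + root) / 2, (tr - root) / 2


def simgrad_jacobian(a, b, c, d):
    """Jacobian of (D_1 f1, D_2 f2) with respect to (x, y)."""
    return [[a, b], [c, d]]


def stackgrad_jacobian(a, b, c, d):
    """Same, but the leader row uses the total derivative (a - b*c/d)*x + b*y."""
    return [[a - b * c / d, b], [c, d]]


def is_locally_stable(m, tol=1e-12):
    """Continuous-time gradient flow is stable if every eigenvalue has positive real part."""
    return all(ev.real > tol for ev in eig2x2(m))


if __name__ == "__main__":
    coeffs = (1.0, 0.5, 0.5, 1.0)  # hypothetical (a, b, c, d)
    for name, jac in [("simgrad", simgrad_jacobian), ("stackgrad", stackgrad_jacobian)]:
        m = jac(*coeffs)
        print(name, eig2x2(m), "stable:", is_locally_stable(m))
```

Complex eigenvalues indicate rotational dynamics, which is why `cmath.sqrt` is used rather than `math.sqrt`.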

GitHub Repositories:
