DSR


DSR is a hybrid model-free and model-based deep RL algorithm for learning robust value functions. It decomposes the value function into two components: a reward predictor and a successor map. The successor map represents the expected future state occupancy from any given state, and the reward predictor maps states to scalar rewards. The value of a state can then be computed as the inner product between its successor map and the reward weights.
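To make the factorization concrete, here is a minimal sketch of the value computation, assuming a d-dimensional state feature. The dimension, variable names, and random data below are illustrative stand-ins, not the repository's Torch/Lua implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64                          # hypothetical state-feature dimension

m_s = rng.standard_normal(d)    # successor map m(s): expected discounted
                                # future feature occupancy from state s
w = rng.standard_normal(d)      # reward weights, so that R(s) ~ phi(s) . w

V_s = m_s @ w                   # value function V(s) = m(s) . w
print(V_s)
```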

DSR has several appealing properties, including increased sensitivity to distal reward changes, owing to the factorization of reward and world dynamics, and the ability to extract bottleneck states (subgoals) from successor maps trained under a random policy.
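For the first property, a toy illustration with the same caveats as above (names and data are made up): when the reward function changes, only the linear reward weights are re-fit, while the successor map, which encodes the world dynamics, is reused unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64
m_s = rng.standard_normal(d)     # successor map, learned once
w_old = rng.standard_normal(d)   # weights for the original reward
w_new = rng.standard_normal(d)   # weights re-fit after a distal reward
                                 # change: a cheap linear update

# Both values reuse the same successor map; no dynamics relearning is
# needed when only the reward changes.
print(m_s @ w_old, m_s @ w_new)
```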

Illustration on Doom (VizDoom)

In this environment, the agent's objective is to gather ammo.

Environment (video walkthrough)

[Video: Doom gameplay on the 3-rooms map]

Policy after learning

[Figure: DSR after convergence]

Other illustrations can be found here:

DSR Illustrations

Instructions

  • To start training, run:
./runner.sh

For subgoal discovery using normalized cuts, first pretrain the agent and save the weights. Then, in run_gpu, set sample_collect to 1 and point netfile to the saved weights file to collect SR samples. Finally, run subgoal/subgoal_discovery.m with the appropriate hyperparameters described in the file.
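The repository's subgoal discovery lives in the MATLAB script above. Purely to illustrate the normalized-cuts idea on SR samples, here is a minimal spectral bisection in Python; the sample matrix, sizes, and thresholds are all hypothetical stand-ins:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for collected SR samples: one successor-representation
# vector per visited state (in the repo these come from running
# run_gpu with sample_collect set to 1).
sr = rng.standard_normal((200, 64))

# Affinity between states from the similarity of their SR vectors.
sim = sr @ sr.T
W = np.exp(sim / sim.std())              # positive, symmetric affinities
np.fill_diagonal(W, 0.0)

# Normalized cut via the second eigenvector (Fiedler vector) of the
# symmetric normalized Laplacian L = I - D^{-1/2} W D^{-1/2}.
deg = W.sum(axis=1)
D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
L = np.eye(len(W)) - D_inv_sqrt @ W @ D_inv_sqrt
_, eigvecs = np.linalg.eigh(L)           # eigenvalues in ascending order
fiedler = eigvecs[:, 1]

# The sign of the Fiedler vector bipartitions the state graph; candidate
# bottleneck states (subgoals) lie near the cut, where it is close to 0.
partition = fiedler > 0
subgoal_candidates = np.argsort(np.abs(fiedler))[:5]
print(partition.sum(), subgoal_candidates)
```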

Acknowledgements

About

License: MIT License


Languages

  • Lua: 89.8%
  • MATLAB: 4.9%
  • Python: 2.9%
  • Shell: 2.4%