
Intrinsic-Motivations-RL

A non-exhaustive collection of papers on intrinsic motivations and unsupervised RL.

Why are they useful?

  • Extrinsic rewards can be sparse or very difficult to design, making it hard for an agent to learn efficiently about the environment and how to achieve an objective.
  • Some intrinsic motivations enable agents to discover meaningful behavior without external supervision.
  • They can help us understand how learned models and behaviors generalize.

Papers

  1. Variational Information Maximisation for Intrinsically Motivated Reinforcement Learning, 2015
    • maximize the mutual information (MI) between an action sequence and the resulting state, given the current state
  2. Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models, 2015
    • reward novelty as measured by (encoded) state prediction error (see the prediction-error sketch after this list)
  3. Curiosity-driven Exploration by Self-supervised Prediction, 2017
    • reward novelty as measured by (encoded) state prediction error
    • learn what is controllable/relevant through an inverse dynamics model
  4. VIME: Variational Information Maximizing Exploration, 2016
    • reward the information gain (IG) about the dynamics model from observing new transitions, estimated with a Bayesian neural network (BNN)
  5. Surprise-Based Intrinsic Motivation for Deep Reinforcement Learning, 2017
    • reward the surprise of observing the state that is transitioned to
  6. Model-Based Active Exploration, 2019
    • reward the IG (in the dynamics model from observing new transitions), measured by the Jensen-Shannon divergence (JSD) over an ensemble of dynamics models (see the disagreement sketch after this list)
  7. Large-Scale Study of Curiosity-Driven Learning, 2018
    • a detailed empirical study of practical design considerations
  8. Unsupervised Control Through Non-Parametric Discriminative Rewards, 2019
    • learn a goal-conditioned policy with a non-parametric "goal achievement reward"
  9. Diversity is All You Need: Learning Skills without a Reward Function, 2019
    • learn distinguishable and diverse skills by training a discriminator to infer the skill from visited states (see the skill-reward sketch after this list)
  10. Mutual Information State Intrinsic Control, ICLR 2021
    • maximize the MI between the surrounding state and the agent state
  11. NovelD: A Simple yet Effective Exploration Criterion, NeurIPS 2021
    • reward the increase in novelty (measured by Random Network Distillation, RND) to achieve BFS-like exploration (see the NovelD sketch after this list)
  12. SMiRL: Surprise Minimizing Reinforcement Learning in Unstable Environments, ICLR 2021
    • minimize the entropy of visited states by rewarding familiar, low-surprise states (see the SMiRL sketch after this list)
  13. Information is Power: Intrinsic Control via Information Capture, NeurIPS 2021
    • minimize the state visitation entropy in a partially observable setting
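
To make the recurring reward recipes above concrete, the sketches below implement a few of them in PyTorch. These are not the authors' implementations: all network sizes, names, and hyperparameter values are illustrative. First, the prediction-error curiosity bonus of [2] and [3], assuming vector observations and small MLPs (the papers use convolutional encoders, and ICM [3] additionally trains its encoder with an inverse-dynamics loss):

```python
import torch
import torch.nn as nn

# Prediction-error curiosity (papers [2], [3]). Sizes are illustrative;
# ICM [3] trains the encoder jointly with an inverse-dynamics loss,
# which is omitted here.

class CuriosityModule(nn.Module):
    def __init__(self, obs_dim=8, act_dim=2, feat_dim=16):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, feat_dim), nn.ReLU())
        self.forward_model = nn.Sequential(
            nn.Linear(feat_dim + act_dim, 64), nn.ReLU(), nn.Linear(64, feat_dim)
        )

    def intrinsic_reward(self, obs, act, next_obs):
        phi_next = self.encoder(next_obs)                 # encode s'
        phi_pred = self.forward_model(
            torch.cat([self.encoder(obs), act], dim=-1))  # predict phi(s')
        # bonus = prediction error in feature space: poorly modelled
        # (novel) transitions receive a larger reward
        return 0.5 * (phi_pred - phi_next).pow(2).sum(dim=-1)

module = CuriosityModule()
obs, act, next_obs = torch.randn(32, 8), torch.randn(32, 2), torch.randn(32, 8)
r_int = module.intrinsic_reward(obs, act, next_obs)  # shape (32,)
```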
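
Next, the ensemble-disagreement idea behind [6]. The paper scores transitions by the JSD between the predictive distributions of an ensemble of dynamics models; the sketch below assumes deterministic ensemble members and uses the variance of their point predictions as a simple stand-in for that disagreement:

```python
import torch
import torch.nn as nn

# Ensemble disagreement in the spirit of MAX [6]. The variance across
# members' predictions is used here as a cheap proxy for the JSD
# between their predictive distributions.

obs_dim, act_dim, n_models = 8, 2, 5
ensemble = nn.ModuleList(
    nn.Linear(obs_dim + act_dim, obs_dim) for _ in range(n_models)
)

def disagreement_reward(obs, act):
    x = torch.cat([obs, act], dim=-1)
    preds = torch.stack([m(x) for m in ensemble])  # (n_models, B, obs_dim)
    # members disagree most on unfamiliar transitions, so high variance
    # signals high expected information gain about the dynamics
    return preds.var(dim=0).mean(dim=-1)

obs, act = torch.randn(32, obs_dim), torch.randn(32, act_dim)
r_int = disagreement_reward(obs, act)  # shape (32,)
```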
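
For skill discovery as in DIAYN [9], the reward is log q(z | s) - log p(z), with a learned discriminator q and a fixed skill prior p. A minimal sketch assuming discrete skills and a uniform prior (the linear discriminator is illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# DIAYN skill reward [9]: r = log q(z | s) - log p(z).

n_skills, obs_dim = 4, 8
discriminator = nn.Linear(obs_dim, n_skills)        # logits of q(z | s)
log_p_z = torch.log(torch.tensor(1.0 / n_skills))   # uniform skill prior

def skill_reward(states, skills):
    log_q = F.log_softmax(discriminator(states), dim=-1)
    # states from which the active skill is easy to infer get high reward,
    # pushing skills to visit distinguishable parts of the state space
    return log_q.gather(-1, skills.unsqueeze(-1)).squeeze(-1) - log_p_z

states = torch.randn(32, obs_dim)
skills = torch.randint(0, n_skills, (32,))
r_int = skill_reward(states, skills)  # shape (32,)
```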
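
For NovelD [11], novelty is measured by RND prediction error and the bonus is the clipped increase in novelty across a transition; the paper's episodic first-visit gating is omitted here, and the linear RND networks are illustrative:

```python
import torch
import torch.nn as nn

# NovelD [11]: bonus = max(novelty(s') - alpha * novelty(s), 0), where
# novelty is the RND prediction error. The paper additionally gates the
# bonus by a first-visit-in-episode indicator, omitted here.

obs_dim, feat_dim, alpha = 8, 16, 0.5
target = nn.Linear(obs_dim, feat_dim)       # fixed, randomly initialized
predictor = nn.Linear(obs_dim, feat_dim)    # trained to match the target
for p in target.parameters():
    p.requires_grad_(False)

def novelty(obs):
    return (predictor(obs) - target(obs)).pow(2).sum(dim=-1)

def noveld_reward(obs, next_obs):
    # only reward moving to a *more* novel state, which biases the agent
    # toward frontier states and yields BFS-like exploration
    return torch.clamp(novelty(next_obs) - alpha * novelty(obs), min=0.0)

obs, next_obs = torch.randn(32, obs_dim), torch.randn(32, obs_dim)
r_int = noveld_reward(obs, next_obs)  # shape (32,)
```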
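
Finally, a SMiRL-style bonus [12]: reward the log-likelihood of the current state under a density model fit to the states seen so far in the episode. This sketch assumes an independent Gaussian over raw states; the paper also feeds the model's parameters to the policy, which is omitted here:

```python
import torch

# SMiRL-style surprise minimization [12]: reward log p(s) under a
# running density model of the states visited this episode.

class GaussianStateModel:
    def __init__(self):
        self.history = []

    def update(self, state):
        self.history.append(state)

    def reward(self, state):
        h = torch.stack(self.history)
        mu = h.mean(dim=0)
        std = h.std(dim=0).clamp(min=1e-3)  # avoid degenerate scales
        # familiar (low-surprise) states score high, so maximizing this
        # reward implicitly minimizes the entropy of visited states
        return torch.distributions.Normal(mu, std).log_prob(state).sum()

model = GaussianStateModel()
for t in range(5):
    state = torch.randn(8)
    model.update(state)
    if t >= 1:  # need at least two states for a std estimate
        r_int = model.reward(state)
```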

Comparison

  • Exploration: state means wide coverage of the state space; behavior means meaningful action sequences that lead to particular states.
| Method | Motivation | Dynamics Model-Free | Scope | Exploration |
| --- | --- | --- | --- | --- |
| ICM [3] | novelty | no | global | state |
| VIME [4] | information | no | global | state |
| DIAYN [9] | skill | yes | global | behavior |
| MUSIC [10] | control | yes | global | behavior |
| NovelD [11] | novelty difference | yes | both | state |
| SMiRL [12] | certainty in state | yes | episodic | behavior |
| IC2 [13] | certainty in state visitation | no | episodic | behavior |
