Studying Reinforcement Learning Guide


Study Reinforcement Learning & Deep RL Guide:

  • A simple guide and collection of resources for studying RL/Deep RL in one to 2.5 months.

Talks to check out first:


  • Introduction to Reinforcement Learning by Joelle Pineau, McGill University:

    • Applications of RL.
    • When to use RL?
    • RL vs supervised learning
    • What is an MDP (Markov Decision Process)?
    • Components of an RL agent:
      • States
      • Actions (with probabilistic effects)
      • Reward function
      • Initial state distribution
    • Explanation of the Markov property.
    • Maximizing utility in:
      • Episodic tasks
      • Continuing tasks
        • The discount factor γ (gamma)
    • What is a policy, and what do we do with it?
      • A policy defines the action-selection strategy at every state.
    • Value functions:
      • The equations for the value of a policy are (two forms of) Bellman’s equation.
      • (Computing these values is a dynamic programming algorithm.)
      • Iterative Policy Evaluation:
        • Main idea: turn the Bellman equations into update rules (a runnable sketch follows this talks list).
    • Optimal policies and optimal value functions.
      • Finding a good policy: Policy Iteration (see the talk below by Pieter Abbeel)
      • Finding a good policy: Value Iteration
        • Asynchronous value iteration: instead of updating all states on every iteration, focus on important states.
    • Key challenges in RL:
      • Designing the problem domain:
        • State representation
        • Action choice
        • Cost/reward signal
      • Acquiring data for training:
        • Exploration vs. exploitation
        • High-cost actions
        • Time-delayed cost/reward signal
      • Function approximation
      • Validation / confidence measures
    • The RL lingo.
    • In large state spaces, we need function approximation:
      • Fitted Q-iteration:
        • Use supervised learning to estimate the Q-function from a batch of training data (a sketch also follows this talks list).
        • Define the input, output, and loss.
          • e.g., the Arcade Learning Environment
    • Deep Q-network (DQN) and tips.
  • Deep Reinforcement Learning by Pieter Abbeel:

    • Why Policy Optimization?
    • Cross Entropy Method (CEM) / Finite Differences / Fixing Random Seed
    • Likelihood Ratio (LR) Policy Gradient (a REINFORCE sketch follows this list)
    • Natural Gradient / Trust Regions (-> TRPO)
    • Actor-Critic (-> GAE, A3C)
    • Path Derivatives (PD) (-> DPG, DDPG, SVG)
    • Stochastic Computation Graphs (generalizes LR / PD)
    • Guided Policy Search (GPS)
    • Inverse Reinforcement Learning
      • Inverse RL vs. behavioral cloning
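
A few of the ideas in these outlines are compact enough to sketch in code. First, the "turn Bellman equations into update rules" idea behind iterative policy evaluation and value iteration. Below is a minimal sketch of value iteration on a toy 1-D corridor MDP; the states, dynamics, rewards, and discount factor are all invented for illustration, and only the Bellman update itself is the point:

```python
import numpy as np

# Hypothetical toy MDP, invented for illustration: a 1-D corridor of 5
# states; moving left/right is deterministic, and reaching state 4 pays
# reward +1 (state 4 is absorbing).
n_states, n_actions = 5, 2          # actions: 0 = left, 1 = right
gamma = 0.9                         # discount factor

def step(s, a):
    """Deterministic toy dynamics: return (next_state, reward)."""
    if s == 4:                      # absorbing terminal state
        return s, 0.0
    s_next = max(s - 1, 0) if a == 0 else min(s + 1, 4)
    return s_next, (1.0 if s_next == 4 else 0.0)

# Value iteration: turn the Bellman optimality equation
#     V(s) = max_a [ r(s, a) + gamma * V(s') ]
# into an update rule and sweep it over all states until convergence.
V = np.zeros(n_states)
for sweep in range(1000):
    V_new = np.array([
        max(r + gamma * V[s2] for s2, r in (step(s, a) for a in range(n_actions)))
        for s in range(n_states)
    ])
    if np.max(np.abs(V_new - V)) < 1e-8:
        break
    V = V_new

print(np.round(V, 3))  # values grow toward the rewarding terminal state
```

Asynchronous value iteration, mentioned above, would instead update selected "important" states in place rather than sweeping all of them on every iteration.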
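
Fitted Q-iteration, from the same talk, turns each Bellman backup into a supervised regression problem over a fixed batch of transitions. A minimal sketch, assuming a synthetic random batch of (s, a, r, s', done) tuples and scikit-learn's ExtraTreesRegressor (extremely randomized trees are the classic regressor for fitted Q-iteration, but any regressor works):

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

gamma = 0.99
n_actions = 2

# Hypothetical pre-collected batch of transitions (s, a, r, s', done);
# random data here just so the sketch runs end to end.
rng = np.random.default_rng(0)
N, state_dim = 500, 4
S = rng.normal(size=(N, state_dim))
A = rng.integers(0, n_actions, size=N)
R = rng.normal(size=N)
S2 = rng.normal(size=(N, state_dim))
done = rng.random(N) < 0.05

X = np.column_stack([S, A])                          # input: (state, action)
q = ExtraTreesRegressor(n_estimators=50).fit(X, R)   # Q_1 regresses immediate reward

for _ in range(20):                  # each iteration is one supervised problem
    # Bellman target: y = r + gamma * max_a' Q_k(s', a')   (0 at terminals)
    q_next = np.column_stack([
        q.predict(np.column_stack([S2, np.full(N, a)])) for a in range(n_actions)
    ])
    y = R + gamma * (1.0 - done) * q_next.max(axis=1)
    q = ExtraTreesRegressor(n_estimators=50).fit(X, y)   # fit Q_{k+1}
```

The greedy policy is then the argmax of Q(s, a) over actions. DQN keeps this structure but swaps the tree regressor for a neural network and the fixed batch for a replay buffer.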
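
Finally, the likelihood-ratio (score-function) policy gradient from the Deep RL outline, shown on the simplest possible case: a hypothetical two-armed bandit with made-up payoffs. The estimator ∇_θ E[R] = E[R ∇_θ log π_θ(a)] needs no knowledge of how the reward depends on the action; with full trajectories, R becomes the return and the log-probabilities are summed along the trajectory:

```python
import numpy as np

# Two-armed bandit with invented payoffs: arm 0 pays ~1.0, arm 1 pays ~0.2.
rng = np.random.default_rng(0)
true_means = np.array([1.0, 0.2])
theta = np.zeros(2)                  # logits of a softmax policy
alpha = 0.05                         # learning rate (arbitrary)

for t in range(2000):
    pi = np.exp(theta - theta.max())
    pi /= pi.sum()                   # softmax policy pi(a)
    a = rng.choice(2, p=pi)          # sample an action from the policy
    r = rng.normal(true_means[a], 0.1)   # observe a noisy reward
    grad_log_pi = -pi.copy()         # grad log pi(a) for softmax: one_hot(a) - pi
    grad_log_pi[a] += 1.0
    theta += alpha * r * grad_log_pi     # REINFORCE update

print(np.round(pi, 3))  # probability mass should concentrate on arm 0
```

Subtracting a baseline (e.g., a running average of rewards) from R reduces the variance of this estimator without biasing it, which is the first step toward the actor-critic methods listed above.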

Books:


Courses:


  • Reinforcement Learning by David Silver.

    • Lecture 1: Introduction to Reinforcement Learning
    • Lecture 2: Markov Decision Processes
    • Lecture 3: Planning by Dynamic Programming
    • Lecture 4: Model-Free Prediction
    • Lecture 5: Model-Free Control
    • Lecture 6: Value Function Approximation
    • Lecture 7: Policy Gradient Methods
    • Lecture 8: Integrating Learning and Planning
    • Lecture 9: Exploration and Exploitation
    • Lecture 10: Case Study: RL in Classic Games
  • CS294 Deep Reinforcement Learning by John Schulman and Pieter Abbeel.

    • Lecture 1: Intro, derivative-free optimization
    • Lecture 2: Score function gradient estimation and policy gradients
    • Lecture 3: Actor-critic methods
    • Lecture 4: Trust region and natural gradient methods, open problems
