Itomigna2 / Muesli-cartpole

Simple Muesli RL algorithm implementation (PyTorch)

Muesli-cartpole

This repository is deprecated. Development continues at https://github.com/Itomigna2/Muesli-lunarlander

Links

Colab demo link : https://colab.research.google.com/drive/19qTIgLvevkc5TA9zNjaS5lILWofGvZPJ?usp=sharing

Muesli paper link : https://arxiv.org/abs/2104.06159

CartPole-v1 env document : https://www.gymlibrary.dev/environments/classic_control/cart_pole/

Implemented

  • MuZero network
  • 5 step unroll
  • L_pg+cmpo
  • L_v
  • L_r
  • L_m (5 step)
  • Stacking 8 observations
  • Mini-batch update
  • Hidden state scaled within [-1,1]
  • Gradient clipping by value [-1,1]
  • Dynamics network gradient scale 1/2
  • Target network (prior parameters) moving average update
  • Categorical representation (value, reward model)
  • Normalized advantage
  • Tensorboard monitoring
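
Several items in the list above are small numeric operations that are easy to show in isolation. The following is a minimal, framework-free sketch, not the notebook's actual code: the function names, the min-max scaling method, the support grid, and the `alpha` value are all assumptions made for illustration. In PyTorch, value clipping of gradients is available as `torch.nn.utils.clip_grad_value_`.

```python
def scale_hidden_state(h):
    """Min-max scale a hidden-state vector into [-1, 1] (assumed scaling method)."""
    lo, hi = min(h), max(h)
    if hi == lo:
        # Degenerate case: all entries equal, so map everything to the midpoint.
        return [0.0 for _ in h]
    return [2.0 * (x - lo) / (hi - lo) - 1.0 for x in h]


def clip_by_value(grads, limit=1.0):
    """Clip each gradient entry independently into [-limit, limit]."""
    return [max(-limit, min(limit, g)) for g in grads]


def ema_update(target, online, alpha=0.1):
    """Moving-average update: target <- (1 - alpha) * target + alpha * online.

    alpha is a hypothetical value; the notebook's setting may differ.
    """
    return [(1.0 - alpha) * t + alpha * o for t, o in zip(target, online)]


def two_hot(x, support):
    """Encode a scalar as a categorical distribution over a sorted support.

    The probability mass is split between the two neighbouring support
    points so that the expectation of the distribution equals x.
    """
    x = max(support[0], min(support[-1], x))  # clamp into the support range
    probs = [0.0] * len(support)
    for i in range(len(support) - 1):
        lo, hi = support[i], support[i + 1]
        if lo <= x <= hi:
            w = (x - lo) / (hi - lo)
            probs[i] += 1.0 - w
            probs[i + 1] += w
            break
    return probs


def decode(probs, support):
    """Recover the scalar as the expectation of the categorical distribution."""
    return sum(p * s for p, s in zip(probs, support))
```

For example, `two_hot(0.25, [0.0, 1.0, 2.0])` places 0.75 of the mass on the first support point and 0.25 on the second, and `decode` maps that distribution back to 0.25. A value or reward head trained this way predicts the categorical distribution with a cross-entropy loss instead of regressing the scalar directly.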

Differences from paper

  • Self-play follows the policy inferred by the main network (the paper follows the target network's policy)
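
The difference comes down to which set of parameters produces the action distribution during data collection. Below is a hypothetical sketch of that choice: `main_policy` and `target_policy` are stand-in callables that map an observation to action logits, and the `use_target` flag is invented for illustration; none of these names come from the notebook.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(logits, rng):
    """Sample an action index from the categorical distribution given by logits."""
    probs = softmax(logits)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def act(observation, main_policy, target_policy, use_target=False, rng=random):
    """Pick an action during self-play.

    use_target=False mirrors this repository (act with the main network);
    use_target=True mirrors the paper (act with the target network).
    """
    policy = target_policy if use_target else main_policy
    return sample_action(policy(observation), rng)
```

With `use_target=False`, the collected trajectories reflect the most recent parameters, whereas the target network lags behind them because it is only a moving average of the main network.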

Memo

The code (.ipynb) runs in Google Colab. Requirements.txt was generated from the Colab CPU compute backend.

Languages

Language:Jupyter Notebook 100.0%