Itomigna2 / Muesli-cartpole

Simple Muesli RL algorithm implementation (PyTorch)

cartpole-v1 colab deep-learning model-based-rl muesli muzero reinforcement-learning

Muesli-cartpole

This repository is deprecated. Development continues at https://github.com/Itomigna2/Muesli-lunarlander

Links

Colab demo: https://colab.research.google.com/drive/19qTIgLvevkc5TA9zNjaS5lILWofGvZPJ?usp=sharing

Muesli paper: https://arxiv.org/abs/2104.06159

CartPole-v1 environment documentation: https://www.gymlibrary.dev/environments/classic_control/cart_pole/

Implemented

MuZero network
5-step unroll
L_pg+cmpo (policy-gradient loss with CMPO regularization)
L_v (value loss)
L_r (reward loss)
L_m (model loss, 5-step)
Stacking 8 observations
Mini-batch update
Hidden state scaled within [-1,1]
Gradient clipping by value [-1,1]
Dynamics network gradient scale 1/2
Target network (prior parameters) moving-average update
Categorical representation (value, reward model)
Normalized advantage
Tensorboard monitoring
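
Several of the items above are small, self-contained tricks. The following is a minimal PyTorch sketch of four of them (hidden-state scaling, dynamics gradient scaling, gradient clipping by value, and the target-network moving average), not the notebook's actual code. The helper names `scale_hidden`, `scale_gradient`, and `ema_update`, the min-max form of the scaling, the toy `nn.Linear` networks, and the value `alpha=0.1` are all assumptions made for illustration.

```python
import torch
import torch.nn as nn


def scale_hidden(h: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Min-max scale each hidden-state row into [-1, 1] (assumed form)."""
    h_min = h.min(dim=-1, keepdim=True).values
    h_max = h.max(dim=-1, keepdim=True).values
    return 2.0 * (h - h_min) / (h_max - h_min + eps) - 1.0


def scale_gradient(x: torch.Tensor, scale: float) -> torch.Tensor:
    """Leave the forward value unchanged; multiply the gradient by `scale`."""
    return x * scale + x.detach() * (1.0 - scale)


@torch.no_grad()
def ema_update(target: nn.Module, online: nn.Module, alpha: float) -> None:
    """target <- (1 - alpha) * target + alpha * online, parameter by parameter."""
    for p_t, p_o in zip(target.parameters(), online.parameters()):
        p_t.mul_(1.0 - alpha).add_(p_o, alpha=alpha)


# Toy stand-ins for the main and target networks (hypothetical sizes).
torch.manual_seed(0)
online = nn.Linear(4, 4)
target = nn.Linear(4, 4)
target.load_state_dict(online.state_dict())
optimizer = torch.optim.SGD(online.parameters(), lr=0.1)

hidden = scale_hidden(online(torch.randn(2, 4)))  # hidden state in [-1, 1]
hidden = scale_gradient(hidden, 0.5)              # gradient scale 1/2
loss = (hidden ** 2).sum() * 100.0                # placeholder loss

optimizer.zero_grad()
loss.backward()
# Clip every gradient element into [-1, 1] before the optimizer step.
torch.nn.utils.clip_grad_value_(online.parameters(), clip_value=1.0)
max_grad = max(p.grad.abs().max().item() for p in online.parameters())
optimizer.step()

# Move the target (prior) parameters a small step toward the main network.
ema_update(target, online, alpha=0.1)
```

`scale_gradient` relies on `detach()`: the detached term contributes to the forward value but not to the backward pass, so only the `scale` fraction of the gradient flows back.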

Differences from paper

Self-play follows the policy inferred by the main network (in the paper, it follows the target network).
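
As a rough illustration of this difference, the sketch below samples an action from the main network's policy and shows, commented out, where the target network would be used instead. `PolicyNet`, `select_action`, and the layer sizes are hypothetical. The input size of 32 assumes 8 stacked CartPole observations of 4 values each.

```python
import copy

import torch
import torch.nn as nn


class PolicyNet(nn.Module):
    """Hypothetical policy head: stacked observations in, action logits out."""

    def __init__(self, obs_dim: int = 32, n_actions: int = 2) -> None:
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions)
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)


@torch.no_grad()
def select_action(policy: nn.Module, obs: torch.Tensor) -> int:
    """Sample one action from the categorical policy given a single observation."""
    logits = policy(obs.unsqueeze(0))
    return int(torch.distributions.Categorical(logits=logits).sample().item())


torch.manual_seed(0)
main_net = PolicyNet()
target_net = copy.deepcopy(main_net)  # updated by moving average during training

obs = torch.zeros(32)  # placeholder for 8 stacked CartPole observations
action = select_action(main_net, obs)      # this repository: act with the main network
# action = select_action(target_net, obs)  # alternative: act with the target network
```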

Memo

The code (.ipynb) runs in Google Colab. requirements.txt was taken from the Colab CPU compute backend.

Languages

Jupyter Notebook 100.0%
