dreamerv3 reinforcement-learning crafter world-models

Flax Implementation of DreamerV3 on Crafter

TODO

Implement Curiosity Replay (https://arxiv.org/abs/2306.15934)

Modifications to the Original Implementation

Layer normalization: We use the default epsilon value for layer normalization from Flax.
GRU: We adopt the default GRU implementation from Flax.
Adam: We use the default epsilon value for Adam from Flax.
Policy optimizer: We employ a single optimizer for the policy.
DynamicScale: We use the default DynamicScale for FP16 training from Optax.

Installation

pip install --upgrade setuptools==65.5.0 wheel==0.38.4
pip install --upgrade "jax[cuda12_pip]" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
pip install -r requirements.txt
pip install -e .

Training

python train.py --exp_name [exp_name] --seed [seed]

Result

score (10 seeds): 17.65 ± 2.29

About

Flax Implementation of DreamerV3 on Crafter

dreamerv3 reinforcement-learning crafter world-models

Languages

Language:Python 100.0%