Open-source codebase for EfficientZero, from "Mastering Atari Games with Limited Data" at NeurIPS 2021.
EfficientZero requires python3 (>=3.6) and pytorch (>=1.8.0) with the development headers.
We recommend using torch amp (`--amp_type torch_amp`) to accelerate training.
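For reference, mixed-precision training with torch amp typically follows the pattern below. This is a generic PyTorch sketch for orientation only, not the training loop used in this repo:

```python
import torch

# Generic torch amp training step (illustration only; not this repo's code).
model = torch.nn.Linear(512, 10).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler()    # scales the loss to avoid fp16 underflow

def train_step(batch, targets):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():     # forward pass runs in mixed precision
        loss = torch.nn.functional.cross_entropy(model(batch), targets)
    scaler.scale(loss).backward()       # backward on the scaled loss
    scaler.step(optimizer)              # unscales gradients, then steps the optimizer
    scaler.update()                     # adjusts the loss scale for the next step
    return loss.item()
```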
Before starting training, you need to build the C++/Cython-style external packages (GCC version 7.5+ is required):
cd core/ctree
bash make.sh
The distributed framework of this codebase is built on ray.
As for other packages required for this codebase, please run `pip install -r requirements.txt`.
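For orientation, ray's actor model, on which the distributed workers in this codebase are built, looks roughly like the sketch below. The worker class here is illustrative and is not one of the repo's actual workers:

```python
import ray

ray.init()  # start the local ray runtime

# Illustrative actor; the repo's real data/batch/reanalyze workers live under core/.
@ray.remote
class RolloutWorker:
    def __init__(self, worker_id):
        self.worker_id = worker_id

    def collect(self, num_steps):
        # placeholder for self-play data collection
        return {"worker": self.worker_id, "steps": num_steps}

workers = [RolloutWorker.remote(i) for i in range(4)]         # spawn 4 actors
results = ray.get([w.collect.remote(100) for w in workers])   # run them in parallel
ray.shutdown()
```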
- Train:
python main.py --env BreakoutNoFrameskip-v4 --case atari --opr train --amp_type torch_amp --num_gpus 1 --num_cpus 10 --cpu_actor 1 --gpu_actor 1 --force
- Test:
python main.py --env BreakoutNoFrameskip-v4 --case atari --opr test --amp_type torch_amp --num_gpus 1 --load_model --model_path model.p
We provide `train.sh` and `test.sh` for training and evaluation.
- Train:
  - With 4 GPUs (3090): `bash train.sh`
- Test:
  - `bash test.sh`
| Required Arguments | Description |
|---|---|
| `--env` | Name of the environment |
| `--case {atari}` | used for switching between different domains (default: atari) |
| `--opr {train,test}` | select the operation to be performed |
| `--amp_type {torch_amp,none}` | use torch amp for acceleration |
| Other Arguments | Description |
|---|---|
| `--force` | will rewrite the result directory |
| `--num_gpus 4` | how many GPUs are available |
| `--num_cpus 96` | how many CPUs are available |
| `--cpu_actor 14` | how many CPU workers |
| `--gpu_actor 20` | how many GPU workers |
| `--seed 0` | the seed |
| `--use_priority` | use priority in replay buffer sampling |
| `--use_max_priority` | use the max priority for the newly collected data |
| `--amp_type 'torch_amp'` | use torch amp for acceleration |
| `--info 'EZ-V0'` | some tags for your experiments |
| `--p_mcts_num 8` | set the parallel number of envs in self-play |
| `--revisit_policy_search_rate 0.99` | set the rate of reanalyzed policies |
| `--use_root_value` | use root values in value targets (requires more GPU actors) |
| `--render` | render in evaluation |
| `--save_video` | save videos for evaluation |
The architecture of the training pipeline is shown as follows:
- To use a smaller model, you can choose a smaller dimension for the projection layers (e.g., 256/64) and the LSTM hidden layer (e.g., 64) in the config.
- For GPUs with 10G memory instead of 20G memory, you can allocate 0.25 GPU for each GPU worker (`@ray.remote(num_gpus=0.25)`) in `core/reanalyze_worker.py`, as in the sketch below.
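A minimal sketch of such a fractional allocation follows; the class body is a placeholder, not the actual worker defined in `core/reanalyze_worker.py`:

```python
import ray

# Requesting a quarter of a GPU per actor lets up to four such workers share one card.
# Placeholder class; the real worker lives in core/reanalyze_worker.py.
@ray.remote(num_gpus=0.25)
class GpuWorker:
    def ready(self):
        import torch
        return torch.cuda.is_available()
```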
If you want to apply EfficientZero to a new environment like mujoco, here are the steps for registration (a minimal config sketch follows the list):

- Follow the directory `config/atari` and create a directory for the env at `config/mujoco`.
- Implement your `MujocoConfig(BaseConfig)` class and implement the models as well as your environment wrapper.
- Register the case in `main.py`.
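A minimal skeleton of such a config is sketched below. The method names, fields, and import path here are assumptions for illustration, so check `config/atari` and the `BaseConfig` definition for the interface actually required:

```python
# config/mujoco/__init__.py -- illustrative skeleton only; the real BaseConfig
# interface (see core/config.py and config/atari) may require different methods.
from core.config import BaseConfig


class MujocoConfig(BaseConfig):
    def __init__(self):
        super().__init__()
        self.env_name = 'HalfCheetah-v2'  # hypothetical default environment

    def set_game(self, env_name):
        # record the environment name and any derived observation/action shapes
        self.env_name = env_name

    def new_game(self, seed=None):
        # return an instance of your environment wrapper (to be implemented)
        raise NotImplementedError

    def get_uniform_network(self):
        # return the model (representation/dynamics/prediction networks)
        raise NotImplementedError


game_config = MujocoConfig()

# In main.py, register the new case next to the existing 'atari' branch,
# e.g. (exact dispatch code may differ):
#   elif args.case == 'mujoco':
#       from config.mujoco import game_config
```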
Evaluation with 32 seeds for each of 3 different runs (each run uses a different seed).
If you find this repo useful, please cite our paper:
@inproceedings{ye2021mastering,
title={Mastering Atari Games with Limited Data},
  author={Weirui Ye and Shaohuai Liu and Thanard Kurutach and Pieter Abbeel and Yang Gao},
booktitle={NeurIPS},
year={2021}
}
If you have any questions or want to use the code, please contact ywr20@mails.tsinghua.edu.cn.
We appreciate the following GitHub repo for its valuable codebase implementation:
https://github.com/koulanurag/muzero-pytorch