Guozheng-Ma / Adaptive-Replay-Ratio

[ICLR 2024] Adaptive Replay Ratio implementation from 'Revisiting Plasticity in Visual RL: Data, Modules and Training Stages'.

Home Page: https://arxiv.org/abs/2310.07418


Adaptive Replay Ratio Implementation on DMC

Guozheng Ma* · Lu Li* · Sen Zhang · Zixuan Liu · Zhen Wang

Yixin Chen · Li Shen · Xueqian Wang · DaCheng Tao

Initially, a low RR is adopted to prevent catastrophic plasticity loss. In later training stages, the RR can be raised to boost reuse frequency, since the plasticity dynamics have become benign by then. This balance sidesteps the drawbacks of a high RR early on while harnessing the enhanced sample efficiency that greater reuse frequency brings later.

Furthermore, the FAU (Fraction of Active Units) of the critic module can be used as an adaptive indicator of the current training stage. Once the critic's FAU has recovered to a satisfactory level, the agent has moved beyond the early training phase that is prone to catastrophic plasticity loss, and the RR value can be increased.
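
As a concrete illustration, the schedule above can be sketched in a few lines of PyTorch. This is a minimal sketch only: the class and function names, the RR values (0.5 and 2.0), and the FAU threshold (0.6) are illustrative assumptions, not the hyperparameters or interfaces used in this repository; critic_features is assumed to be a batch of post-ReLU critic activations.

import torch

def fraction_of_active_units(activations: torch.Tensor) -> float:
    # FAU: share of units with strictly positive (ReLU-active) output,
    # averaged over the batch.
    return (activations > 0).float().mean().item()

class AdaptiveRR:
    # Train with a low replay ratio until the critic's FAU recovers past
    # a threshold, then switch to a higher replay ratio for the rest of
    # training. All default values below are illustrative placeholders.
    def __init__(self, low_rr: float = 0.5, high_rr: float = 2.0,
                 fau_threshold: float = 0.6):
        self.low_rr = low_rr
        self.high_rr = high_rr
        self.fau_threshold = fau_threshold
        self.recovered = False

    def current_rr(self, critic_features: torch.Tensor) -> float:
        # Latch the switch: once the FAU has recovered, keep the high RR.
        if not self.recovered:
            fau = fraction_of_active_units(critic_features)
            self.recovered = fau >= self.fau_threshold
        return self.high_rr if self.recovered else self.low_rr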

Setup

Install MuJoCo if it is not already installed:

  • Obtain a license on the MuJoCo website.
  • Download the MuJoCo 2.0 binaries from the MuJoCo website.
  • Unzip the downloaded archive into ~/.mujoco/mujoco200 and place your license key file mjkey.txt at ~/.mujoco.
  • Set the env variables MUJOCO_PY_MJKEY_PATH and MUJOCO_PY_MUJOCO_PATH to the MuJoCo license key path and the MuJoCo directory path, respectively.
  • Append the MuJoCo bin subdirectory to the env variable LD_LIBRARY_PATH (see the example after this list).
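
For example, assuming the default paths above, the following lines could be added to ~/.bashrc:

export MUJOCO_PY_MJKEY_PATH=~/.mujoco/mjkey.txt
export MUJOCO_PY_MUJOCO_PATH=~/.mujoco/mujoco200
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:~/.mujoco/mujoco200/bin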

Install the following libraries:

sudo apt update
sudo apt install libosmesa6-dev libgl1-mesa-glx libglfw3

Install dependencies:

conda env create -f conda_env.yml
conda activate drqv2

Training the Agent

Train a DrQ-v2 agent with Adaptive Replay Ratio (our method):

bash train_adapt_rr.sh

📝 Citation

If this repository is useful to you, please consider citing our paper:

@article{ma2023revisiting,
  title={Revisiting Plasticity in Visual Reinforcement Learning: Data, Modules and Training Stages},
  author={Ma, Guozheng and Li, Lu and Zhang, Sen and Liu, Zixuan and Wang, Zhen and Chen, Yixin and Shen, Li and Wang, Xueqian and Tao, Dacheng},
  journal={arXiv preprint arXiv:2310.07418},
  year={2023}
}

🙏 Acknowledgements

We would like to thank Denis Yarats for open-sourcing the DrQ-v2 codebase. Our implementation builds on top of their repository.
