Reinforcement Learning Exploration Baselines (RLeXplore)

RLeXplore is a set of PyTorch implementations of intrinsic-reward-driven exploration approaches in reinforcement learning, such as the intrinsic curiosity module (ICM), random network distillation (RND), and rewarding impact-driven exploration (RIDE). The modules can be deployed in arbitrary algorithms in a plug-and-play manner. In particular, RLeXplore is designed to be compatible with Stable-Baselines3, providing more stable exploration benchmarks.

Installation

  • Get the repository with git:
git clone https://github.com/yuanmingqi/rl-exploration-baselines.git
  • Run the following command to install the dependencies:
pip install -r requirements.txt

Usage Example

Because different intrinsic reward methods are computed in very different ways, RLeXplore follows these rules:

  1. In RLeXplore, the environments are assumed to be vectorized;
  2. The compute_irs function of each intrinsic reward module has a mandatory argument rollouts, a dict with the following entries (see the sketch below):

     observations (n_steps, n_envs, *obs_shape) <class 'numpy.ndarray'>
     actions      (n_steps, n_envs, action_shape) <class 'numpy.ndarray'>
     rewards      (n_steps, n_envs, 1) <class 'numpy.ndarray'>
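
For concreteness, here is a minimal sketch (with random placeholder data and hypothetical dimensions) of how per-step transitions from a vectorized environment can be stacked into the expected rollouts layout:

import numpy as np

# Hypothetical rollout dimensions; any vectorized environment works.
n_steps, n_envs = 128, 8
obs_shape, action_shape = (4, 84, 84), 1

# Per-step buffers filled while stepping the vectorized environments
# (random data stands in for real transitions here).
obs_buf = [np.random.randn(n_envs, *obs_shape).astype('float32') for _ in range(n_steps)]
act_buf = [np.random.randn(n_envs, action_shape).astype('float32') for _ in range(n_steps)]
rew_buf = [np.random.randn(n_envs, 1).astype('float32') for _ in range(n_steps)]

# Stack along the time axis to obtain the (n_steps, n_envs, ...) layout.
rollouts = {
    'observations': np.stack(obs_buf),  # (n_steps, n_envs, 4, 84, 84)
    'actions': np.stack(act_buf),       # (n_steps, n_envs, 1)
    'rewards': np.stack(rew_buf),       # (n_steps, n_envs, 1)
}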

Take RE3 for instance: it computes the intrinsic reward for each state based on the Euclidean distance between that state and its $k$-nearest neighbors within a mini-batch. Thus it suffices to provide only the observations to compute the reward. The following code provides a usage example of RE3:

import torch
import numpy as np
from rlexplore.re3 import RE3

if __name__ == '__main__':
    ''' env setup '''
    device = torch.device('cuda:0')
    obs_shape = (4, 84, 84)
    action_shape = 1 # for discrete action space
    n_envs = 16 
    n_steps = 256 
    observations = np.random.randn(
       n_steps, n_envs, *obs_shape).astype('float32') # collected experiences 

    ''' create RE3 instance '''
    re3 = RE3(obs_shape=obs_shape, action_shape=action_shape, device=device,
              latent_dim=128, beta=0.05, kappa=0.00001)

    ''' compute intrinsic rewards '''
    intrinsic_rewards = re3.compute_irs(rollouts={'observations': observations},
        time_steps=25600, k=3, average_entropy=False)

    print(intrinsic_rewards.shape, type(intrinsic_rewards))
    print(intrinsic_rewards)

# Output: (256, 16, 1) <class 'numpy.ndarray'>
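
For intuition about what RE3 computes internally, the following is a simplified, self-contained sketch rather than the module's actual code (the toy encoder and function names are illustrative): observations are embedded by a fixed, randomly initialized encoder, and the intrinsic reward of a state grows with the distance to its $k$-th nearest neighbor in the latent space, roughly $r_i = \log(\lVert y_i - y_i^{kNN} \rVert + 1)$.

import torch

def knn_intrinsic_rewards(latents, k=3):
    # Toy RE3-style reward: log(1 + distance to the k-th nearest neighbor).
    # latents: (batch_size, latent_dim) embeddings from a fixed random encoder.
    dists = torch.cdist(latents, latents, p=2)               # pairwise Euclidean distances
    knn_dists, _ = torch.topk(dists, k + 1, largest=False)   # k + 1 because the self-distance is 0
    return torch.log(knn_dists[:, -1] + 1.0)                 # distance to the k-th neighbor

# A frozen, randomly initialized encoder (never trained), as in RE3.
encoder = torch.nn.Linear(64, 16)
for p in encoder.parameters():
    p.requires_grad_(False)

obs = torch.randn(32, 64)  # a mini-batch of flattened observations
rewards = knn_intrinsic_rewards(encoder(obs), k=3)
print(rewards.shape)  # torch.Size([32])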

Train with Stable-Baselines3 on PyBullet games:

python examples/ppo_re3_bullet.py --action-space cont --env-id AntBulletEnv-v0 --algo ppo --n-envs 10 --exploration re3 --total-time-steps 2000000 --n-steps 128
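
At a high level, this follows the usual plug-and-play pattern: collect rollouts with the vectorized environments, compute intrinsic rewards from them, and optimize the policy on the sum of extrinsic and intrinsic rewards. The sketch below illustrates that pattern with random placeholder data; it is not the actual example script, and the shapes and loop are hypothetical (n_steps and n_envs mirror the flags above, while obs_shape reuses the image shape from the earlier example).

import numpy as np
import torch
from rlexplore.re3 import RE3

# Hypothetical shapes; in a real run these come from the rollout buffer.
n_steps, n_envs, obs_shape = 128, 10, (4, 84, 84)
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

re3 = RE3(obs_shape=obs_shape, action_shape=1, device=device,
          latent_dim=128, beta=0.05, kappa=0.00001)

for update in range(3):  # stand-in for the outer training loop
    # Rollouts collected by the current policy (random placeholder data here).
    observations = np.random.randn(n_steps, n_envs, *obs_shape).astype('float32')
    extrinsic_rewards = np.random.randn(n_steps, n_envs, 1).astype('float32')

    intrinsic_rewards = re3.compute_irs(
        rollouts={'observations': observations},
        time_steps=update * n_steps * n_envs, k=3, average_entropy=False)

    # Plug-and-play: the policy update (e.g. PPO) uses the combined signal.
    total_rewards = extrinsic_rewards + intrinsic_rewards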

Implemented Algorithms

Algorithm | Remark | Year | Paper | Code
ICM | Curiosity-driven exploration | 2017 | Curiosity-Driven Exploration by Self-Supervised Prediction | Link
RND | Count-based exploration | 2019 | Exploration by Random Network Distillation | Link
GIRM | Curiosity-driven exploration | 2020 | Intrinsic Reward Driven Imitation Learning via Generative Model | Link
NGU | Memory-based exploration | 2020 | Never Give Up: Learning Directed Exploration Strategies | Link
RIDE | Procedurally-generated environment | 2020 | RIDE: Rewarding Impact-Driven Exploration for Procedurally-Generated Environments | Link
RE3 | Shannon Entropy Maximization | 2021 | State Entropy Maximization with Random Encoders for Efficient Exploration | Link
RISE | Rényi Entropy Maximization | 2022 | Rényi State Entropy Maximization for Exploration Acceleration in Reinforcement Learning | Link
REVD | Rényi Divergence Maximization | 2022 | Rewarding Episodic Visitation Discrepancy for Exploration in Reinforcement Learning | Link

Changelog

28/12/2022

  • Update RE3, RISE, RND, RIDE.
  • Add a new method entitled REVD.

04/12/2022

  • Update RND and RIDE.

03/12/2022

  • Start restructuring the project to make it compatible with arbitrary tasks;
  • Update RE3 and RISE.

27/09/2022

  • Update RISE;
  • Introduce JAX in RISE. See experimental folder.

26/09/2022

  • Update RE3;
  • Try to introduce JAX to accelerate computation. See experimental folder.

Acknowledgments

Some of the source code in RLeXplore is built on the following repositories:

License

MIT License

