Reinforcement Learning Exploration Baselines (RLeXplore)

RLeXplore is a set of PyTorch implementations of intrinsic-reward-driven exploration approaches in reinforcement learning, such as the intrinsic curiosity module (ICM), random network distillation (RND), and rewarding impact-driven exploration (RIDE). The modules can be deployed in arbitrary algorithms in a plug-and-play manner. In particular, RLeXplore is designed to be compatible with Stable-Baselines3, providing more stable exploration benchmarks.

Installation

  • Get the repository with git:
git clone https://github.com/yuanmingqi/rl-exploration-baselines.git
  • Run the following command to install the dependencies:
pip install -r requirements.txt

Usage Example

Because different intrinsic reward methods are computed in very different ways, RLeXplore follows these rules:

  1. In RLeXplore, the environments are assumed to be vectorized;
  2. The compute_irs function of each intrinsic reward module has a mandatory argument rollouts, a dict with the following entries (see the sketch below):

     observations (n_steps, n_envs, *obs_shape) <class 'numpy.ndarray'>
     actions      (n_steps, n_envs, action_shape) <class 'numpy.ndarray'>
     rewards      (n_steps, n_envs, 1) <class 'numpy.ndarray'>
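
For concreteness, here is a minimal sketch (with random placeholder data and hypothetical dimensions) of how per-step transitions from a vectorized environment can be stacked into the expected rollouts layout:

import numpy as np

# Hypothetical rollout dimensions; any vectorized environment works.
n_steps, n_envs = 128, 8
obs_shape, action_shape = (4, 84, 84), 1

# Per-step buffers filled while stepping the vectorized environments
# (random data stands in for real transitions here).
obs_buf = [np.random.randn(n_envs, *obs_shape).astype('float32') for _ in range(n_steps)]
act_buf = [np.random.randn(n_envs, action_shape).astype('float32') for _ in range(n_steps)]
rew_buf = [np.random.randn(n_envs, 1).astype('float32') for _ in range(n_steps)]

# Stack along the time axis to obtain the (n_steps, n_envs, ...) layout.
rollouts = {
    'observations': np.stack(obs_buf),  # (n_steps, n_envs, 4, 84, 84)
    'actions': np.stack(act_buf),       # (n_steps, n_envs, 1)
    'rewards': np.stack(rew_buf),       # (n_steps, n_envs, 1)
}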

Take RE3 for instance: it computes the intrinsic reward for each state based on the Euclidean distance between that state and its $k$-nearest neighbors within a mini-batch. Thus it suffices to provide only the observations to compute the reward. The following code provides a usage example of RE3:

import torch
import numpy as np
from rlexplore.re3 import RE3

if __name__ == '__main__':
    ''' env setup '''
    device = torch.device('cuda:0')
    obs_shape = (4, 84, 84)
    action_shape = 1 # for discrete action space
    n_envs = 16 
    n_steps = 256 
    observations = np.random.randn(
       n_steps, n_envs, *obs_shape).astype('float32') # collected experiences 

    ''' create RE3 instance '''
    re3 = RE3(obs_shape=obs_shape, action_shape=action_shape, device=device,
              latent_dim=128, beta=0.05, kappa=0.00001)

    ''' compute intrinsic rewards '''
    intrinsic_rewards = re3.compute_irs(rollouts={'observations': observations},
        time_steps=25600, k=3, average_entropy=False)

    print(intrinsic_rewards.shape, type(intrinsic_rewards))
    print(intrinsic_rewards)

# Output: (256, 16, 1) <class 'numpy.ndarray'>
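
For intuition about what RE3 computes internally, the following is a simplified, self-contained sketch rather than the module's actual code (the toy encoder and function names are illustrative): observations are embedded by a fixed, randomly initialized encoder, and the intrinsic reward of a state grows with the distance to its $k$-th nearest neighbor in the latent space, roughly $r_i = \log(\lVert y_i - y_i^{kNN} \rVert + 1)$.

import torch

def knn_intrinsic_rewards(latents, k=3):
    # Toy RE3-style reward: log(1 + distance to the k-th nearest neighbor).
    # latents: (batch_size, latent_dim) embeddings from a fixed random encoder.
    dists = torch.cdist(latents, latents, p=2)               # pairwise Euclidean distances
    knn_dists, _ = torch.topk(dists, k + 1, largest=False)   # k + 1 because the self-distance is 0
    return torch.log(knn_dists[:, -1] + 1.0)                 # distance to the k-th neighbor

# A frozen, randomly initialized encoder (never trained), as in RE3.
encoder = torch.nn.Linear(64, 16)
for p in encoder.parameters():
    p.requires_grad_(False)

obs = torch.randn(32, 64)  # a mini-batch of flattened observations
rewards = knn_intrinsic_rewards(encoder(obs), k=3)
print(rewards.shape)  # torch.Size([32])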

Train with Stable-Baselines3 on PyBullet games:

python examples/ppo_re3_bullet.py --action-space cont --env-id AntBulletEnv-v0 --algo ppo --n-envs 10 --exploration re3 --total-time-steps 2000000 --n-steps 128
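
At a high level, this follows the usual plug-and-play pattern: collect rollouts with the vectorized environments, compute intrinsic rewards from them, and optimize the policy on the sum of extrinsic and intrinsic rewards. The sketch below illustrates that pattern with random placeholder data; it is not the actual example script, and the shapes and loop are hypothetical (n_steps and n_envs mirror the flags above, while obs_shape reuses the image shape from the earlier example).

import numpy as np
import torch
from rlexplore.re3 import RE3

# Hypothetical shapes; in a real run these come from the rollout buffer.
n_steps, n_envs, obs_shape = 128, 10, (4, 84, 84)
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

re3 = RE3(obs_shape=obs_shape, action_shape=1, device=device,
          latent_dim=128, beta=0.05, kappa=0.00001)

for update in range(3):  # stand-in for the outer training loop
    # Rollouts collected by the current policy (random placeholder data here).
    observations = np.random.randn(n_steps, n_envs, *obs_shape).astype('float32')
    extrinsic_rewards = np.random.randn(n_steps, n_envs, 1).astype('float32')

    intrinsic_rewards = re3.compute_irs(
        rollouts={'observations': observations},
        time_steps=update * n_steps * n_envs, k=3, average_entropy=False)

    # Plug-and-play: the policy update (e.g. PPO) uses the combined signal.
    total_rewards = extrinsic_rewards + intrinsic_rewards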

Implemented Algorithms

Algorithm | Remark | Year | Paper | Code
ICM | Curiosity-driven exploration | 2017 | Curiosity-Driven Exploration by Self-Supervised Prediction | Link
RND | Count-based exploration | 2019 | Exploration by Random Network Distillation | Link
GIRM | Curiosity-driven exploration | 2020 | Intrinsic Reward Driven Imitation Learning via Generative Model | Link
NGU | Memory-based exploration | 2020 | Never Give Up: Learning Directed Exploration Strategies | Link
RIDE | Procedurally-generated environment | 2020 | RIDE: Rewarding Impact-Driven Exploration for Procedurally-Generated Environments | Link
RE3 | Shannon Entropy Maximization | 2021 | State Entropy Maximization with Random Encoders for Efficient Exploration | Link
RISE | Rényi Entropy Maximization | 2022 | Rényi State Entropy Maximization for Exploration Acceleration in Reinforcement Learning | Link
REVD | Rényi Divergence Maximization | 2022 | Rewarding Episodic Visitation Discrepancy for Exploration in Reinforcement Learning | Link

Changelog

28/12/2022

  • Update RE3, RISE, RND, RIDE.
  • Add a new method entitled REVD.

04/12/2022

  • Update RND and RIDE.

03/12/2022

  • Start restructuring the project to make it compatible with arbitrary tasks;
  • Update RE3 and RISE.

27/09/2022

  • Update RISE;
  • Introduce JAX in RISE. See experimental folder.

26/09/2022

  • Update RE3;
  • Try to introduce JAX to accelerate computation. See experimental folder.

Acknowledgments

Some of the source code in RLeXplore is built on the following repositories:

License

MIT License

