yuanmingqi / rl-exploration-baselines

RLeXplore provides stable baselines of exploration methods in reinforcement learning, such as the intrinsic curiosity module (ICM), random network distillation (RND), and rewarding impact-driven exploration (RIDE).


Reinforcement Learning Exploration Baselines (RLeXplore)

RLeXplore is a set of PyTorch implementations of intrinsic-reward-driven exploration approaches in reinforcement learning, which can be deployed in arbitrary algorithms in a plug-and-play manner. In particular, RLeXplore is designed to be compatible with Stable-Baselines3, providing more stable exploration benchmarks.

Notice

This repo has been merged into a new project: https://github.com/RLE-Foundation/rllte, which provides more refined implementations!

Invoke the intrinsic reward module by:

from rllte.xplore.reward import ICM, RIDE, ...

Module List

| Module | Remark | Repr. | Visual | Reference |
| --- | --- | --- | --- | --- |
| PseudoCounts | Count-based exploration | ✔️ | ✔️ | Never Give Up: Learning Directed Exploration Strategies |
| ICM | Curiosity-driven exploration | ✔️ | ✔️ | Curiosity-Driven Exploration by Self-Supervised Prediction |
| RND | Count-based exploration | | ✔️ | Exploration by Random Network Distillation |
| GIRM | Curiosity-driven exploration | ✔️ | ✔️ | Intrinsic Reward Driven Imitation Learning via Generative Model |
| NGU | Memory-based exploration | ✔️ | ✔️ | Never Give Up: Learning Directed Exploration Strategies |
| RIDE | Procedurally-generated environment | ✔️ | ✔️ | RIDE: Rewarding Impact-Driven Exploration for Procedurally-Generated Environments |
| RE3 | Entropy maximization | | ✔️ | State Entropy Maximization with Random Encoders for Efficient Exploration |
| RISE | Entropy maximization | | ✔️ | Rényi State Entropy Maximization for Exploration Acceleration in Reinforcement Learning |
| REVD | Divergence maximization | | ✔️ | Rewarding Episodic Visitation Discrepancy for Exploration in Reinforcement Learning |
  • 🐌: Developing.
  • Repr.: The method involves representation learning.
  • Visual: The method works well in visual RL.

Example

Because different intrinsic reward methods differ considerably in how their rewards are computed, rllte follows these rules:

  1. The environments are assumed to be vectorized;
  2. The compute_irs function of each intrinsic reward module takes a mandatory argument samples, which is a dict with the following entries (a sketch that fills in all four entries follows this list):
    • obs (n_steps, n_envs, *obs_shape) <class 'torch.Tensor'>
    • actions (n_steps, n_envs, *action_shape) <class 'torch.Tensor'>
    • rewards (n_steps, n_envs) <class 'torch.Tensor'>
    • next_obs (n_steps, n_envs, *obs_shape) <class 'torch.Tensor'>
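
Modules that model transition dynamics, such as ICM, typically consume the full dict. Below is a minimal sketch of assembling it with dummy rollout data; it assumes ICM accepts the same observation_space/action_space constructor arguments as RE3 in the example further down, which is an assumption rather than documented API:

from rllte.xplore.reward import ICM
from rllte.env import make_dmc_env
import torch as th

if __name__ == '__main__':
    num_envs = 7
    num_steps = 128
    # create a vectorized env (rule 1)
    env = make_dmc_env(env_id="cartpole_balance", num_envs=num_envs)
    # create ICM instance (assumed constructor arguments, mirroring the RE3 example below)
    icm = ICM(
        observation_space=env.observation_space,
        action_space=env.action_space
    )
    # dummy rollout data following the shape rules above (rule 2)
    samples = {
        'obs': th.rand(size=(num_steps, num_envs, *env.observation_space.shape)),
        'actions': th.rand(size=(num_steps, num_envs, *env.action_space.shape)) * 2.0 - 1.0,  # actions in [-1, 1]
        'rewards': th.rand(size=(num_steps, num_envs)),
        'next_obs': th.rand(size=(num_steps, num_envs, *env.observation_space.shape)),
    }
    # compute intrinsic rewards for the whole rollout
    intrinsic_rewards = icm.compute_irs(samples=samples)
    print(intrinsic_rewards.shape)  # expected: torch.Size([128, 7])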

RE3, by contrast, computes the intrinsic reward for each state based on the Euclidean distance between the state and its $k$-th nearest neighbor within a mini-batch, so it suffices to provide only the obs data. The following code provides a usage example of RE3:

from rllte.xplore.reward import RE3
from rllte.env import make_dmc_env
import torch as th

if __name__ == '__main__':
    num_envs = 7
    num_steps = 128
    # create env
    env = make_dmc_env(env_id="cartpole_balance", num_envs=num_envs)
    print(env.observation_space, env.action_space)
    # create RE3 instance
    re3 = RE3(
        observation_space=env.observation_space,
        action_space=env.action_space
    )
    # compute intrinsic rewards
    obs = th.rand(size=(num_steps, num_envs, *env.observation_space.shape))
    intrinsic_rewards = re3.compute_irs(samples={'obs': obs})
    
    print(intrinsic_rewards.shape, type(intrinsic_rewards))
    print(intrinsic_rewards)

# Output:
# {'shape': [9, 84, 84]} {'shape': [1], 'type': 'Box', 'range': [-1.0, 1.0]}
# torch.Size([128, 7]) <class 'torch.Tensor'>
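
Since the returned tensor aligns with the rollout, a common plug-and-play pattern is to scale the intrinsic rewards and add them to the extrinsic rewards before the policy update. The snippet below continues the RE3 example above; the coefficient beta is an illustrative choice, not part of the API:

# dummy extrinsic rewards collected by any agent, shaped (n_steps, n_envs)
extrinsic_rewards = th.rand(size=(num_steps, num_envs))

# illustrative intrinsic reward coefficient (tune per task; often decayed over training)
beta = 0.1

# augment the extrinsic rewards with the scaled intrinsic rewards
total_rewards = extrinsic_rewards + beta * intrinsic_rewards

# total_rewards can then be fed to the update step of any RL algorithm,
# e.g. when filling a Stable-Baselines3-style rollout buffer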

About


License: MIT License


Languages

Python 99.7%, Shell 0.3%