[Paper] [Installation] [Usage] [Mortar Mayhem] [Mystery Path] [Searing Spotlights] [Training]

Memory Gym: Partially Observable Challenges for Memory-Based Agents

Memory Gym features the environments Mortar Mayhem, Mystery Path, and Searing Spotlights that are inspired by some mini games of Pummel Party. These environments shall benchmark an agent's memory to

memorize events across long sequences,
generalize,
and be robust to noise.

Citation

@inproceedings{pleines2023memory,
title={Memory Gym: Partially Observable Challenges to Memory-Based Agents},
author={Marco Pleines and Matthias Pallasch and Frank Zimmer and Mike Preuss},
booktitle={International Conference on Learning Representations},
year={2023},
url={https://openreview.net/forum?id=jHc8dCx6DDr}
}

Installation

Major dependencies:

gymnasium==0.26.3
PyGame

conda create -n memory-gym python=3.7 --yes
conda activate memory-gym
pip install memory-gym

conda create -n memory-gym python=3.7 --yes
conda activate memory-gym
git clone https://github.com/MarcoMeter/drl-memory-gym.git
cd drl-memory-gym
pip install -e .

Usage

Executing the environment using random actions:

import memory_gym
import gymnasium as gym

env = gym.make("SearingSpotlights-v0")
# env = gym.make("MortarMayhem-v0")
# env = gym.make("MortarMayhem-Grid-v0")
# env = gym.make("MortarMayhemB-v0")
# env = gym.make("MortarMayhemB-Grid-v0")
# env = gym.make("MysteryPath-v0")
# env = gym.make("MysteryPath-Grid-v0")

# Pass reset parameters to the environment
options = {"agent_scale": 0.25}

obs, info = env.reset(seed=1, options=options)
done = False
while not done:
    obs, reward, done, truncation, info = env.step(env.action_space.sample())

print(info)

Manually play the environments using the console scripts (works only using an anaconda environment):

mortar_mayhem
# MMAct
mortar_mayhem_b
# MMGrid
mortar_mayhem_grid
# MMAct Grid
mortar_mayhem_b_grid
mystery_path
mystery_path_grid
searing_spotlights

You can also execute the python scripts directly, for example:

python ./memory_gym/mortar_mayhem.py

Controls:

WASD or Arrow Keys to move or rotate
Page Up / Page Down to increment / decrement environment seeds

Mortar Mayhem

Mortar Mayhem challenges the agent with a sequence of commands that the agent has to memorize and execute in the right order. During the beginning of the episode, each command is visualized one by one. Mortar Mayhem can be reduced to solely executing commands. In this case, the command sequence is always available as vector observation (one-hot encoded) and, therefore, is not visualized.

The max length of an episode can be calculated as follows:

max episode length = (command_show_duration + command_show_delay) * command_count + (explosion_delay + explosion_duration) * command_count - 2

Reset Parameters

Parameter	Default	Description
agent_scale	0.25	The dimensions of the agent.
agent_speed	2.5	The speed of the agent.
arena_size	5	The grid dimension of the arena (min: 2, max: 6)
allowed_commands	9	Available commands: right, down, left, up, stay, right down, right up, left down, left up. If set to five, the first five commands are available.
command_count	[5]	The number of commands that are asked to be executed by the agent. This is a list that the environment samples from.
command_show_duration	[3]	The number of steps that one command is shown. This is a list that the environment samples from.
command_show_delay	[1]	The number of steps between showing one command. This is a list that the environment samples from.
explosion_duration	[6]	The number of steps that an agent has to stay on the commanded tile. This is a list that the environment samples form.
explosion_delay	[18]	The entire duration in steps that the agent has to execute the current command. This is a list that the environments samples from.
visual_feedback	True	Whether to turn off the visualization of the feedback. Upon command evaluation, the wrong tiles are rendered red.
reward_command_failure	0.0	What reward to signal upon failing at the current command.
reward_command_success	0.1	What reward to signal upon succeeding at the current command.
reward_episode_success	0.0	What reward to signal if the entire command sequence is successfully solved by the agent.

Mystery Path

Mystery Path procedurally generates an invisible path for the agent to cross from the origin to the goal. Per default, only the origin of the path is visible. Upon falling off the path, the agent has to restart from the origin. Note that the episode is not terminated by falling off. Hence, the agent has to memorize where it fell off and where it did not.

Reset Parameters

Parameter	Default	Explanation
max_steps	512	The maximum number of steps for the agent to play one episode.
agent_scale	0.25	The dimensions of the agent.
agent_speed	2.5	The speed of the agent.
cardinal_origin_choice	[0, 1, 2, 3]	Allowed cardinal directions for the path generation to place the origin. This is a list that the environment samples from.
show_origin	True	Whether to hide or show the origin tile of the generated path.
show_goal	False	Whether to hide or show the goal tile of the generated path.
visual_feedback	True	Whether to visualize that the agent is off the path. A red cross is rendered on top of the agent.
reward_goal	1.0	What reward to signal when reaching the goal tile.
reward_fall_off	0.0	What reward to signal when falling off.
reward_path_progress	0.0	What reward to signal when making progress on the path. This is only signaled for reaching another tile for the first time.
reward_step	0.0	What reward to signal for each step.

Searing Spotlights

Searing Spotlights is a pitch black surrounding to the agent. The environment is initially fully observable but the light is dimmed untill off during the first few frames. Only randomly moving spotlights unveil information on the environment's ground truth, while posing a threat to the agent. If spotted by spotlight, the agent looses health points. While the agent must avoid closing in spotlights, it further has to collect coins. After collecting all coins, the agent has to take the environment's exit.

Reset Parameters

Parameter	Default	Explanation
max_steps	512	The maximum number of steps for the agent to play one episode.
agent_scale	0.25	The dimensions of the agent.
agent_speed	2.5	The speed of the agent.
agent_health	100	The initial health points of the agent.
agent_visible	False	Whether to make the agent permanently visible.
sample_agent_position	True	Whether to hide or show the goal tile of the generated path.
num_coins	[1]	The number of coins that are spawned. This is a list that the environment samples from.
coin_scale	0.375	The scale of the coins.
coins_visible	False	Whether to make the coins permanently visible.
use_exit	True	Whether to spawn and use the exit task. The exit is accessible by the agent after collecting all coins.
exit_scale	0.0	The scale of the exit.
exit_visible	False	Whether to make the exit permanently visible.
initial_spawns	4	The number of spotlights that are initially spawned.
num_spawns	30	The number of spotlights that are to be spawned.
initial_spawn_interval	30	The number of steps until the next spotlight is spawned.
spawn_interval_threshold	10	The spawn interval is decayed until reaching this lower threshold.
spawn_interval_decay	0.95	The decay rate of the spotlight spawn interval.
spot_min_radius	7.5	The minimum radius of the spotlights. The radius is sampled from the range min to max.
spot_max_radius	13.75	The maximum radius of the spotlights. The radius is sampled from the range min to max.
spot_min_speed	0.0025	The minimum speed of the spotlights. The speed is sampled from the range min to max.
spot_max_speed	0.0075	The maximum speed of the spotlights. The speed is sampled from the range min to max.
spot_damage	1.0	Damage per step while the agent is spotted by one spotlight.
light_dim_off_duration	6	The number of steps to dim off the global light.
light_threshold	255	The threshold for dimming the global light. A value of 255 indicates that the light will dimmed of completely.
visual_feedback	True	Whether to render the tiled background red if the agent is spotted.
black_background	False	Whether to render the environments background black, while the spotlights are rendered as white circumferences.
hide_chessboard	False	Whether to hide the chessboard background. This renders the background of the environment white.
reward_inside_spotlight	0.0	What reward to signal for each step while being inside a spotlight.
reward_outside_spotlight	0.0	What reward to signal for each step while being outside of a spotlight.
reward_death	0.0	What reward to signal upon losing all health points.
reward_exit	1.0	What reward to signal after successfully using the exit.
reward_max_steps	0.0	What reward to signal if max steps is reached.
reward_coin	0.25	What reward to signal upon collecting one coin.

Training

Baseline results are avaible via these repositories.

Recurrence + PPO

Gated TransformerXL + PPO

MarcoMeter / drl-memory-gym

Memory Gym: Partially Observable Challenges for Memory-Based Agents

Citation

Installation

Usage

Mortar Mayhem

Reset Parameters

Mystery Path

Reset Parameters

Searing Spotlights

Reset Parameters

Training

About

Languages