
Memoire

Memoire (pronounced "mem-wah-r") is a distributed replay memory for reinforcement learning. Industrial applications of reinforcement learning usually require a large amount of computation, both for environment exploration and for neural network training. Our goal is to make it easier to write high-performance distributed reinforcement learning algorithms.

How it works

The distributed reinforcement learning platform consists of two types of workers: Actors and Learners.

(Figure: DistRL architecture)

An actor is responsible for exploring the environment and generating data for the learners. In its main loop, it works as follows (a sketch is given after the list):

  1. Get the latest model (policy) from the learners.
  2. Act in the environment according to the current policy.
  3. Put the generated experience into the client side of the replay memory.
  4. Push samples from the client to the server.
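
A minimal sketch of this loop in Python is given below. Here env is a Gym-style environment, and the client object with its method names (fetch_latest_model, new_episode, add_entry, close_episode, push_cache) is a hypothetical placeholder for the client side of the replay memory, not memoire's actual API; see example/ and the API page for the real interface.

# Sketch of an actor's main loop; all client method names are hypothetical placeholders.
def actor_loop(env, client):
    while True:
        policy = client.fetch_latest_model()        # 1. get the latest policy from the learners
        obs, done = env.reset(), False
        client.new_episode()                        # open a new trajectory in the local memory
        while not done:
            action = policy.act(obs)                # 2. act according to the current policy
            next_obs, reward, done, _ = env.step(action)
            client.add_entry(obs, action, reward)   # 3. store the transition locally
            obs = next_obs
        client.close_episode()                      # episode terminated
        client.push_cache()                         # 4. sample a cache and push it to the server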

A learner is responsible for updating the model with batches of data. In its main loop, it works as follows (a sketch is given after the list):

  1. Get a batch of samples from the server side of the replay memory.
  2. Update the model with the batch, according to the chosen algorithm.
  3. Publish the latest model to the actors.
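
A correspondingly schematic learner loop; again, the server object and its get_batch and publish_model methods are hypothetical placeholders, and agent stands for whatever framework-specific model and optimizer are used.

# Sketch of a learner's main loop; all server method names are hypothetical placeholders.
def learner_loop(server, agent, num_updates, publish_interval=10):
    for step in range(num_updates):
        batch = server.get_batch(batch_size=256)    # 1. sample a batch from the pushed caches
        agent.update(batch)                         # 2. algorithm-specific model update
        if step % publish_interval == 0:
            server.publish_model(agent.weights())   # 3. broadcast the latest model to the actors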

We can distribute actors and learners across CPU and GPU machines in a cluster to fully utilize heterogeneous computing resources:

                      Actor      Learner
Computing resource    CPU        GPU
DNN operation         Forward    Forward/Backward
Number of workers     ~300       ~10
Memory usage          ~10G       ~1G
Bandwidth usage       ~1G        ~20G

The client side of the replay memory stores recent trajectories generated by the local actor. The size of the local replay memory is limited both by the total number of steps/transitions and by the total number of episodes. We provide three methods to create space for a new episode, add a transition to the current episode, and close a terminated episode. When an episode is closed, the TD-lambda return for each step is calculated automatically, and its sampling priority is updated. We also provide a method to sample the current trajectories into a cache and push it to the learner.
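
For concreteness, constructing such a client-side memory might look like the sketch below; the class name and every parameter name here are guesses for illustration only, not memoire's documented interface.

# Hypothetical construction of the client-side replay memory; all names are placeholders.
client = ReplayMemoryClient(
    max_steps=100000,     # capacity limit on the total number of stored transitions
    max_episodes=1000,    # capacity limit on the total number of stored episodes
    discount=0.99,        # assumed: discount factor for the automatic TD-lambda return
    lambd=0.97,           # assumed: lambda for the TD-lambda return
    cache_size=256,       # assumed: number of sampled transitions per pushed cache
)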

The server side receives the pushed caches from the clients automatically. When a batch of samples is needed for training, we can draw it from these pushed caches with another phase of sampling. The two phases of sampling, at the client side and at the server side, are designed to be (roughly) equivalent to sampling from the whole replay memory across all actors.
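
On the learner side, this second sampling phase is what a get_batch-style call would perform, as in the learner sketch above; the call below is again a hypothetical placeholder rather than memoire's actual API.

# Hypothetical second sampling phase over the caches pushed by all actors.
batch = server.get_batch(batch_size=256)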

A complete list of supported methods and options can be found on the API page.

Note that in this framework only the sampled transitions, rather than all generated trajectories, are pushed to the learner for model updating. When there is an enormous number of actors, this design reduces both the bandwidth burden and the memory usage of the learner. At the same time, the learner still receives high-priority samples and can update the model efficiently, thanks to prioritized sampling.

Features

  • Prioritized Sampling

    Prioritized experience replay [1] is a method of selecting high-priority samples for training. It is arguably the most effective technique for achieving good performance in (distributed) reinforcement learning [2] [3].

  • Framework Independence

    The replay memory module is separated from the neural network training, making it independent of the deep learning framework used to implement the network (e.g. TensorFlow, PyTorch). We hope this modular design gives deep learning practitioners more flexibility.

  • Frame Stacking, N-Step Learning, Multidimensional Reward, and TD-lambda Return Computation

    These are common and useful components of practical reinforcement learning, and they are implemented in our module for convenience. A sketch of the TD-lambda return computation is given below.
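
The TD-lambda return mentioned above can be computed backwards over a finished episode. The following is a generic sketch of the standard recursion, assuming per-step rewards and value estimates are available; it is not taken from memoire's source code. For a multidimensional reward, rewards, values, and returns would simply carry an extra dimension per step.

import numpy as np

def lambda_returns(rewards, values, bootstrap, gamma=0.99, lam=0.95):
    """Standard TD(lambda) returns, computed backwards over one episode.

    rewards[t] and values[t] are the reward and value estimate at step t;
    bootstrap is the value estimate after the last step (0 for a terminal episode).
    """
    T = len(rewards)
    returns = np.zeros(T)
    g = bootstrap            # G_T, the return beyond the last step
    next_value = bootstrap   # V(s_T)
    for t in reversed(range(T)):
        # G_t = r_t + gamma * ((1 - lam) * V(s_{t+1}) + lam * G_{t+1})
        g = rewards[t] + gamma * ((1.0 - lam) * next_value + lam * g)
        returns[t] = g
        next_value = values[t]
    return returns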

Usage

See example/

Build

The module is based on pybind11. We recommend cloning the latest version of pybind11 from GitHub and setting PYBIND11_ROOT properly in the Makefile. We use version 2.3.dev0.

pip uninstall pybind11                            # Remove old version
git clone https://github.com/pybind/pybind11.git  # Clone the source
cd pybind11
pip install -e .                                  # Install from source

or install directly from GitHub (recommended):

pip install git+https://github.com/pybind/pybind11.git

We also use new features of google-protobuf. To install or update protobuf to the latest version, you can build it from source from a protobuf release with the following commands.

See also the installation-from-source instructions in protobuf's C++ Installation guide.

yum erase protobuf           # Remove old version
cd protobuf-3.6.1/
./configure CXXFLAGS=-fPIC   # Compile with -fPIC
make -j                      # Compile
make install                 # Install system-wide

We support different versions of Python. You can choose your Python version in the Makefile:

PYINC=$(PY27INC)

Then execute

make -j

The generated memoire.so can then be imported directly in Python:

import memoire

Other Dependencies

ZeroMQ, google-test, and libbfd (for debugging):

yum install zeromq-devel binutils-devel gtest-devel

Documentation

See the API page for reference.

Reference
