RC-NFQ: Regularized Convolutional Neural Fitted Q Iteration

A batch algorithm for deep reinforcement learning. Incorporates dropout regularization and convolutional neural networks with a separate target Q network.

This algorithm extends the following techniques:

  • Riedmiller, Martin. "Neural fitted Q iteration: First experiences with a data efficient neural reinforcement learning method." Machine Learning: ECML 2005. Springer Berlin Heidelberg, 2005. 317-328.

  • Mnih, Volodymyr, et al. "Human-level control through deep reinforcement learning." Nature 518.7540 (2015): 529-533.

  • Lin, Long-Ji. "Self-improving reactive agents based on reinforcement learning, planning and teaching." Machine learning 8.3-4 (1992): 293-321.

Project Status: This project is a work in progress and is not yet complete.

Overview

Creating an instance of the RC-NFQ algorithm

The NFQ class creates an instance of the RC-NFQ algorithm for a particular agent and environment; a usage sketch follows the parameter list below.

Parameters

  • state_dim - The state dimensionality. An integer if convolutional = False, a 2D tuple otherwise.
  • nb_actions - The number of possible actions
  • terminal_states - The integer indices of the terminal states
  • convolutional - Boolean. When True, uses convolutional neural networks and dropout regularization. Otherwise, uses a simple MLP.
  • mlp_layers - A list giving the number of neurons in each hidden layer. Default = [20, 20]. Only used when convolutional = False.
  • discount_factor - The discount factor for Q-learning.
  • separate_target_network - Boolean. If True, a separate target Q-network is used to compute the targets for the Q-learning updates; its parameters are copied from the main Q-network every target_network_update_freq iterations.
  • target_network_update_freq - The frequency at which to update the target network.
  • lr - The learning rate for the RMSprop gradient descent algorithm.
  • max_iters - The maximum number of iterations that will be performed. Used to allocate memory for NumPy arrays. Default = 20000.
  • max_q_predicted - The maximum number of Q-values that will be predicted. Used to allocate memory for NumPy arrays. Default = 100000.
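
As an illustration, a minimal sketch of constructing an agent follows, assuming the NFQ class is importable from the nfq module and using the keyword names from the parameter list above; the state shape, action count, and hyperparameter values are placeholders, not values taken from the repository.

# Illustrative sketch only; the import path and argument values are assumptions.
from nfq import NFQ

agent = NFQ(state_dim=(64, 64),              # a 2D tuple because convolutional=True
            nb_actions=4,
            terminal_states=[],              # integer indices of terminal states, if any
            convolutional=True,              # CNN with dropout regularization
            discount_factor=0.99,
            separate_target_network=True,    # targets use r + discount_factor * max_a' Q_target(s', a')
            target_network_update_freq=100,  # copy main-network weights every 100 iterations
            lr=0.0001,                       # RMSprop learning rate
            max_iters=20000,
            max_q_predicted=100000)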

Fitting the Q network

The NFQ class has a fit_vectorized method, which is used to run an iteration of the RC-NFQ algorithm and update the Q function. The implementation is vectorized for improved performance.

The function requires a set of interactions with the environment, consisting of experience tuples of the form (s, a, r, s_prime) stored in 4 parallel arrays; a usage sketch follows the parameter list below.

Parameters

  • D_s - A list of states s for each experience tuple
  • D_a - A list of actions a for each experience tuple
  • D_r - A list of rewards r for each experience tuple
  • D_s_prime - A list of states s_prime for each experience tuple
  • num_iters - The number of epochs to run per batch. Default = 1.
  • shuffle - Whether to shuffle the data before training. Default = False.
  • nb_samples - If specified, uses nb_samples samples from the experience tuples selected without replacement. Otherwise, all eligible samples are used.
  • sliding_window - If specified, only the last nb_samples samples will be eligible for use. Otherwise, all samples are eligible.
  • full_batch_sgd - Boolean. Determines whether RMSprop will use full-batch or mini-batch updating. Default = False.
  • validation - Boolean. If True, a validation set will be used consisting of the last 10% of the experience tuples, and the validation loss will be monitored. Default = True.
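
A hypothetical call is sketched below, assuming the agent instance from the constructor sketch above and one array entry per experience tuple; the exact array shapes expected by fit_vectorized are defined by the implementation.

# Illustrative sketch only; array shapes and contents are placeholders.
import numpy as np

n = 1000                                     # number of stored experience tuples
D_s       = np.zeros((n, 64, 64))            # states s
D_a       = np.random.randint(0, 4, size=n)  # actions a
D_r       = np.zeros(n)                      # rewards r
D_s_prime = np.zeros((n, 64, 64))            # successor states s'

agent.fit_vectorized(D_s, D_a, D_r, D_s_prime,
                     num_iters=1,            # epochs per batch
                     shuffle=True,
                     nb_samples=256,         # sample 256 tuples without replacement
                     full_batch_sgd=False,   # mini-batch RMSprop updates
                     validation=True)        # hold out the last 10% and monitor validation loss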

Setting up an experiment

An experiment consists of an Experiment definition and an Environment definition. These need to be configured in the api_vision.py webserver.

The webserver exposes a REST resource used for communicating with the robot. An implementation of a client for a customized LEGO Mindstorms EV3 robot is provided in client_vision.py.

Streaming video is sent by the robot; an implementation for a customized LEGO Mindstorms EV3 robot is provided in rapid_streaming_zmq.py. The video stream is received by the server using receive_video_zmq.py and can be monitored using show_video_zmq.py.
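
The exact wire format and port are defined in those scripts; the sketch below only illustrates the general ZeroMQ subscriber pattern for monitoring such a stream, assuming JPEG-encoded frames published on an assumed address and port.

# Generic ZeroMQ subscriber sketch; the address, port, and message format are
# assumptions, not the protocol used by receive_video_zmq.py.
import zmq
import numpy as np
import cv2

context = zmq.Context()
socket = context.socket(zmq.SUB)
socket.connect("tcp://robot-hostname:5555")   # hypothetical robot address and port
socket.setsockopt_string(zmq.SUBSCRIBE, "")   # subscribe to all messages

while True:
    payload = socket.recv()                   # one JPEG-encoded frame (assumed)
    frame = cv2.imdecode(np.frombuffer(payload, dtype=np.uint8), cv2.IMREAD_COLOR)
    cv2.imshow("robot camera", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break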

Citation

@misc{rcnfq,
  author = {Harrigan, Cosmo},
  title = {RC-NFQ: Regularized Convolutional Neural Fitted Q Iteration},
  year = {2016},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/cosmoharrigan/rc-nfq}}
}
