Reptile-Torch


Reptile Torch is a PyTorch Implementation of the Meta-Learning Algorithm proposed in the paper 'Reptile: A Scalable Meta-Learning Algorithm' by Alex Nichol and John Schulman from OpenAI.

This code has been produced as part of the paper 'Meta-Learning Conjugate Priors for Few-Shot Bayesian Optimization', concerning the topic of Automated Machine Learning with applications in Statistics. For any questions or remarks, feel free to contact me via the details provided in the paper.

This project is licensed under the MIT License. Any reproduction of this work should reference the original author(s).

Methodology

We provide a parallelized version of the batched Reptile algorithm by Nichol & Schulman, implemented with the PyTorch library. This method drastically improves computational performance over the original implementation by offloading the per-task gradient computations to the GPU. The specifics of our approach are described in the topics below.

Meta-Learning

The basic principle utilized by Reptile is that of Meta-Learning. The underlying idea is that we can derive a better estimate of the initialization parameters θ of a neural network by training those initializations across a generalized task structure.

Pre-training the initialization parameters of a neural network through Meta-Learning allows for convergence using significantly fewer data points. The paper 'Provable Guarantees for Gradient-Based Meta-Learning' further argues that the generalization of the Meta-Learned parameters improves with task similarity.

Evidently, the quality of the Meta-Learned parametrization depends on both the distribution and scope of the tasks within the learned task class. Meta-Learning methods are known to significantly outperform transfer-learning-based methods when only few data points are available. Combining both of these techniques, as in 'Meta-Transfer Learning for Few-Shot Learning', has shown promising results.

Reptile

The Reptile algorithm is a variant of the MAML algorithm from the paper 'Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks' by Finn, Abbeel & Levine. Both MAML and Reptile are compatible with any model trained by gradient descent, such as Bayes by Backpropagation. The advantage of Reptile over MAML is that it does not require the computation of expensive higher-order derivatives and instead optimizes directly on the gradient.
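
To make the contrast concrete (following the two papers): Reptile repeatedly moves the initialization toward the weights obtained by training on a sampled task, $\theta \leftarrow \theta + \epsilon\,(W_\tau - \theta)$, where $W_\tau$ denotes the weights after a few steps of SGD on task $\tau$. MAML instead optimizes $\min_\theta \mathbb{E}_\tau\left[L_\tau\left(\theta - \alpha \nabla_\theta L_\tau(\theta)\right)\right]$, whose outer gradient involves second-order derivatives of the task loss.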

Parallel Batched Reptile

  1.   Randomly Initialize Weight Parameters $\theta$
  2.   for Iteration = 1, 2, ... do
  3.         Sample Task Batch τ1, τ2, ... , τn
  4.         for Task = τ1, τ2, ... , τn do
  5.              Compute Task Weights $W_i = \theta - \alpha \, \nabla L_{\tau_i}(\theta)$ (one SGD step with inner step size $\alpha$)
  6.         end
  7.        Update $\theta \leftarrow \theta + \beta \, \frac{1}{n} \sum_{i=1}^{n} (W_i - \theta)$ (outer step size $\beta$)
  8.   end

Note that here we batch over single gradient steps; taking more gradient steps before updating the initialization parameters would require computation of nth-order derivatives.

Torch

Using the above batched version of Reptile, we can parallelize the loop from the 4th to the 6th step of the algorithm. We do this by building the model as a PyTorch nn.Module. In every training iteration we sample a batch of tasks, compute each task's loss tensor with nn.MSELoss, and perform backpropagation by calling .backward() on the loss to update the meta-initialization parameters.
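
As a rough illustration, the sketch below shows what one meta-iteration could look like. The network architecture, hyperparameter values, and the sample_task() helper are illustrative assumptions rather than the notebook's exact code, and the task loop is written out serially for clarity; in practice, the per-task tensor operations are what get dispatched to the GPU.

import copy
import torch
import torch.nn as nn

# Minimal sketch of one meta-iteration of batched Reptile.
# Architecture, hyperparameters and sample_task() are illustrative assumptions.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Sequential(
    nn.Linear(1, 32), nn.Tanh(),
    nn.Linear(32, 32), nn.Tanh(),
    nn.Linear(32, 1),
).to(device)

criterion = nn.MSELoss()
inner_step_size, outer_step_size, meta_batch_size = 0.02, 0.1, 10

def sample_task(sample_radius=4, sample_count=100):
    # Hypothetical stand-in for the notebook's task sampler:
    # a logistic curve with uniformly drawn parameters L, k and x0.
    L, k, x0 = (torch.rand(3) * 4).tolist()
    x = torch.empty(sample_count, 1, device=device).uniform_(-sample_radius, sample_radius)
    y = L / (1 + torch.exp(-k * (x - x0)))
    return x, y

meta_weights = copy.deepcopy(model.state_dict())
task_weights = []

# Steps 4-6: one inner SGD step per task, each starting from the meta-initialization.
for _ in range(meta_batch_size):
    model.load_state_dict(meta_weights)
    x, y = sample_task()
    loss = criterion(model(x), y)        # loss tensor via nn.MSELoss
    model.zero_grad()
    loss.backward()                      # backpropagation
    with torch.no_grad():
        for p in model.parameters():
            p -= inner_step_size * p.grad
    task_weights.append(copy.deepcopy(model.state_dict()))

# Step 7: move the meta-initialization toward the average of the task weights.
with torch.no_grad():
    for name in meta_weights:
        avg = torch.stack([w[name] for w in task_weights]).mean(dim=0)
        meta_weights[name].add_(outer_step_size * (avg - meta_weights[name]))
model.load_state_dict(meta_weights)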

Usage Instructions

In this repository we have provided an easy-to-use IPython notebook. The easiest way to use it is to upload the notebook to your own Google Colab workspace, where it can be run without any additional configuration. If you would like to run the implementation on your local machine, you may need to satisfy some additional requirements.

Requirements

This project requires the following specs and libraries to function:

  • NVIDIA CUDA Capable GPU
  • Python 3.6 or Above
    • Numpy
    • Pandas
  • TensorFlow 2.x or Above
  • PyTorch 1.7.x or Above

Defining Tasks

Defining tasks is rather straightforward. The notebook we have provided contains an example of a logistic task class, which models the logistic function

$f(x) = \frac{L}{1 + e^{-k(x - x_0)}}$

given a sample of points x1, x2, ... , xn. In code, this translates into

import numpy as np

def logistic(x, theta):
    return theta[0] / (1 + np.exp(-theta[1] * (x - theta[2])))

Here θ0 = L, θ1 = k and θ2 = x0 are the parameters that vary across the tasks of this class. To Meta-Learn on this task class, we randomly sample tasks by drawing the task parameters θi uniformly, as sketched below.
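
The following sketch illustrates drawing one task from the logistic task class; the parameter ranges and sampling scheme are assumptions for illustration, not the notebook's exact values.

import numpy as np

rng = np.random.default_rng()

def sample_logistic_task(sample_radius=4, sample_count=100):
    # Draw the task parameters (L, k, x0) uniformly, then sample input points.
    theta = rng.uniform(low=[0.5, 0.5, -2.0], high=[4.0, 4.0, 2.0])
    x = rng.uniform(-sample_radius, sample_radius, size=sample_count)
    return x, logistic(x, theta)   # reuses the task definition above

x_train, y_train = sample_logistic_task()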

Running Experiments

In the notebook a straightforward experiment is provided, which returns the cross-validated MSE in addition to a visualization of the training loss and individual batch performance.

The following parameters are available, along with their defaults:

| Parameter | Default | Parameter | Default |
| --- | --- | --- | --- |
| Inner Step Size | 0.02 | Evaluation Iterations | 32 |
| Inner Batch Size | 5 | Evaluation Batch Size | 10 |
| Outer Step Size | 0.1 | Evaluation Range | (1, 11) |
| Outer Iterations | 10,000 | Sample Radius | 4 |
| Meta Batch Size | 10 | Sample Count | 100 |
| Model Size | 32 | Task | Logistic |
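
As a sketch, these defaults could be collected into a single configuration dictionary; the key names below are illustrative assumptions, not the notebook's exact argument names.

# Hypothetical configuration mirroring the defaults above.
config = {
    "inner_step_size": 0.02,
    "inner_batch_size": 5,
    "outer_step_size": 0.1,
    "outer_iterations": 10_000,
    "meta_batch_size": 10,
    "model_size": 32,
    "eval_iterations": 32,
    "eval_batch_size": 10,
    "eval_range": (1, 11),
    "sample_radius": 4,
    "sample_count": 100,
    "task": "logistic",
}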
