
tf-gqn

Flying through a scene generated from a single context image

This repository contains a Tensorflow implementation of the Generative Query Network (GQN) described in 'Neural Scene Representation and Rendering' by Eslami et al. (2018).

Original paper

Neural Scene Representation and Rendering [PDF] [blog]

S. M. Ali Eslami, Danilo J. Rezende, Frederic Besse, Fabio Viola, Ari S. Morcos, Marta Garnelo, Avraham Ruderman, Andrei A. Rusu, Ivo Danihelka, Karol Gregor, David P. Reichert, Lars Buesing, Theophane Weber, Oriol Vinyals, Dan Rosenbaum, Neil Rabinowitz, Helen King, Chloe Hillier, Matt Botvinick, Daan Wierstra, Koray Kavukcuoglu and Demis Hassabis

If you use this repository, please cite the original publication:

@article{eslami2018neural,
  title={Neural scene representation and rendering},
  author={Eslami, SM Ali and Rezende, Danilo Jimenez and Besse, Frederic and Viola, Fabio and Morcos, Ari S and Garnelo, Marta and Ruderman, Avraham and Rusu, Andrei A and Danihelka, Ivo and Gregor, Karol and others},
  journal={Science},
  volume={360},
  number={6394},
  pages={1204--1210},
  year={2018},
  publisher={American Association for the Advancement of Science}
}

Software Requirements

The code requires at least Tensorflow 1.8.0. It has been tested on the following platforms:

  • Ubuntu 16.04 (kernel 4.15.0-24-generic) with Python 3.5.2;
  • macOS Sierra 10.12.5 with Python 3.6.5;
  • macOS High Sierra 10.13.6 with Python 3.5.2.

The major software requirements can be installed on an Ubuntu machine via:

$ sudo apt-get install python3-pip python3-dev virtualenv

To run the models efficiently on a GPU, you should also install the latest NVIDIA drivers together with CUDA and cuDNN versions that are compatible with your Tensorflow release (see the version list).
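
Once Tensorflow is installed (see Installation below), a quick way to check that the GPU stack is picked up correctly is:

$ python3 -c "import tensorflow as tf; print(tf.test.is_gpu_available())"

This should print True on a correctly configured GPU machine.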

Installation

All Python dependencies should live in their own virtual environment. The runtime requirements can be installed via the following commands:

$ virtualenv -p python3 venv
$ source venv/bin/activate
(venv) $ pip3 install -r requirements.txt

Additional requirements for development purposes are listed in dev_requirements.txt and can be installed on demand:

(venv) $ pip3 install -r dev_requirements.txt

Training

Training Data

The data provider implementation is adapted from: https://github.com/deepmind/gqn-datasets

The training datasets can be downloaded from: https://console.cloud.google.com/storage/gqn-dataset

To download the datasets you can use the gsutil cp command; see also the gsutil installation instructions.
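
For example, downloading the rooms_ring_camera dataset used below into /tmp/data/gqn-dataset could look like this (the dataset name and destination path are just examples; any of the published datasets works the same way):

$ mkdir -p /tmp/data/gqn-dataset
$ gsutil -m cp -r gs://gqn-dataset/rooms_ring_camera /tmp/data/gqn-dataset/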

Training Script

The training script can be started with the following command, assuming the GQN datasets have been downloaded to /tmp/data/gqn-dataset:

python3 train_gqn_draw.py \
  --data_dir /tmp/data/gqn-dataset \
  --dataset rooms_ring_camera \
  --model_dir /tmp/models/gqn

For more verbose logging (and additional summaries), you can also pass the --debug option to the script.
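
Assuming summaries are written to the model directory (the usual tf.estimator behaviour), training progress can also be monitored with TensorBoard:

tensorboard --logdir /tmp/models/gqn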

Model Snapshots

As a reference, we provide a few model snapshots obtained during our debugging runs. We trained the GQN on a subset of rooms_ring_camera for about 220K steps, using the Pool architecture for scene encoding and the DRAW architecture for rendering with a variable sequence length (2 through 12 generation steps). The snapshots can be downloaded as a tar.gz archive.
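
If you want to sanity-check a downloaded snapshot, the checkpoint can be inspected with standard Tensorflow utilities. A minimal sketch, assuming the archive was extracted to /tmp/models/gqn and contains regular Tensorflow checkpoint files:

import tensorflow as tf

# Example path only: wherever the snapshot archive was extracted.
ckpt = tf.train.latest_checkpoint('/tmp/models/gqn')
# Print every variable name and shape stored in the checkpoint.
for name, shape in tf.train.list_variables(ckpt):
    print(name, shape)

Pointing --model_dir of the training script at the extracted directory should then pick up the snapshot via the usual tf.estimator checkpoint-restore behaviour.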

Notes

A few random notes about this implementation:

  • The model has so far only been trained successfully on the rooms_ring_camera dataset of the GQN data repository.
  • We were not able to train the model with the learning rate scheme reported in the original paper (annealed from 5e-4 to 5e-5 over 200K steps). This always resulted in a local minimum that only generated a light blue sky and a grey blob of background. We achieved good results by lowering all learning rates by one order of magnitude (see the sketch after this list).
  • Currently, our implementation does not share the convolutional cores between the inference and generation LSTMs. With shared cores we observed the KL divergence between posterior and prior collapsing to zero frequently and obtained generally inferior results (which is in line with the results reported in the paper).
  • In our tests, we found eight generation steps to be a good trade-off between training stability, training speed and visual quality.
  • We have trained models on Titan Xp and GTX 1080Ti GPUs usually obtaining visually reasonable results after about one day of training.
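
The following is a minimal sketch of the kind of lowered, linearly annealed learning-rate schedule mentioned above; the concrete values and optimizer setup in the training script may differ:

import tensorflow as tf

global_step = tf.train.get_or_create_global_step()
# Illustrative values: one order of magnitude below the paper's schedule
# (5e-4 -> 5e-5 becomes 5e-5 -> 5e-6), annealed linearly over 200K steps.
learning_rate = tf.train.polynomial_decay(
    learning_rate=5e-5,
    global_step=global_step,
    decay_steps=200000,
    end_learning_rate=5e-6,
    power=1.0)
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)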

Authors

This work was done during our PhD research at the Oxford Robotics Institute and the Visual Geometry Group.


License

Apache License 2.0

