NIPS2016

This project collects the accepted papers for NIPS 2016 and their links to arXiv or GitXiv.

  • Scan Order in Gibbs Sampling: Models in Which it Matters and Bounds on How Much

Bryan He*, Stanford University; Christopher De Sa, Stanford University; Ioannis Mitliagkas, ; Christopher Ré, Stanford University

https://arxiv.org/abs/1606.03432

Abstract:

Gibbs sampling is a Markov Chain Monte Carlo sampling technique that iteratively samples variables from their conditional distributions. There are two common scan orders for the variables: random scan and systematic scan. Due to the benefits of locality in hardware, systematic scan is commonly used, even though most statistical guarantees are only for random scan. While it has been conjectured that the mixing times of random scan and systematic scan do not differ by more than a logarithmic factor, we show by counterexample that this is not the case, and we prove that the mixing times do not differ by more than a polynomial factor under mild conditions. To prove these relative bounds, we introduce a method of augmenting the state space to study systematic scan using conductance.
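
A toy illustration of the two scan orders (a sketch for orientation, not the paper's counterexample or bounds): Gibbs sampling on a small Ising chain, where `scan="systematic"` visits variables in a fixed order each sweep and `scan="random"` draws each coordinate uniformly at random.

```python
# Minimal sketch: random-scan vs. systematic-scan Gibbs sampling on a toy
# Ising chain p(x) ∝ exp(beta * sum_i x_i * x_{i+1}), with x_i ∈ {-1, +1}.
import numpy as np

def prob_plus(x, i, beta):
    """P(x_i = +1 | rest) for the Ising chain: depends only on the neighbors."""
    field = (x[i - 1] if i > 0 else 0) + (x[i + 1] if i < len(x) - 1 else 0)
    return 1.0 / (1.0 + np.exp(-2.0 * beta * field))

def gibbs(n_vars=10, n_sweeps=1000, beta=0.5, scan="systematic", seed=0):
    rng = np.random.default_rng(seed)
    x = rng.choice([-1, 1], size=n_vars)
    for _ in range(n_sweeps):
        if scan == "systematic":
            order = range(n_vars)                         # fixed order, cache-friendly
        else:
            order = rng.integers(0, n_vars, size=n_vars)  # random scan
        for i in order:
            x[i] = 1 if rng.random() < prob_plus(x, i, beta) else -1
    return x

print(gibbs(scan="systematic"), gibbs(scan="random"))
```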

  • Deep ADMM-Net for Compressive Sensing MRI

Yan Yang, Xi'an Jiaotong University; Jian Sun*, Xi'an Jiaotong University; Huibin Li, ; Zongben Xu,

  • A scaled Bregman theorem with applications

Richard NOCK, Data61 and ANU; Aditya Menon*, ; Cheng Soon Ong, Data61

http://arxiv.org/abs/1607.00360

Abstract:

Bregman divergences play a central role in the design and analysis of a range of machine learning algorithms. This paper explores the use of Bregman divergences to establish reductions between such algorithms and their analyses. We present a new scaled isodistortion theorem involving Bregman divergences (scaled Bregman theorem for short) which shows that certain "Bregman distortions" (employing a potentially non-convex generator) may be exactly re-written as a scaled Bregman divergence computed over transformed data. Admissible distortions include geodesic distances on curved manifolds and projections or gauge-normalisation, while admissible data include scalars, vectors and matrices. Our theorem allows one to leverage the wealth and convenience of Bregman divergences when analysing algorithms relying on the aforementioned Bregman distortions. We illustrate this with three novel applications of our theorem: a reduction from multi-class density ratio to class-probability estimation, a new adaptive projection free yet norm-enforcing dual norm mirror descent algorithm, and a reduction from clustering on flat manifolds to clustering on curved manifolds. Experiments on each of these domains validate the analyses and suggest that the scaled Bregman theorem might be a worthy addition to the popular handful of Bregman divergence properties that have been pervasive in machine learning.
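
For background, a small sketch of the standard Bregman divergence the theorem is built around, D_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y>; the generators and test points below are illustrative choices, not taken from the paper.

```python
# Standard Bregman divergence: D_phi(x, y) = phi(x) - phi(y) - <grad_phi(y), x - y>.
import numpy as np

def bregman(phi, grad_phi, x, y):
    return phi(x) - phi(y) - np.dot(grad_phi(y), x - y)

# phi(x) = ||x||^2 recovers the squared Euclidean distance ...
sq, sq_grad = lambda x: np.dot(x, x), lambda x: 2 * x
# ... and phi(p) = sum p log p (negative entropy) recovers KL on the simplex.
ne, ne_grad = lambda p: np.sum(p * np.log(p)), lambda p: np.log(p) + 1

x, y = np.array([0.2, 0.8]), np.array([0.5, 0.5])
print(bregman(sq, sq_grad, x, y))   # equals ||x - y||^2
print(bregman(ne, ne_grad, x, y))   # equals KL(x || y) for distributions
```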

  • Swapout: Learning an ensemble of deep architectures

Saurabh Singh*, UIUC; Derek Hoiem, UIUC; David Forsyth, UIUC

http://arxiv.org/abs/1605.06465

Abstract:

We describe Swapout, a new stochastic training method that outperforms ResNets of identical network structure, yielding impressive results on CIFAR-10 and CIFAR-100. Swapout samples from a rich set of architectures including dropout, stochastic depth and residual architectures as special cases. When viewed as a regularization method, swapout not only inhibits co-adaptation of units in a layer, similar to dropout, but also across network layers. We conjecture that swapout achieves strong regularization by implicitly tying the parameters across layers. When viewed as an ensemble training method, it samples a much richer set of architectures than existing methods such as dropout or stochastic depth. We propose a parameterization that reveals connections to existing architectures and suggests a much richer set of architectures to be explored. We show that our formulation suggests an efficient training method and validate our conclusions on CIFAR-10 and CIFAR-100, matching state-of-the-art accuracy. Remarkably, our 32-layer wider model performs similarly to a 1001-layer ResNet model.
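
A minimal sketch of the sampling rule the abstract describes, with independent Bernoulli masks on the identity and transformed branches; the keep probabilities and the toy transform `F` are illustrative assumptions.

```python
# Swapout-style unit (sketch): each unit independently keeps the input branch,
# the transformed branch, both (residual), or neither (dropped).
import numpy as np

def swapout(x, F, p_transform=0.5, p_identity=0.5, rng=None):
    rng = rng or np.random.default_rng(0)
    theta1 = rng.random(x.shape) < p_transform  # mask on F(x)
    theta2 = rng.random(x.shape) < p_identity   # mask on x
    return theta1 * F(x) + theta2 * x

# (1,1) -> residual unit; (1,0) -> plain feedforward layer; (0,1) -> skip the
# layer (stochastic depth); (0,0) -> unit dropped, as in dropout.
print(swapout(np.random.randn(8), F=np.tanh))
```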

  • On Regularizing Rademacher Observation Losses

Richard NOCK*, Data61 and ANU

http://users.cecs.anu.edu.au/~rnock/nips2016-n-web.pdf

Abstract:

It has recently been shown that supervised learning of linear classifiers with two of the most popular losses, the logistic and square loss, is equivalent to optimizing an equivalent loss over sufficient statistics about the class: Rademacher observations (rados). It has also been shown that learning over rados brings solutions to two prominent problems for which the state of the art of learning from examples can be comparatively inferior and in fact less convenient: (i) protecting and learning from private examples, (ii) learning from distributed datasets without entity resolution. Bis repetita placent: the two proofs of equivalence are different and rely on specific properties of the corresponding losses, so whether these can be unified and generalized inevitably comes to mind. This is our first contribution: we show how they can be fit into the same theory for the equivalence between example and rado losses. As a second contribution, we show that the generalization unveils a surprising new connection to regularized learning, and in particular a sufficient condition under which regularizing the loss over examples is equivalent to regularizing the rados (i.e. the data) in the equivalent rado loss, in such a way that an efficient algorithm for one regularized rado loss may be as efficient when changing the regularizer. This is our third contribution: we give a formal boosting algorithm for the regularized exponential rado-loss which boosts with any of the ridge, lasso, SLOPE, ℓ∞, or elastic net regularizer, using the same master routine for all. Because the regularized exponential rado-loss is the equivalent of the regularized logistic loss over examples, we obtain the first efficient proxy to the minimization of the regularized logistic loss over examples using such a wide spectrum of regularizers. Experiments display that regularization significantly improves rado-based learning and compares favourably with example-based learning.

  • Without-Replacement Sampling for Stochastic Gradient Methods

Ohad Shamir*, Weizmann Institute of Science

https://arxiv.org/abs/1603.00570

Abstract:

Stochastic gradient methods for machine learning and optimization problems are usually analyzed assuming data points are sampled with replacement. In practice, however, sampling without replacement is very common, easier to implement in many cases, and often performs better. In this paper, we provide competitive convergence guarantees for without-replacement sampling, under various scenarios, for three types of algorithms: any algorithm with online regret guarantees, stochastic gradient descent, and SVRG. A useful application of our SVRG analysis is a nearly-optimal algorithm for regularized least squares in a distributed setting, in terms of both communication complexity and runtime complexity, when the data is randomly partitioned and the condition number can be as large as the data size (up to logarithmic factors). Our proof techniques combine ideas from stochastic optimization, adversarial online learning, and transductive learning theory, and can potentially be applied to other stochastic optimization and learning problems.
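
A toy comparison of the two sampling schemes on least squares (only the setup is shown; the paper's contribution is the convergence analysis, not this snippet):

```python
# With-replacement SGD draws a uniformly random index each step; without-
# replacement SGD visits a fresh random permutation of the data each epoch.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 5))
b = A @ rng.standard_normal(5) + 0.1 * rng.standard_normal(100)

def sgd(epochs=50, lr=0.01, replacement=True):
    w, n = np.zeros(5), len(b)
    for _ in range(epochs):
        idx = rng.integers(0, n, size=n) if replacement else rng.permutation(n)
        for i in idx:
            w -= lr * (A[i] @ w - b[i]) * A[i]   # gradient of 0.5*(a_i.w - b_i)^2
    return w

print(np.linalg.norm(A @ sgd(replacement=True) - b))
print(np.linalg.norm(A @ sgd(replacement=False) - b))
```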

  • Fast and Provably Good Seedings for k-Means

Olivier Bachem*, ETH Zurich; Mario Lucic, ETH Zurich; Hamed Hassani, ETH Zurich; Andreas Krause,

Abstract:

Seeding, the task of finding initial cluster centers, is critical in obtaining high-quality clusterings for k-Means. However, k-means++ seeding, the state-of-the-art algorithm, does not scale well to massive datasets as it is inherently sequential and requires k full passes through the data. It was recently shown that Markov chain Monte Carlo sampling can be used to efficiently approximate the seeding step of k-means++. However, this result requires assumptions on the data generating distribution. We propose a simple yet fast seeding algorithm that produces provably good clusterings even without assumptions on the data. Our analysis shows that the algorithm allows for a favourable trade-off between solution quality and computational cost, speeding up k-means++ seeding by up to several orders of magnitude. We validate our theoretical results in extensive experiments on a variety of real-world data sets.
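
For reference, a sketch of the k-means++ D²-sampling baseline whose k full passes over the data motivate the paper; the proposed fast seeding replaces this exact sampling with an approximate scheme not reproduced here.

```python
# k-means++ seeding: each new center is drawn with probability proportional to
# the squared distance to the nearest center chosen so far (one pass per center).
import numpy as np

def kmeanspp_seeding(X, k, rng=None):
    rng = rng or np.random.default_rng(0)
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d2 = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centers)

X = np.random.default_rng(1).standard_normal((500, 2))
print(kmeanspp_seeding(X, k=5))
```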

  • Unsupervised Learning for Physical Interaction through Video Prediction

Chelsea Finn*, Google, Inc.; Ian Goodfellow, ; Sergey Levine, University of Washington

http://arxiv.org/abs/1605.07157

Abstract:

A core challenge for an agent learning to interact with the world is to predict how its actions affect objects in its environment. Many existing methods for learning the dynamics of physical interactions require labeled object information. However, to scale real-world interaction learning to a variety of scenes and objects, acquiring labeled data becomes increasingly impractical. To learn about physical object motion without labels, we develop an action-conditioned video prediction model that explicitly models pixel motion, by predicting a distribution over pixel motion from previous frames. Because our model explicitly predicts motion, it is partially invariant to object appearance, enabling it to generalize to previously unseen objects. To explore video prediction for real-world interactive agents, we also introduce a dataset of 50,000 robot interactions involving pushing motions, including a test set with novel objects. In this dataset, accurate prediction of videos conditioned on the robot's future actions amounts to learning a "visual imagination" of different futures based on different courses of action. Our experiments show that our proposed method not only produces more accurate video predictions, but also more accurately predicts object motion, when compared to prior methods.

  • Matrix Completion and Clustering in Self-Expressive Models

Ehsan Elhamifar*,

  • Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling

Chengkai Zhang, ; Jiajun Wu*, MIT; Tianfan Xue, ; William Freeman, ; Joshua Tenenbaum,

  • Probabilistic Modeling of Future Frames from a Single Image

Tianfan Xue*, ; Jiajun Wu, MIT; Katherine Bouman, MIT; William Freeman,

  • Human Decision-Making under Limited Time

Pedro Ortega*, ; Alan Stocker,

  • Incremental Boosting Convolutional Neural Network for Facial Action Unit Recognition

Shizhong Han*, University of South Carolina; Zibo Meng, University of South Carolina; Ahmed Shehab Khan, University of South Carolina; Yan Tong, University of South Carolina

https://cse.sc.edu/~mengz/papers/NIPS2016.pdf

Abstract:

Recognizing facial action units (AUs) from spontaneous facial expressions is still a challenging problem. Most recently, CNNs have shown promise on facial AU recognition. However, the learned CNNs are often overfitted and do not generalize well to unseen subjects due to limited AU-coded training images. We proposed a novel Incremental Boosting CNN (IB-CNN) to integrate boosting into the CNN via an incremental boosting layer that selects discriminative neurons from the lower layer and is incrementally updated on successive mini-batches. In addition, a novel loss function that accounts for errors from both the incremental boosted classifier and individual weak classifiers was proposed to fine-tune the IB-CNN. Experimental results on two benchmark AU databases have demonstrated that the IB-CNN yields significant improvement over the traditional CNN and the one without incremental learning, as well as outperforming the state-of-the-art CNN-based methods in AU recognition. The improvement is more impressive for the AUs that have the lowest frequencies in the databases.

  • Natural-Parameter Networks: A Class of Probabilistic Neural Networks

Hao Wang*, HKUST; Xingjian Shi, ; Dit-Yan Yeung,

  • Tree-Structured Reinforcement Learning for Sequential Object Localization

Zequn Jie*, National Univ of Singapore; Xiaodan Liang, Sun Yat-sen University; Jiashi Feng, National University of Singapo; Xiaojie Jin, NUS; Wen Feng Lu, National Univ of Singapore; Shuicheng Yan,

  • Unsupervised Domain Adaptation with Residual Transfer Networks

Mingsheng Long*, Tsinghua University; Han Zhu, Tsinghua University; Jianmin Wang, Tsinghua University; Michael Jordan,

http://arxiv.org/abs/1602.04433

Abstract:

The recent success of deep neural networks relies on massive amounts of labeled data. For a target task where labeled data is unavailable, domain adaptation can transfer a learner from a different source domain. In this paper, we propose a new approach to domain adaptation in deep networks that can simultaneously learn adaptive classifiers and transferable features from labeled data in the source domain and unlabeled data in the target domain. We relax a shared-classifier assumption made by previous methods and assume that the source classifier and target classifier differ by a residual function. We enable classifier adaptation by plugging several layers into the deep network to explicitly learn the residual function with reference to the target classifier. We embed features of multiple layers into reproducing kernel Hilbert spaces (RKHSs) and match feature distributions for feature adaptation. The adaptation behaviors can be achieved in most feed-forward models by extending them with new residual layers and loss functions, which can be trained efficiently using standard back-propagation. Empirical evidence exhibits that the approach outperforms state-of-the-art methods on standard domain adaptation datasets.

  • Verification Based Solution for Structured MAB Problems

Zohar Karnin*,

  • Minimizing Regret on Reflexive Banach Spaces and Nash Equilibria in Continuous Zero-Sum Games

Maximilian Balandat*, UC Berkeley; Walid Krichene, UC Berkeley; Claire Tomlin, UC Berkeley; Alexandre Bayen, UC Berkeley

  • Linear dynamical neural population models through nonlinear embeddings

Yuanjun Gao, Columbia University; Evan Archer*, ; John Cunningham, ; Liam Paninski,

https://arxiv.org/abs/1605.08454

Abstract:

A body of recent work in modeling neural activity focuses on recovering low-dimensional latent features that capture the statistical structure of large-scale neural populations. Most such approaches have focused on linear generative models, where inference is computationally tractable. Here, we propose fLDS, a general class of nonlinear generative models that permits the firing rate of each neuron to vary as an arbitrary smooth function of a latent, linear dynamical state. This extra flexibility allows the model to capture a richer set of neural variability than a purely linear model, but retains an easily visualizable low-dimensional latent space. To fit this class of non-conjugate models we propose a variational inference scheme, along with a novel approximate posterior capable of capturing rich temporal correlations across time. We show that our techniques permit inference in a wide class of generative models. We also show in application to two neural datasets that, compared to state-of-the-art neural population models, fLDS captures a much larger proportion of neural variability with a small number of latent dimensions, providing superior predictive performance and interpretability.
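
A generative-model sketch matching the description in the abstract: a latent linear dynamical state drives per-neuron firing rates through a smooth nonlinearity (here a small random MLP, an assumption for illustration), with Poisson spike counts.

```python
import numpy as np

rng = np.random.default_rng(0)
T, dim_z, n_neurons = 100, 2, 20
A = 0.95 * np.eye(dim_z)                      # linear latent dynamics
W1 = rng.standard_normal((8, dim_z))
W2 = 0.5 * rng.standard_normal((n_neurons, 8))

z = np.zeros((T, dim_z))
spikes = np.zeros((T, n_neurons), dtype=int)
for t in range(1, T):
    z[t] = A @ z[t - 1] + 0.1 * rng.standard_normal(dim_z)
    rate = np.exp(W2 @ np.tanh(W1 @ z[t]))    # smooth nonlinear rate map
    spikes[t] = rng.poisson(rate)
print(spikes.sum(axis=0))
```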

  • SURGE: Surface Regularized Geometry Estimation from a Single Image

Peng Wang*, UCLA; Xiaohui Shen, Adobe Research; Bryan Russell, ; Scott Cohen, Adobe Research; Brian Price, ; Alan Yuille,

  • Interpretable Distribution Features with Maximum Testing Power

Wittawat Jitkrittum*, Gatsby Unit, UCL; Zoltan Szabo, ; Kacper Chwialkowski, Gatsby Unit, UCL; Arthur Gretton,

https://arxiv.org/abs/1605.06796

Abstract:

Two semimetrics on probability distributions are proposed, given as the sum of differences of expectations of analytic functions evaluated at spatial or frequency locations (i.e., features). The features are chosen so as to maximize the distinguishability of the distributions, by optimizing a lower bound on test power for a statistical test using these features. The result is a parsimonious and interpretable indication of how and where two distributions differ locally. An empirical estimate of the test power criterion converges with increasing sample size, ensuring the quality of the returned features. In real-world benchmarks on high-dimensional text and image data, linear-time tests using the proposed semimetrics achieve comparable performance to the state-of-the-art quadratic-time maximum mean discrepancy test, while returning human-interpretable features that explain the test results.
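
A sketch of the core quantity: differences of expected analytic features (Gaussian-kernel features here, an illustrative choice) evaluated at a few spatial locations. The paper additionally optimizes the locations against a test-power criterion and uses a normalized statistic, neither of which is shown.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, (500, 2))    # samples from P
Y = rng.normal(0.5, 1.0, (500, 2))    # samples from Q
V = rng.normal(0.0, 1.0, (5, 2))      # J = 5 test locations (to be optimized)

def feat(Z, V, gamma=1.0):            # analytic features: k(z, v_j)
    d2 = ((Z[:, None, :] - V[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

diff = feat(X, V).mean(0) - feat(Y, V).mean(0)
print(np.sum(diff ** 2))              # large where P and Q differ at the V's
```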

  • Sorting out typicality with the inverse moment matrix SOS polynomial

Edouard Pauwels*, ; Jean-Bernard Lasserre, LAAS-CNRS

http://arxiv.org/abs/1606.03858

Abstract:

We study a surprising phenomenon related to the representation of a cloud of data points using polynomials. We start with the previously unnoticed empirical observation that, given a collection (a cloud) of data points, the sublevel sets of a certain distinguished polynomial capture the shape of the cloud very accurately. This distinguished polynomial is a sum-of-squares (SOS) derived in a simple manner from the inverse of the empirical moment matrix. In fact, this SOS polynomial is directly related to orthogonal polynomials and the Christoffel function. This allows us to generalize and interpret extremality properties of orthogonal polynomials and to provide a mathematical rationale for the observed phenomenon. Among diverse potential applications, we illustrate the relevance of our results on a network intrusion detection task for which we obtain performances similar to existing dedicated methods reported in the literature.
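
A sketch of the construction the abstract describes: form the empirical moment matrix of monomial features, invert it, and evaluate the resulting SOS polynomial at each point; large values flag atypical points (degree and dimension here are illustrative choices).

```python
import numpy as np

def monomials(x):  # degree-2 monomials in 2-D: [1, x1, x2, x1^2, x1*x2, x2^2]
    x1, x2 = x
    return np.array([1.0, x1, x2, x1 * x1, x1 * x2, x2 * x2])

rng = np.random.default_rng(0)
X = rng.standard_normal((300, 2))
V = np.array([monomials(x) for x in X])
M = V.T @ V / len(X)                                  # empirical moment matrix
Q = np.einsum('ni,ij,nj->n', V, np.linalg.inv(M), V)  # SOS polynomial Q(x_i)
print(X[np.argsort(Q)[-5:]])                          # five most "atypical" points
```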

  • Multi-armed Bandits: Competing with Optimal Sequences

Zohar Karnin*, ; Oren Anava, Technion

  • Multivariate tests of association based on univariate tests

Ruth Heller*, Tel-Aviv University; Yair Heller,

http://arxiv.org/abs/1603.03418

Abstract:

For testing two random vectors for independence, we consider testing whether the distance of one vector from a center point is independent from the distance of the other vector from a center point by a univariate test. In this paper we provide conditions under which it is enough to have a consistent univariate test of independence on the distances to guarantee that the power to detect dependence between the random vectors increases to one, as the sample size increases. These conditions turn out to be minimal. If the univariate test is distribution-free, the multivariate test will also be distribution-free. If we consider multiple center points and aggregate the center-specific univariate tests, the power may be further improved, and the resulting multivariate test may be distribution-free for specific aggregation methods (if the univariate test is distribution-free). We show that several multivariate tests recently proposed in the literature can be viewed as instances of this general approach.
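
A sketch of the reduction, with the sample means as center points and Spearman's rank test standing in for the univariate test (both are illustrative choices; the paper's conditions call for a consistent univariate test of independence).

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
x = rng.standard_normal((200, 3))
y = x + 0.5 * rng.standard_normal((200, 3))      # dependent random vectors

dx = np.linalg.norm(x - x.mean(axis=0), axis=1)  # distances from a center point
dy = np.linalg.norm(y - y.mean(axis=0), axis=1)
rho, pval = spearmanr(dx, dy)                    # univariate test on distances
print(rho, pval)
```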

  • Learning What and Where to Draw

Scott Reed*, University of Michigan; Zeynep Akata, Max Planck Institute for Informatics; Santosh Mohan, University of MIchigan; Samuel Tenka, University of MIchigan; Bernt Schiele, ; Honglak Lee, University of Michigan

  • The Sound of APALM Clapping: Faster Nonsmooth Nonconvex Optimization with Stochastic Asynchronous PALM

Damek Davis*, Cornell University; Brent Edmunds, University of California, Los Angeles; Madeleine Udell,

https://arxiv.org/abs/1606.02338

Abstract:

We introduce the Stochastic Asynchronous Proximal Alternating Linearized Minimization (SAPALM) method, a block coordinate stochastic proximal-gradient method for solving nonconvex, nonsmooth optimization problems. SAPALM is the first asynchronous parallel optimization method that provably converges on a large class of nonconvex, nonsmooth problems. We prove that SAPALM matches the best known rates of convergence, among synchronous or asynchronous methods, on this problem class. We provide upper bounds on the number of workers for which we can expect to see a linear speedup, which match the best bounds known for less complex problems, and show that in practice SAPALM achieves this linear speedup. We demonstrate state-of-the-art performance on several matrix factorization problems.

  • Integrator Nets

Hakan Bilen*, University of Oxford; Andrea Vedaldi,

  • Combining Low-Density Separators with CNNs

Yu-Xiong Wang*, Carnegie Mellon University; Martial Hebert, Carnegie Mellon University

  • CNNpack: Packing Convolutional Neural Networks in the Frequency Domain

Yunhe Wang*, Peking University ; Shan You, ; Dacheng Tao, ; Chao Xu, ; Chang Xu,

  • Cooperative Graphical Models

Josip Djolonga*, ETH Zurich; Stefanie Jegelka, MIT; Sebastian Tschiatschek, ETH Zurich; Andreas Krause,

  • f-GAN: Training Generative Neural Samplers using Variational Divergence Minimization

Sebastian Nowozin*, Microsoft Research; Botond Cseke, Microsoft Research; Ryota Tomioka, MSRC

https://arxiv.org/abs/1606.00709

Abstract:

Generative neural samplers are probabilistic models that implement sampling using feedforward neural networks: they take a random input vector and produce a sample from a probability distribution defined by the network weights. These models are expressive and allow efficient computation of samples and derivatives, but cannot be used for computing likelihoods or for marginalization. The generative-adversarial training method allows us to train such models through the use of an auxiliary discriminative neural network. We show that the generative-adversarial approach is a special case of an existing more general variational divergence estimation approach. We show that any f-divergence can be used for training generative neural samplers. We discuss the benefits of various choices of divergence functions on training complexity and the quality of the obtained generative models.
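
A sketch of the variational divergence lower bound the abstract refers to: for any test function T, D_f(P||Q) >= E_P[T(x)] - E_Q[f*(T(x))], instantiated here for KL, where f(u) = u log u and f*(t) = exp(t - 1); the Gaussian samples and choices of T are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
p = rng.normal(0.0, 1.0, 100_000)   # samples from P = N(0, 1)
q = rng.normal(1.0, 1.0, 100_000)   # samples from Q = N(1, 1)

def kl_lower_bound(T):              # E_P[T] - E_Q[f*(T)] with f*(t) = exp(t - 1)
    return np.mean(T(p)) - np.mean(np.exp(T(q) - 1.0))

# The optimal T is 1 + log(dP/dQ); for these Gaussians log(dP/dQ)(x) = 0.5 - x,
# so the bound is tight at the true KL(P||Q) = 0.5.
print(kl_lower_bound(lambda x: 1.5 - x))
print(kl_lower_bound(lambda x: np.zeros_like(x)))  # a valid but loose bound
```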

  • Bayesian Optimization for Probabilistic Programs

Tom Rainforth*, University of Oxford; Tuan Anh Le, University of Oxford; Jan-Willem van de Meent, University of Oxford; Michael Osborne, ; Frank Wood,

  • Hierarchical Question-Image Co-Attention for Visual Question Answering

Jiasen Lu*, Virginia Tech; Jianwei Yang, Virginia Tech; Dhruv Batra, ; Devi Parikh, Virginia Tech

https://arxiv.org/abs/1606.00061

Abstract:

A number of recent works have proposed attention models for Visual Question Answering (VQA) that generate spatial maps highlighting image regions relevant to answering the question. In this paper, we argue that in addition to modeling "where to look", or visual attention, it is equally important to model "what words to listen to", or question attention. We present a novel co-attention model for VQA that jointly reasons about image and question attention. In addition, our model reasons about the question, and consequently the image, via the co-attention mechanism in a hierarchical fashion, using a novel 1-dimensional convolutional neural network (CNN) model. Our final model outperforms all reported methods, improving the state-of-the-art on the VQA dataset from 60.4% to 62.1%, and from 61.6% to 65.4% on the COCO-QA dataset.

  • Optimal Sparse Linear Encoders and Sparse PCA

Malik Magdon-Ismail*, Rensselaer; Christos Boutsidis,

  • FPNN: Field Probing Neural Networks for 3D Data

Yangyan Li*, Stanford University; Soeren Pirk, Stanford University; Hao Su, Stanford University; Charles Qi, Stanford University; Leonidas Guibas, Stanford University

https://arxiv.org/abs/1605.06240

Abstract:

Building discriminative representations for 3D data has been an important task in computer graphics and computer vision research. Convolutional Neural Networks (CNNs) have been shown to operate on 2D images with great success for a variety of tasks. Lifting convolution operators to 3D (3DCNNs) seems like a plausible and promising next step. Unfortunately, the computational complexity of 3DCNNs grows cubically with respect to voxel resolution. Moreover, since most 3D geometry representations are boundary based, occupied regions do not increase proportionately with the size of the discretization, resulting in wasted computation. In this work, we represent 3D spaces as volumetric fields, and propose a novel design that employs field probing filters to efficiently extract features from them. Each field probing filter is a set of probing points, sensors that perceive the space. Our learning algorithm optimizes not only the weights associated with the probing points, but also their locations, which deforms the shape of the probing filters and adaptively distributes them in 3D space. The optimized probing points sense the 3D space "intelligently", rather than operating blindly over the entire domain. We show that field probing is significantly more efficient than 3DCNNs, while providing state-of-the-art performance, on classification tasks for 3D object recognition benchmark datasets.

  • CRF-CNN: Modeling Structured Information in Human Pose Estimation

Xiao Chu*, Cuhk; Wanli Ouyang, ; hongsheng Li, cuhk; Xiaogang Wang, Chinese University of Hong Kong

  • Fairness in Learning: Classic and Contextual Bandits

Matthew Joseph, University of Pennsylvania; Michael Kearns, ; Jamie Morgenstern*, University of Pennsylvania; Aaron Roth,

https://arxiv.org/abs/1605.07139

Abstract:

We introduce the study of fairness in multi-armed bandit problems. Our fairness definition can be interpreted as demanding that given a pool of applicants (say, for college admission or mortgages), a worse applicant is never favored over a better one, despite a learning algorithm's uncertainty over the true payoffs. We prove results of two types. First, in the important special case of the classic stochastic bandits problem (i.e., in which there are no contexts), we provide a provably fair algorithm based on "chained" confidence intervals, and provide a cumulative regret bound with a cubic dependence on the number of arms. We further show that any fair algorithm must have such a dependence. When combined with regret bounds for standard non-fair algorithms such as UCB, this proves a strong separation between fair and unfair learning, which extends to the general contextual case. In the general contextual case, we prove a tight connection between fairness and the KWIK (Knows What It Knows) learning model: a KWIK algorithm for a class of functions can be transformed into a provably fair contextual bandit algorithm, and conversely any fair contextual bandit algorithm can be transformed into a KWIK learning algorithm. This tight connection allows us to provide a provably fair algorithm for the linear contextual bandit problem with a polynomial dependence on the dimension, and to show (for a different class of functions) a worst-case exponential gap in regret between fair and non-fair learning algorithms.

  • Joint M-Best-Diverse Labelings as a Parametric Submodular Minimization

Alexander Kirillov*, TU Dresden; Alexander Shekhovtsov, ; Carsten Rother, ; Bogdan Savchynskyy,

  • Domain Separation Networks

Dilip Krishnan, Google; George Trigeorgis, Google; Konstantinos Bousmalis*, ; Nathan Silberman, Google; Dumitru Erhan, Google

  • DISCO Nets: DISsimilarity COefficients Networks

Diane Bouchacourt*, University of Oxford; M. Pawan Kumar, University of Oxford; Sebastian Nowozin,

  • Multimodal Residual Learning for Visual QA

Jin-Hwa Kim*, Seoul National University; Sang-Woo Lee, Seoul National University; Dong-Hyun Kwak, Seoul National University; Min-Oh Heo, Seoul National University; Jeonghee Kim, Naver Labs; Jung-Woo Ha, Naver Labs; Byoung-Tak Zhang, Seoul National University

  • CMA-ES with Optimal Covariance Update and Storage Complexity

Dídac Rodríguez Arbonès, University of Copenhagen; Oswin Krause, ; Christian Igel*,

  • R-FCN: Object Detection via Region-based Fully Convolutional Networks

Jifeng Dai, Microsoft; Yi Li, Tsinghua University; Kaiming He*, Microsoft; Jian Sun, Microsoft

  • GAP Safe Screening Rules for Sparse-Group Lasso

Eugene Ndiaye, Télécom ParisTech; Olivier Fercoq, ; Alexandre Gramfort, ; Joseph Salmon*,

  • Learning and Forecasting Opinion Dynamics in Social Networks

Abir De, IIT Kharagpur; Isabel Valera, ; Niloy Ganguly, IIT Kharagpur; Sourangshu Bhattacharya, IIT Kharagpur; Manuel Gomez Rodriguez*, MPI-SWS

  • Gradient-based Sampling: An Adaptive Importance Sampling for Least-squares

Rong Zhu*, Chinese Academy of Sciences

  • Collaborative Recurrent Autoencoder: Recommend while Learning to Fill in the Blanks

Hao Wang*, HKUST; Xingjian Shi, ; Dit-Yan Yeung,

  • Mutual information for symmetric rank-one matrix estimation: A proof of the replica formula

Jean Barbier, EPFL; Mohamad Dia, EPFL; Florent Krzakala*, ; Thibault Lesieur, IPHT Saclay; Nicolas Macris, EPFL; Lenka Zdeborova,

  • A Unified Approach for Learning the Parameters of Sum-Product Networks

Han Zhao*, Carnegie Mellon University; Pascal Poupart, ; Geoff Gordon,

  • Training and Evaluating Multimodal Word Embeddings with Large-scale Web Annotated Images

Junhua Mao*, UCLA; Jiajing Xu, ; Kevin Jing, ; Alan Yuille,

  • Stochastic Online AUC Maximization

Yiming Ying*, ; Longyin Wen, State University of New York at Albany; Siwei Lyu, State University of New York at Albany

  • The Generalized Reparameterization Gradient

Francisco Ruiz*, Columbia University; Michalis K. Titsias, ; David Blei,

  • Coupled Generative Adversarial Networks

Ming-Yu Liu*, MERL; Oncel Tuzel, Mitsubishi Electric Research Labs (MERL)

  • Exponential Family Embeddings

Maja Rudolph*, Columbia University; Francisco J. R. Ruiz, ; Stephan Mandt, Disney Research; David Blei,

  • Variational Information Maximization for Feature Selection

Shuyang Gao*, ; Greg Ver Steeg, ; Aram Galstyan,

  • Operator Variational Inference

Rajesh Ranganath*, Princeton University; Dustin Tran, Columbia University; Jaan Altosaar, Princeton University; David Blei,

  • Fast learning rates with heavy-tailed losses

Vu Dinh*, Fred Hutchinson Cancer Center; Lam Ho, UCLA; Binh Nguyen, University of Science, Vietnam; Duy Nguyen, University of Wisconsin-Madison

  • Budgeted stream-based active learning via adaptive submodular maximization

Kaito Fujii*, Kyoto University; Hisashi Kashima, Kyoto University

  • Learning feed-forward one-shot learners

Luca Bertinetto, University of Oxford; Joao Henriques, University of Oxford; Jack Valmadre*, University of Oxford; Philip Torr, ; Andrea Vedaldi,

  • Learning User Perceived Clusters with Feature-Level Supervision

Ting-Yu Cheng, ; Kuan-Hua Lin, ; Xinyang Gong, Baidu Inc.; Kang-Jun Liu, ; Shan-Hung Wu*, National Tsing Hua University

  • Robust Spectral Detection of Global Structures in the Data by Learning a Regularization

Pan Zhang*, ITP, CAS

  • Residual Networks are Exponential Ensembles of Relatively Shallow Networks

Andreas Veit*, Cornell University; Michael Wilber, ; Serge Belongie, Cornell University

  • Adversarial Multiclass Classification: A Risk Minimization Perspective

Rizal Fathony*, U. of Illinois at Chicago; Anqi Liu, ; Kaiser Asif, ; Brian Ziebart,

  • Solving Random Systems of Quadratic Equations via Truncated Generalized Gradient Flow

Gang Wang*, University of Minnesota; Georgios Giannakis, University of Minnesota

  • Coin Betting and Parameter-Free Online Learning

Francesco Orabona*, Yahoo Research; David Pal,

  • Deep Learning without Poor Local Minima

Kenji Kawaguchi*, MIT

  • Testing for Differences in Gaussian Graphical Models: Applications to Brain Connectivity

Eugene Belilovsky*, CentraleSupelec; Gael Varoquaux, ; Matthew Blaschko, KU Leuven

  • A Constant-Factor Bi-Criteria Approximation Guarantee for k-means++

Dennis Wei*, IBM Research

  • Generating Videos with Scene Dynamics

Carl Vondrick*, MIT; Hamed Pirsiavash, ; Antonio Torralba,

  • Neurally-Guided Procedural Models: Amortized Inference for Procedural Graphics Programs

Daniel Ritchie*, Stanford University; Anna Thomas, Stanford University; Pat Hanrahan, Stanford University; Noah Goodman,

  • A Powerful Generative Model Using Random Weights for the Deep Image Representation

Kun He, Huazhong University of Science and Technology; Yan Wang*, Huazhong University of Science and Technology; John Hopcroft, Cornell University

  • Optimizing affinity-based binary hashing using auxiliary coordinates

Ramin Raziperchikolaei, UC Merced; Miguel Carreira-Perpinan*, UC Merced

  • Double Thompson Sampling for Dueling Bandits

Huasen Wu*, University of California at Davis; Xin Liu, University of California, Davis

  • Generating Images with Perceptual Similarity Metrics based on Deep Networks

Alexey Dosovitskiy*, ; Thomas Brox, University of Freiburg

  • Dynamic Filter Networks

Xu Jia*, KU Leuven; Bert De Brabandere, ; Tinne Tuytelaars, KU Leuven; Luc Van Gool, ETH Zürich

  • A Simple Practical Accelerated Method for Finite Sums

Aaron Defazio*, Ambiata

  • Barzilai-Borwein Step Size for Stochastic Gradient Descent

Conghui Tan*, The Chinese University of HK; Shiqian Ma, ; Yu-Hong Dai, ; Yuqiu Qian, The University of Hong Kong

  • On Graph Reconstruction via Empirical Risk Minimization: Fast Learning Rates and Scalability

Guillaume Papa, Télécom ParisTech; Aurélien Bellet*, ; Stephan Clémencon,

  • Optimal spectral transportation with application to music transcription

Rémi Flamary, ; Cédric Févotte*, CNRS; Nicolas Courty, ; Valentin Emiya, Aix-Marseille University

  • Regularized Nonlinear Acceleration

Damien Scieur*, INRIA - ENS; Alexandre D'Aspremont, ; Francis Bach,

  • SPALS: Fast Alternating Least Squares via Implicit Leverage Scores Sampling

Dehua Cheng*, Univ. of Southern California; Richard Peng, ; Yan Liu, ; Ioakeim Perros, Georgia Institute of Technology

  • Single-Image Depth Perception in the Wild

Weifeng Chen*, University of Michigan; Zhao Fu, University of Michigan; Dawei Yang, University of Michigan; Jia Deng,

  • Computational and Statistical Tradeoffs in Learning to Rank

Ashish Khetan*, University of Illinois Urbana-Champaign; Sewoong Oh,

  • Learning to Poke by Poking: Experiential Learning of Intuitive Physics

Pulkit Agrawal*, UC Berkeley; Ashvin Nair, UC Berkeley; Pieter Abbeel, ; Jitendra Malik, ; Sergey Levine, University of Washington

  • Online Convex Optimization with Unconstrained Domains and Losses

Ashok Cutkosky*, Stanford University; Kwabena Boahen, Stanford University

  • An ensemble diversity approach to supervised binary hashing

Miguel Carreira-Perpinan*, UC Merced; Ramin Raziperchikolaei, UC Merced

  • Efficient Globally Convergent Stochastic Optimization for Canonical Correlation Analysis

Weiran Wang*, ; Jialei Wang, University of Chicago; Dan Garber, ; Nathan Srebro,

  • The Power of Adaptivity in Identifying Statistical Alternatives

Kevin Jamieson*, UC Berkeley; Daniel Haas, ; Ben Recht,

  • On Explore-Then-Commit strategies

Aurelien Garivier, ; Tor Lattimore, ; Emilie Kaufmann*,

  • Sublinear Time Orthogonal Tensor Decomposition

Zhao Song*, UT-Austin; David Woodruff, ; Huan Zhang, UC-Davis

  • DECOrrelated feature space partitioning for distributed sparse regression

Xiangyu Wang*, Duke University; David Dunson, Duke University; Chenlei Leng, University of Warwick

  • Deep Alternative Neural Networks: Exploring Contexts as Early as Possible for Action Recognition

Jinzhuo Wang*, PKU; Wenmin Wang, Peking University; Xiongtao Chen, Peking University; Ronggang Wang, Peking University; Wen Gao, Peking University

  • Machine Translation Through Learning From a Communication Game

Di He*, Microsoft; Yingce Xia, USTC; Tao Qin, Microsoft; Liwei Wang, ; Nenghai Yu, USTC; Tie-Yan Liu, Microsoft; Wei-Ying Ma, Microsoft

  • Dialog-based Language Learning

Jason Weston*,

  • Joint Line Segmentation and Transcription for End-to-End Handwritten Paragraph Recognition

Theodore Bluche*, A2iA

  • Temporal Regularized Matrix Factorization for High-dimensional Time Series Prediction

Hsiang-Fu Yu*, University of Texas at Austin; Nikhil Rao, ; Inderjit Dhillon,

  • Active Nearest-Neighbor Learning in Metric Spaces

Aryeh Kontorovich, ; Sivan Sabato*, Ben-Gurion University of the Negev; Ruth Urner, MPI Tuebingen

  • Proximal Deep Structured Models

Shenlong Wang*, University of Toronto; Sanja Fidler, ; Raquel Urtasun,

  • Faster Projection-free Convex Optimization over the Spectrahedron

Dan Garber*,

  • Bayesian Optimization with a Finite Budget: An Approximate Dynamic Programming Approach

Remi Lam*, MIT; Karen Willcox, MIT; David Wolpert,

  • Learning Sound Representations from Unlabeled Video

Yusuf Aytar, MIT; Carl Vondrick*, MIT; Antonio Torralba,

  • Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks

Tim Salimans*, ; Diederik Kingma,

  • Efficient Second Order Online Learning by Sketching

Haipeng Luo*, Princeton University; Alekh Agarwal, Microsoft; Nicolò Cesa-Bianchi, ; John Langford,

  • Dynamic Mode Decomposition with Reproducing Kernels for Koopman Spectral Analysis

Yoshinobu Kawahara*, Osaka University

  • Distributed Flexible Nonlinear Tensor Factorization

Shandian Zhe*, Purdue University; Kai Zhang, Lawrence Berkeley Lab; Pengyuan Wang, Yahoo! Research; Kuang-chih Lee, ; Zenglin Xu, ; Alan Qi, ; Zoubin Ghahramani,

  • The Robustness of Estimator Composition

Pingfan Tang*, University of Utah; Jeff Phillips, University of Utah

  • Efficient and Robust Spiking Neural Circuit for Navigation Inspired by Echolocating Bats

Bipin Rajendran*, NJIT; Pulkit Tandon, IIT Bombay; Yash Malviya, IIT Bombay

  • PerforatedCNNs: Acceleration through Elimination of Redundant Convolutions

Michael Figurnov*, Skolkovo Inst. of Sc and Tech; Aijan Ibraimova, Skolkovo Institute of Science and Technology; Dmitry P. Vetrov, ; Pushmeet Kohli,

  • Differential Privacy without Sensitivity

Kentaro Minami*, The University of Tokyo; Hitomi Arai, The University of Tokyo; Issei Sato, The University of Tokyo; Hiroshi Nakagawa,

  • Optimal Cluster Recovery in the Labeled Stochastic Block Model

Se-Young Yun*, Los Alamos National Laboratory; Alexandre Proutiere,

  • Even Faster SVD Decomposition Yet Without Agonizing Pain

Zeyuan Allen-Zhu*, Princeton University; Yuanzhi Li, Princeton University

  • An algorithm for L1 nearest neighbor search via monotonic embedding

Xinan Wang*, UCSD; Sanjoy Dasgupta,

  • Gaussian Process Bandit Optimisation with Multi-fidelity Evaluations

Kirthevasan Kandasamy*, CMU; Gautam Dasarathy, Carnegie Mellon University; Junier Oliva, ; Jeff Schneider, CMU; Barnabas Poczos,

  • Linear-Memory and Decomposition-Invariant Linearly Convergent Conditional Gradient Algorithm for Structured Polytopes

Dan Garber*, ; Ofer Meshi,

  • Efficient Nonparametric Smoothness Estimation

Shashank Singh*, Carnegie Mellon University; Simon Du, Carnegie Mellon University; Barnabas Poczos,

  • A Theoretically Grounded Application of Dropout in Recurrent Neural Networks

Yarin Gal*, University of Cambridge; Zoubin Ghahramani,

  • Fast ε-free Inference of Simulation Models with Bayesian Conditional Density Estimation

George Papamakarios*, University of Edinburgh; Iain Murray, University of Edinburgh

  • Direct Feedback Alignment Provides Learning In Deep Neural Networks

Arild Nøkland*, None

  • Safe and Efficient Off-Policy Reinforcement Learning

Remi Munos, Google DeepMind; Thomas Stepleton, Google DeepMind; Anna Harutyunyan, Vrije Universiteit Brussel; Marc Bellemare*, Google DeepMind

  • A Multi-Batch L-BFGS Method for Machine Learning

Albert Berahas*, Northwestern University; Jorge Nocedal, Northwestern University; Martin Takac, Lehigh University

  • Semiparametric Differential Graph Models

Pan Xu*, University of Virginia; Quanquan Gu, University of Virginia

  • Rényi Divergence Variational Inference

Yingzhen Li*, University of Cambridge; Richard E. Turner,

  • Doubly Convolutional Neural Networks

Shuangfei Zhai*, Binghamton University; Yu Cheng, IBM Research; Zhongfei Zhang, Binghamton University

  • Density Estimation via Discrepancy Based Adaptive Sequential Partition

Dangna Li*, Stanford University; Kun Yang, Google Inc; Wing Wong, Stanford University

  • How Deep is the Feature Analysis underlying Rapid Visual Categorization?

Sven Eberhardt*, Brown University; Jonah Cader, Brown University; Thomas Serre,

  • Variational Information Maximizing Exploration

Rein Houthooft*, Ghent University - iMinds; UC Berkeley; OpenAI; Xi Chen, UC Berkeley; OpenAI; Yan Duan, UC Berkeley; John Schulman, OpenAI; Filip De Turck, Ghent University - iMinds; Pieter Abbeel,

  • Generalized Correspondence-LDA Models (GC-LDA) for Identifying Functional Regions in the Brain

Timothy Rubin*, Indiana University; Sanmi Koyejo, UIUC; Michael Jones, Indiana University; Tal Yarkoni, University of Texas at Austin

  • Solving Marginal MAP Problems with NP Oracles and Parity Constraints

Yexiang Xue*, Cornell University; Zhiyuan Li, Tsinghua University; Stefano Ermon, ; Carla Gomes, Cornell University; Bart Selman,

  • Multi-view Anomaly Detection via Robust Probabilistic Latent Variable Models

Tomoharu Iwata*, ; Makoto Yamada,

  • Fast Stochastic Methods for Nonsmooth Nonconvex Optimization

Sashank Jakkam Reddi*, Carnegie Mellon University; Suvrit Sra, MIT; Barnabas Poczos, ; Alexander J. Smola,

  • Variance Reduction in Stochastic Gradient Langevin Dynamics

Kumar Dubey*, Carnegie Mellon University; Sashank Jakkam Reddi, Carnegie Mellon University; Sinead Williamson, ; Barnabas Poczos, ; Alexander J. Smola, ; Eric Xing, Carnegie Mellon University

  • Regularization With Stochastic Transformations and Perturbations for Deep Semi-Supervised Learning

Mehdi Sajjadi*, University of Utah; Mehran Javanmardi, University of Utah; Tolga Tasdizen, University of Utah

  • Dense Associative Memory for Pattern Recognition

Dmitry Krotov*, Institute for Advanced Study; John Hopfield, Princeton Neuroscience Institute

  • Causal Bandits: Learning Good Interventions via Causal Inference

Finnian Lattimore, Australian National University; Tor Lattimore*, ; Mark Reid,

  • Refined Lower Bounds for Adversarial Bandits

Sébastien Gerchinovitz, ; Tor Lattimore*,

  • Theoretical Comparisons of Positive-Unlabeled Learning against Positive-Negative Learning

Gang Niu*, University of Tokyo; Marthinus du Plessis, ; Tomoya Sakai, ; Yao Ma, ; Masashi Sugiyama, RIKEN / University of Tokyo

  • Homotopy Smoothing for Non-Smooth Problems with Lower Complexity than O(1/ϵ)

Yi Xu*, The University of Iowa; Yan Yan, University of Technology Sydney; Qihang Lin, ; Tianbao Yang, University of Iowa

  • Finite-Sample Analysis of Fixed-k Nearest Neighbor Density Functional Estimators

Shashank Singh*, Carnegie Mellon University; Barnabas Poczos,

  • A state-space model of cross-region dynamic connectivity in MEG/EEG

Ying Yang*, Carnegie Mellon University; Elissa Aminoff, Carnegie Mellon University; Michael Tarr, Carnegie Mellon University; Robert Kass, Carnegie Mellon University

  • What Makes Objects Similar: A Unified Multi-Metric Learning Approach

Han-Jia Ye, ; De-Chuan Zhan*, ; Xue-Min Si, Nanjing University; Yuan Jiang, Nanjing University; Zhi-Hua Zhou,

  • Adaptive Maximization of Pointwise Submodular Functions With Budget Constraint

Nguyen Viet Cuong*, National University of Singapore; Huan Xu, NUS

  • Dueling Bandits: Beyond Condorcet Winners to General Tournament Solutions

Siddartha Ramamohan, Indian Institute of Science; Arun Rajkumar, ; Shivani Agarwal*, Radcliffe Institute, Harvard

  • Local Similarity-Aware Deep Feature Embedding

Chen Huang*, Chinese University of Hong Kong; Chen Change Loy, The Chinese University of HK; Xiaoou Tang, The Chinese University of Hong Kong

  • A Communication-Efficient Parallel Algorithm for Decision Tree

Qi Meng*, Peking University; Guolin Ke, Microsoft Research; Taifeng Wang, Microsoft Research; Wei Chen, Microsoft Research; Qiwei Ye, Microsoft Research; Zhi-Ming Ma, Academy of Mathematics and Systems Science, Chinese Academy of Sciences; Tie-Yan Liu, Microsoft Research

  • Convex Two-Layer Modeling with Latent Structure

Vignesh Ganapathiraman, University Of Illinois at Chicago; Xinhua Zhang*, UIC; Yaoliang Yu, ; Junfeng Wen, UofA

  • Sampling for Bayesian Program Learning

Kevin Ellis*, MIT; Armando Solar-Lezama, MIT; Joshua Tenenbaum,

  • Learning Kernels with Random Features

Aman Sinha*, Stanford University; John Duchi,

  • Optimal Tagging with Markov Chain Optimization

Nir Rosenfeld*, Hebrew University of Jerusalem; Amir Globerson, Tel Aviv University

  • Crowdsourced Clustering: Querying Edges vs Triangles

Ramya Korlakai Vinayak*, Caltech; Babak Hassibi, Caltech

  • Mixed vine copulas as joint models of spike counts and local field potentials

Arno Onken*, IIT; Stefano Panzeri, IIT

  • Achieving the KS threshold in the general stochastic block model with linearized acyclic belief propagation

Emmanuel Abbe*, ; Colin Sandon,

  • Adaptive Concentration Inequalities for Sequential Decision Problems

Shengjia Zhao*, Tsinghua University; Enze Zhou, Tsinghua University; Ashish Sabharwal, Allen Institute for AI; Stefano Ermon,

  • Fast mini-batch k-means by nesting

James Newling*, Idiap Research Institute; Francois Fleuret, Idiap Research Institute

  • Deep Learning Models of the Retinal Response to Natural Scenes

Lane McIntosh*, Stanford University; Niru Maheswaranathan, Stanford University; Aran Nayebi, Stanford University; Surya Ganguli, Stanford; Stephen Baccus, Stanford University

  • Preference Completion from Partial Rankings

Suriya Gunasekar*, UT Austin; Sanmi Koyejo, UIUC; Joydeep Ghosh, UT Austin

  • Dynamic Network Surgery for Efficient DNNs

Yiwen Guo*, Intel Labs China; Anbang Yao, ; Yurong Chen,

  • Learning a Metric Embedding for Face Recognition using the Multibatch Method

Oren Tadmor, OrCam; Tal Rosenwein, OrCam; Shai Shalev-Shwartz, OrCam; Yonatan Wexler*, OrCam; Amnon Shashua, OrCam

  • A Pseudo-Bayesian Algorithm for Robust PCA

Tae-Hyun Oh*, KAIST; David Wipf, ; Yasuyuki Matsushita, Osaka University; In So Kweon, KAIST

  • End-to-End Kernel Learning with Supervised Convolutional Kernel Networks

Julien Mairal*, Inria

  • Stochastic Variance Reduction Methods for Saddle-Point Problems

P. Balamurugan, ; Francis Bach*,

  • Flexible Models for Microclustering with Applications to Entity Resolution

Brenda Betancourt, Duke University; Giacomo Zanella, The University of Warwick; Jeffrey Miller, Duke University; Hanna Wallach, Microsoft Research New England; Abbas Zaidi, Duke University; Rebecca C. Steorts*, Duke University

  • Catching heuristics are optimal control policies

Boris Belousov*, TU Darmstadt; Gerhard Neumann, ; Constantin Rothkopf, ; Jan Peters,

  • Bayesian optimization under mixed constraints with a slack-variable augmented Lagrangian

Victor Picheny, Institut National de la Recherche Agronomique; Robert Gramacy*, ; Stefan Wild, Argonne National Lab; Sebastien Le Digabel, École Polytechnique de Montréal

  • Adaptive Neural Compilation

Rudy Bunel*, Oxford University; Alban Desmaison, Oxford; M. Pawan Kumar, University of Oxford; Pushmeet Kohli, ; Philip Torr,

  • Synthesis of MCMC and Belief Propagation

Sung-Soo Ahn*, KAIST; Misha Chertkov, Los Alamos National Laboratory; Jinwoo Shin, KAIST

  • Learning Treewidth-Bounded Bayesian Networks with Thousands of Variables

Mauro Scanagatta*, IDSIA; Giorgio Corani, IDSIA; Cassio Polpo de Campos, Queen's University Belfast; Marco Zaffalon, IDSIA

  • Unifying Count-Based Exploration and Intrinsic Motivation

Marc Bellemare*, Google DeepMind; Srinivasan Sriram, ; Georg Ostrovski, Google DeepMind; Tom Schaul, ; David Saxton, Google DeepMind; Remi Munos, Google DeepMind

  • Large Margin Discriminant Dimensionality Reduction in Prediction Space

Mohammad Saberian*, Netflix; Jose Costa Pereira, UC San Diego; Nuno Vasconcelos, UC San Diego

  • Stochastic Structured Prediction under Bandit Feedback

Artem Sokolov, Heidelberg University; Julia Kreutzer, Heidelberg University; Stefan Riezler*, Heidelberg University

  • Simple and Efficient Weighted Minwise Hashing

Anshumali Shrivastava*, Rice University

  • Truncated Variance Reduction: A Unified Approach to Bayesian Optimization and Level-Set Estimation

Ilija Bogunovic*, EPFL Lausanne; Jonathan Scarlett, ; Andreas Krause, ; Volkan Cevher,

  • Structured Sparse Regression via Greedy Hard Thresholding

Prateek Jain, Microsoft Research; Nikhil Rao*, ; Inderjit Dhillon,

  • Understanding Probabilistic Sparse Gaussian Process Approximations

Matthias Bauer*, University of Cambridge; Mark van der Wilk, University of Cambridge; Carl Rasmussen, University of Cambridge

  • SEBOOST - Boosting Stochastic Learning Using Subspace Optimization Techniques

Elad Richardson*, Technion; Rom Herskovitz, ; Boris Ginsburg, ; Michael Zibulevsky,

  • Long-Term Trajectory Planning Using Hierarchical Memory Networks

Stephan Zheng*, Caltech; Yisong Yue, ; Patrick Lucey, Stats

  • Learning Tree Structured Potential Games

Vikas Garg*, MIT; Tommi Jaakkola,

  • Observational-Interventional Priors for Dose-Response Learning

Ricardo Silva*,

  • Learning from Rational Behavior: Predicting Solutions to Unknown Linear Programs

Shahin Jabbari*, University of Pennsylvania; Ryan Rogers, University of Pennsylvania; Aaron Roth, ; Steven Wu, University of Pennsylvania

  • Identification and Overidentification of Linear Structural Equation Models

Bryant Chen*, UCLA

  • Adaptive Skills Adaptive Partitions (ASAP)

Daniel Mankowitz*, Technion; Timothy Mann, Google DeepMind; Shie Mannor, Technion

  • Multiple-Play Bandits in the Position-Based Model

Paul Lagrée*, Université Paris Sud; Claire Vernade, Université Paris Saclay; Olivier Cappe,

  • Optimal Black-Box Reductions Between Optimization Objectives

Zeyuan Allen-Zhu*, Princeton University; Elad Hazan,

  • On Valid Optimal Assignment Kernels and Applications to Graph Classification

Nils Kriege*, TU Dortmund; Pierre-Louis Giscard, University of York; Richard Wilson, University of York

  • Robustness of classifiers: from adversarial to random noise

Alhussein Fawzi, ; Seyed-Mohsen Moosavi-Dezfooli*, EPFL; Pascal Frossard, EPFL

  • A Non-convex One-Pass Framework for Factorization Machines and Rank-One Matrix Sensing

Ming Lin*, ; Jieping Ye,

  • Exploiting the Structure: Stochastic Gradient Methods Using Raw Clusters

Zeyuan Allen-Zhu*, Princeton University; Yang Yuan, Cornell University; Karthik Sridharan, University of Pennsylvania

  • Combinatorial Multi-Armed Bandit with General Reward Functions

Wei Chen*, ; Wei Hu, Princeton University; Fu Li, The University of Texas at Austin; Jian Li, Tsinghua University; Yu Liu, Tsinghua University; Pinyan Lu, Shanghai University of Finance and Economics

  • Boosting with Abstention

Corinna Cortes, ; Giulia DeSalvo*, ; Mehryar Mohri,

  • Regret of Queueing Bandits

Subhashini Krishnasamy, The University of Texas at Austin; Rajat Sen, The University of Texas at Austin; Ramesh Johari, ; Sanjay Shakkottai*, The University of Texas at Austin

  • Deep Learning Games

Dale Schuurmans*, ; Martin Zinkevich, Google

  • Globally Optimal Training of Generalized Polynomial Neural Networks with Nonlinear Spectral Methods

Antoine Gautier*, Saarland University; Quynh Nguyen, Saarland University; Matthias Hein, Saarland University

  • Learning Volumetric 3D Object Reconstruction from Single-View with Projective Transformations

Xinchen Yan*, University of Michigan; Jimei Yang, ; Ersin Yumer, Adobe Research; Yijie Guo, University of Michigan; Honglak Lee, University of Michigan

  • A Credit Assignment Compiler for Joint Prediction

Kai-Wei Chang*, ; He He, University of Maryland; Stephane Ross, Google; Hal Daumé III, ; John Langford,

  • Accelerating Stochastic Composition Optimization

Mengdi Wang*, ; Ji Liu,

  • Reward Augmented Maximum Likelihood for Neural Structured Prediction

Mohammad Norouzi*, ; Dale Schuurmans, ; Samy Bengio, ; Zhifeng Chen, ; Navdeep Jaitly, ; Mike Schuster, ; Yonghui Wu,

  • Consistent Kernel Mean Estimation for Functions of Random Variables

Adam Scibior*, University of Cambridge; Carl-Johann Simon-Gabriel, MPI Tuebingen; Iliya Tolstikhin, ; Bernhard Schoelkopf,

  • Towards Unifying Hamiltonian Monte Carlo and Slice Sampling

Yizhe Zhang*, Duke University; Xiangyu Wang, Duke University; Changyou Chen, ; Ricardo Henao, ; Kai Fan, Duke University; Lawrence Carin,

  • Scalable Adaptive Stochastic Optimization Using Random Projections

Gabriel Krummenacher*, ETH Zurich; Brian McWilliams, Disney Research; Yannic Kilcher, ETH Zurich; Joachim Buhmann, ETH Zurich; Nicolai Meinshausen,

  • Variational Inference in Mixed Probabilistic Submodular Models

Josip Djolonga, ETH Zurich; Sebastian Tschiatschek*, ETH Zurich; Andreas Krause,

  • Correlated-PCA: Principal Components' Analysis when Data and Noise are Correlated

Namrata Vaswani*, ; Han Guo, Iowa State University

  • The Multi-fidelity Multi-armed Bandit

Kirthevasan Kandasamy*, CMU; Gautam Dasarathy, Carnegie Mellon University; Barnabas Poczos, ; Jeff Schneider, CMU

  • Anchor-Free Correlated Topic Modeling: Identifiability and Algorithm

Kejun Huang*, University of Minnesota; Xiao Fu, University of Minnesota; Nicholas Sidiropoulos, University of Minnesota

  • Bootstrap Model Aggregation for Distributed Statistical Learning

Jun Han, Dartmouth College; Qiang Liu*,

  • A scalable end-to-end Gaussian process adapter for irregularly sampled time series classification

Steven Cheng-Xian Li*, UMass Amherst; Benjamin Marlin,

  • A Bandit Framework for Strategic Regression

Yang Liu*, Harvard University; Yiling Chen,

  • Architectural Complexity Measures of Recurrent Neural Networks

Saizheng Zhang*, University of Montreal; Yuhuai Wu, University of Toronto; Tong Che, IHES; Zhouhan Lin, University of Montreal; Roland Memisevic, University of Montreal; Ruslan Salakhutdinov, University of Toronto; Yoshua Bengio, U. Montreal

  • Statistical Inference for Cluster Trees

Jisu Kim*, Carnegie Mellon University; Yen-Chi Chen, Carnegie Mellon University; Sivaraman Balakrishnan, Carnegie Mellon University; Alessandro Rinaldo, Carnegie Mellon University; Larry Wasserman, Carnegie Mellon University

  • Contextual-MDPs for PAC Reinforcement Learning with Rich Observations

Akshay Krishnamurthy*, ; Alekh Agarwal, Microsoft; John Langford,

  • Improved Deep Metric Learning with Multi-class N-pair Loss Objective

Kihyuk Sohn*,

  • Only H is left: Near-tight Episodic PAC RL

Christoph Dann*, Carnegie Mellon University; Emma Brunskill, Carnegie Mellon University

  • Stacked Approximated Regression Machine: A Simple Deep Learning Approach

Zhangyang Wang*, UIUC; Shiyu Chang, UIUC; Qing Ling, USTC; Shuai Huang, UW; Xia Hu, ; Honghui Shi, UIUC; Thomas Huang, UIUC

  • Unsupervised Learning of Spoken Language with Visual Context

David Harwath*, MIT CSAIL; Antonio Torralba, MIT CSAIL; James Glass, MIT CSAIL

  • Low-Rank Regression with Tensor Responses

Guillaume Rabusseau*, Aix-Marseille University; Hachem Kadri,

  • PAC-Bayesian Theory Meets Bayesian Inference

Pascal Germain*, ; Francis Bach, ; Alexandre Lacoste, ; Simon Lacoste-Julien, INRIA

  • Data Poisoning Attacks on Factorization-Based Collaborative Filtering

Bo Li*, Vanderbilt University; Yining Wang, Carnegie Mellon University; Aarti Singh, Carnegie Mellon University; Yevgeniy Vorobeychik, Vanderbilt University

  • Learned Region Sparsity and Diversity Also Predicts Visual Attention

Zijun Wei*, Stony Brook; Hossein Adeli, ; Minh Hoai, ; Gregory Zelinsky, ; Dimitris Samaras,

  • End-to-End Goal-Driven Web Navigation

Rodrigo Frassetto Nogueira*, New York University; Kyunghyun Cho, University of Montreal

  • Automated scalable segmentation of neurons from multispectral images

Uygar Sümbül*, Columbia University; Douglas Roossien, University of Michigan, Ann Arbor; Dawen Cai, University of Michigan, Ann Arbor; John Cunningham, Columbia University; Liam Paninski,

  • Privacy Odometers and Filters: Pay-as-you-Go Composition

Ryan Rogers*, University of Pennsylvania; Salil Vadhan, Harvard University; Aaron Roth, ; Jonathan Robert Ullman,

  • Minimax Estimation of Maximal Mean Discrepancy with Radial Kernels

Iliya Tolstikhin*, ; Bharath Sriperumbudur, ; Bernhard Schoelkopf,

  • Adaptive optimal training of animal behavior

Ji Hyun Bak*, Princeton University; Jung Yoon Choi, ; Ilana Witten, ; Jonathan Pillow,

Hierarchical Object Representation for Open-Ended Object Category Learning and Recognition Hamidreza Kasaei*, IEETA, University of Aveiro

Relevant sparse codes with variational information bottleneck Matthew Chalk*, IST Austria; Olivier Marre, Institut de la vision; Gašper Tkačik, Institute of Science and Technology Austria

Combinatorial Energy Learning for Image Segmentation Jeremy Maitin-Shepard*, Google; Viren Jain, Google; Michal Januszewski, Google; Peter Li, ; Pieter Abbeel,

Orthogonal Random Features Felix Xinnan Yu*, ; Ananda Theertha Suresh, ; Krzysztof Choromanski, ; Dan Holtmann-Rice, ; Sanjiv Kumar, Google

Fast Active Set Methods for Online Spike Inference from Calcium Imaging Johannes Friedrich*, Columbia University; Liam Paninski,

Diffusion-Convolutional Neural Networks James Atwood*, UMass Amherst

Bayesian latent structure discovery from multi-neuron recordings Scott Linderman*, ; Ryan Adams, ; Jonathan Pillow,

A Probabilistic Programming Approach To Probabilistic Data Analysis Feras Saad*, MIT; Vikash Mansinghka, MIT

A Non-parametric Learning Method for Confidently Estimating Patient's Clinical State and Dynamics William Hoiles*, University of California, Los Angeles; Mihaela Van Der Schaar,

Inference by Reparameterization in Neural Population Codes Rajkumar Vasudeva Raju, Rice University; Xaq Pitkow*,

Tensor Switching Networks Chuan-Yung Tsai*, ; Andrew Saxe, ; David Cox,

Stochastic Gradient Richardson-Romberg Markov Chain Monte Carlo Alain Durmus, Telecom ParisTech; Umut Simsekli*, ; Eric Moulines, Ecole Polytechnique; Roland Badeau, Telecom ParisTech; Gaël Richard, Telecom ParisTech

Coordinate-wise Power Method Qi Lei*, UT AUSTIN; Kai Zhong, UT AUSTIN; Inderjit Dhillon,
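
For background on this entry, a sketch of the classical power method that a coordinate-wise variant would build on; the paper's contribution (updating only selected coordinates per step) is not reproduced here.

```python
import numpy as np

def power_method(A, n_iter=200, seed=0):
    """Classical power iteration: repeatedly apply A and renormalize to
    estimate the leading eigenvector of a symmetric matrix A."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(A.shape[0])
    for _ in range(n_iter):
        x = A @ x
        x /= np.linalg.norm(x)
    return x, x @ A @ x  # eigenvector estimate and its Rayleigh quotient

A = np.array([[2.0, 1.0], [1.0, 3.0]])
v, top_eig = power_method(A)
print(top_eig)  # ~3.618, the largest eigenvalue of A
```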

Learning Influence Functions from Incomplete Observations Xinran He*, USC; Ke Xu, USC; David Kempe, USC; Yan Liu,

Learning Structured Sparsity in Deep Neural Networks Wei Wen*, University of Pittsburgh; Chunpeng Wu, University of Pittsburgh; Yandan Wang, University of Pittsburgh; Yiran Chen, University of Pittsburgh; Hai Li, University of Pittsburgh

Sample Complexity of Automated Mechanism Design Nina Balcan, ; Tuomas Sandholm, Carnegie Mellon University; Ellen Vitercik*, Carnegie Mellon University

Short-Dot: Computing Large Linear Transforms Distributedly Using Coded Short Dot Products Sanghamitra Dutta*, Carnegie Mellon University; Viveck Cadambe, Pennsylvania State University; Pulkit Grover, Carnegie Mellon University

Brains on Beats Umut Güçlü*, Radboud University; Jordy Thielen, Radboud University; Michael Hanke, Otto-von-Guericke University Magdeburg; Marcel Van Gerven, Radboud University

Learning Transferrable Representations for Unsupervised Domain Adaptation Ozan Sener*, Cornell University; Hyun Oh Song, Google Research; Ashutosh Saxena, Brain of Things; Silvio Savarese, Stanford University

Stochastic Multiple Choice Learning for Training Diverse Deep Ensembles Stefan Lee*, Indiana University; Senthil Purushwalkam, Carnegie Mellon; Michael Cogswell, Virginia Tech; Viresh Ranjan, Virginia Tech; David Crandall, Indiana University; Dhruv Batra,

Active Learning from Imperfect Labelers Songbai Yan*, University of California, San Diego; Kamalika Chaudhuri, University of California, San Diego; Tara Javidi, University of California, San Diego

Learning to Communicate with Deep Multi-Agent Reinforcement Learning Jakob Foerster*, University of Oxford; Yannis Assael, University of Oxford; Nando de Freitas, University of Oxford; Shimon Whiteson,

Value Iteration Networks Aviv Tamar*, ; Sergey Levine, ; Pieter Abbeel, ; Yi Wu, UC Berkeley; Garrett Thomas, UC Berkeley

Blind Regression: Nonparametric Regression for Latent Variable Models via Collaborative Filtering Dogyoon Song*, MIT; Christina Lee, MIT; Yihua Li, MIT; Devavrat Shah,

On the Recursive Teaching Dimension of VC Classes Bo Tang*, University of Oxford; Xi Chen, Columbia University; Yu Cheng, U of Southern California

InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets Xi Chen*, UC Berkeley / OpenAI; Yan Duan, UC Berkeley; Rein Houthooft, Ghent University - iMinds / UC Berkeley / OpenAI; John Schulman, OpenAI; Ilya Sutskever, ; Pieter Abbeel,

Hardness of Online Sleeping Combinatorial Optimization Problems Satyen Kale*, ; Chansoo Lee, ; David Pal,

Mixed Linear Regression with Multiple Components Kai Zhong*, UT AUSTIN; Prateek Jain, Microsoft Research; Inderjit Dhillon,

Sequential Neural Models with Stochastic Layers Marco Fraccaro*, DTU; Søren Sønderby, KU; Ulrich Paquet, ; Ole Winther, DTU

Stochastic Gradient Methods for Distributionally Robust Optimization with f-divergences Hongseok Namkoong*, Stanford University; John Duchi,

Minimizing Quadratic Functions in Constant Time Kohei Hayashi*, AIST; Yuichi Yoshida, NII

Improved Techniques for Training GANs Tim Salimans*, ; Ian Goodfellow, OpenAI; Wojciech Zaremba, OpenAI; Vicki Cheung, OpenAI; Alec Radford, OpenAI; Xi Chen, UC Berkeley / OpenAI
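
One of the stabilization heuristics this paper discusses is one-sided label smoothing: soften the discriminator's targets for real samples (e.g. to 0.9) while leaving fake targets at 0. A minimal numpy sketch under those assumptions; `discriminator_loss` is an illustrative name, not the paper's code.

```python
import numpy as np

def discriminator_loss(logits_real, logits_fake, real_target=0.9):
    """GAN discriminator loss with one-sided label smoothing: real
    targets are softened to `real_target`, fake targets stay at 0."""
    def bce(logits, target):
        p = 1.0 / (1.0 + np.exp(-logits))  # sigmoid
        return -(target * np.log(p) + (1 - target) * np.log(1 - p)).mean()
    return bce(logits_real, real_target) + bce(logits_fake, 0.0)

print(discriminator_loss(np.array([2.0, 3.0]), np.array([-2.0, -1.0])))
```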

DeepMath - Deep Sequence Models for Premise Selection Geoffrey Irving*, ; Christian Szegedy, ; Alexander Alemi, Google; Francois Chollet, ; Josef Urban, Czech Technical University in Prague

Learning Multiagent Communication with Backpropagation Sainbayar Sukhbaatar, NYU; Arthur Szlam, ; Rob Fergus*, New York University

Toward Deeper Understanding of Neural Networks: The Power of Initialization and a Dual View on Expressivity Amit Daniely*, ; Roy Frostig, Stanford University; Yoram Singer, Google

Learning the Number of Neurons in Deep Networks Jose Alvarez*, NICTA; Mathieu Salzmann, EPFL

Finding significant combinations of features in the presence of categorical covariates Laetitia Papaxanthos*, ETH Zurich; Felipe Llinares, ETH Zurich; Dean Bodenham, ETH Zurich; Karsten Borgwardt,

Examples are not Enough, Learn to Criticize! Model Criticism for Interpretable Machine Learning Been Kim*, ; Rajiv Khanna, UT Austin; Sanmi Koyejo, UIUC

Optimistic Bandit Convex Optimization Scott Yang*, New York University; Mehryar Mohri,

Safe Policy Improvement by Minimizing Robust Baseline Regret Mohamad Ghavamzadeh*, ; Marek Petrik, ; Yinlam Chow, Stanford University

Graphons, mergeons, and so on! Justin Eldridge*, The Ohio State University; Mikhail Belkin, ; Yusu Wang, The Ohio State University

Hierarchical Clustering via Spreading Metrics Aurko Roy*, Georgia Tech; Sebastian Pokutta, Georgia Tech

Learning Bayesian networks with ancestral constraints Eunice Yuh-Jie Chen*, UCLA; Yujia Shen, ; Arthur Choi, ; Adnan Darwiche,

Pruning Random Forests for Prediction on a Budget Feng Nan*, Boston University; Joseph Wang, Boston University; Venkatesh Saligrama,

Clustering with Bregman Divergences: an Asymptotic Analysis Chaoyue Liu*, The Ohio State University; Mikhail Belkin,

Variational Autoencoder for Deep Learning of Images, Labels and Captions Yunchen Pu*, Duke University; Zhe Gan, Duke; Ricardo Henao, ; Xin Yuan, Bell Labs; Chunyuan Li, Duke; Andrew Stevens, Duke University; Lawrence Carin,

Encode, Review, and Decode: Reviewer Module for Caption Generation Zhilin Yang*, Carnegie Mellon University; Ye Yuan, Carnegie Mellon University; Yuexin Wu, Carnegie Mellon University; William Cohen, Carnegie Mellon University; Ruslan Salakhutdinov, University of Toronto

Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm Qiang Liu*, ; Dilin Wang, Dartmouth College
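
A minimal numpy sketch of the Stein variational gradient descent update this paper presents: each particle moves along a kernel-weighted average of the score function plus a repulsive kernel-gradient term. A fixed RBF bandwidth and a 1D target are assumed here for simplicity (the paper's bandwidth choice is omitted).

```python
import numpy as np

def svgd_step(x, score, h=1.0, eps=0.1):
    """One SVGD update on 1D particles x (n,): attraction toward high
    log-density via the kernel-smoothed score, plus kernel repulsion."""
    diff = x[:, None] - x[None, :]                  # diff[j, i] = x_j - x_i
    K = np.exp(-diff ** 2 / h)                      # RBF kernel matrix
    phi = (K * score(x)[:, None]).mean(axis=0) \
        + (2.0 / h) * (K * -diff).mean(axis=0)      # grad_{x_j} k(x_j, x_i)
    return x + eps * phi

# usage: transport uniform particles toward a standard normal target
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=50)
score = lambda x: -x                                # grad log N(0, 1)
for _ in range(500):
    x = svgd_step(x, score)
print(x.mean(), x.std())                            # roughly 0 and 1
```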

A Bio-inspired Redundant Sensing Architecture Anh Tuan Nguyen*, University of Minnesota; Jian Xu, University of Minnesota; Zhi Yang, University of Minnesota

Contextual semibandits via supervised learning oracles Akshay Krishnamurthy*, ; Alekh Agarwal, Microsoft; Miro Dudik,

Blind Attacks on Machine Learners Alex Beatson*, Princeton University; Zhaoran Wang, Princeton University; Han Liu,

Universal Correspondence Network Christopher Choy*, Stanford University; Manmohan Chandraker, NEC Labs America; JunYoung Gwak, Stanford University; Silvio Savarese, Stanford University

Satisfying Real-world Goals with Dataset Constraints Gabriel Goh*, UC Davis; Andy Cotter, ; Maya Gupta, ; Michael Friedlander, UC Davis

Deep Learning for Predicting Human Strategic Behavior Jason Hartford*, University of British Columbia; Kevin Leyton-Brown, ; James Wright, University of British Columbia

Phased Exploration with Greedy Exploitation in Stochastic Combinatorial Partial Monitoring Games Sougata Chaudhuri*, University of Michigan; Ambuj Tewari, University of Michigan

Eliciting and Aggregating Categorical Data Yiling Chen, ; Rafael Frongillo, ; Chien-Ju Ho*,

Measuring the reliability of MCMC inference with Bidirectional Monte Carlo Roger Grosse, ; Siddharth Ancha, University of Toronto; Daniel Roy*,

Breaking the Bandwidth Barrier: Geometrical Adaptive Entropy Estimation Weihao Gao, UIUC; Sewoong Oh*, ; Pramod Viswanath, UIUC

Selective inference for group-sparse linear models Fan Yang, University of Chicago; Rina Foygel Barber*, ; Prateek Jain, Microsoft Research; John Lafferty,

Graph Clustering: Block-models and model free results Yali Wan*, University of Washington; Marina Meila, University of Washington

Maximizing Influence in an Ising Network: A Mean-Field Optimal Solution Christopher Lynn*, University of Pennsylvania; Dan Lee, University of Pennsylvania

Hypothesis Testing in Unsupervised Domain Adaptation with Applications in Neuroscience Hao Zhou, University of Wisconsin Madison; Vamsi Ithapu*, University of Wisconsin Madison; Sathya Ravi, University of Wisconsin Madison; Vikas Singh, UW Madison; Grace Wahba, University of Wisconsin Madison; Sterling Johnson, University of Wisconsin Madison

Geometric Dirichlet Means Algorithm for Topic Inference Mikhail Yurochkin*, University of Michigan; Long Nguyen,

Structured Prediction Theory Based on Factor Graph Complexity Corinna Cortes, ; Vitaly Kuznetsov*, Courant Institute; Mehryar Mohri, ; Scott Yang, New York University

Improved Dropout for Shallow and Deep Learning Zhe Li, The University of Iowa; Boqing Gong, University of Central Florida; Tianbao Yang*, University of Iowa

Constraints Based Convex Belief Propagation Yaniv Tenzer*, The Hebrew University; Alexander Schwing, ; Kevin Gimpel, ; Tamir Hazan,

Error Analysis of Generalized Nyström Kernel Regression Hong Chen, University of Texas; Haifeng Xia, Huazhong Agricultural University; Heng Huang*, University of Texas Arlington

A Probabilistic Framework for Deep Learning Ankit Patel, Baylor College of Medicine / Rice University; Tan Nguyen*, Rice University; Richard Baraniuk,

General Tensor Spectral Co-clustering for Higher-Order Data Tao Wu*, Purdue University; Austin Benson, Stanford University; David Gleich,

Cyclades: Conflict-free Asynchronous Machine Learning Xinghao Pan*, UC Berkeley; Stephen Tu, UC Berkeley; Maximilian Lam, UC Berkeley; Dimitris Papailiopoulos, ; Ce Zhang, Stanford; Michael Jordan, ; Kannan Ramchandran, ; Christopher Re, ; Ben Recht,

Single Pass PCA of Matrix Products Shanshan Wu*, UT Austin; Srinadh Bhojanapalli, TTI Chicago; Sujay Sanghavi, ; Alexandros G. Dimakis,

Stochastic Variational Deep Kernel Learning Andrew Wilson*, Carnegie Mellon University; Zhiting Hu, Carnegie Mellon University; Ruslan Salakhutdinov, University of Toronto; Eric Xing, Carnegie Mellon University

Interaction Screening: Efficient and Sample-Optimal Learning of Ising Models Marc Vuffray*, Los Alamos National Laboratory; Sidhant Misra, Los Alamos National Laboratory; Andrey Lokhov, Los Alamos National Laboratory; Misha Chertkov, Los Alamos National Laboratory

Long-term Causal Effects via Behavioral Game Theory Panos Toulis*, University of Chicago; David Parkes, Harvard University

Measuring Neural Net Robustness with Constraints Osbert Bastani*, Stanford University; Yani Ioannou, University of Cambridge; Leonidas Lampropoulos, University of Pennsylvania; Dimitrios Vytiniotis, Microsoft Research; Aditya Nori, Microsoft Research; Antonio Criminisi,

Reshaped Wirtinger Flow for Solving Quadratic Systems of Equations Huishuai Zhang*, Syracuse University; Yingbin Liang, Syracuse University

Nearly Isometric Embedding by Relaxation James McQueen*, University of Washington; Marina Meila, University of Washington; Dominique Joncas, Google

Probabilistic Inference with Generating Functions for Poisson Latent Variable Models Kevin Winner*, UMass CICS; Daniel Sheldon,

Causal meets Submodular: Subset Selection with Directed Information Yuxun Zhou*, UC Berkeley; Costas Spanos,

Depth from a Single Image by Harmonizing Overcomplete Local Network Predictions Ayan Chakrabarti*, ; Jingyu Shao, UCLA; Greg Shakhnarovich,

Deep Neural Networks with Inexact Matching for Person Re-Identification Arulkumar Subramaniam, IIT Madras; Moitreya Chatterjee*, IIT Madras; Anurag Mittal, IIT Madras

Global Analysis of Expectation Maximization for Mixtures of Two Gaussians Ji Xu, Columbia University; Daniel Hsu*, ; Arian Maleki, Columbia University

Estimating the class prior and posterior from noisy positives and unlabeled data Shantanu Jain*, Indiana University; Martha White, ; Predrag Radivojac,

Kronecker Determinantal Point Processes Zelda Mariet*, MIT; Suvrit Sra, MIT

Finite Sample Prediction and Recovery Bounds for Ordinal Embedding Lalit Jain*, University of Wisconsin-Madison; Kevin Jamieson, UC Berkeley; Robert Nowak, University of Wisconsin Madison

Feature-distributed sparse regression: a screen-and-clean approach Jiyan Yang*, Stanford University; Michael Mahoney, ; Michael Saunders, Stanford University; Yuekai Sun, University of Michigan

Learning Bound for Parameter Transfer Learning Wataru Kumagai*, Kanagawa University

Learning under uncertainty: a comparison between R-W and Bayesian approach He Huang*, LIBR; Martin Paulus, LIBR

Bi-Objective Online Matching and Submodular Allocations Hossein Esfandiari*, University of Maryland; Nitish Korula, Google Research; Vahab Mirrokni, Google

Quantized Random Projections and Non-Linear Estimation of Cosine Similarity Ping Li, ; Michael Mitzenmacher, Harvard University; Martin Slawski*,

The non-convex Burer-Monteiro approach works on smooth semidefinite programs Nicolas Boumal, ; Vlad Voroninski*, MIT; Afonso Bandeira,

Dimensionality Reduction of Massive Sparse Datasets Using Coresets Dan Feldman, ; Mikhail Volkov*, MIT; Daniela Rus, MIT

Using Social Dynamics to Make Individual Predictions: Variational Inference with Stochastic Kinetic Model Zhen Xu*, SUNY at Buffalo; Wen Dong, ; Sargur Srihari,

Supervised learning through the lens of compression Ofir David*, Technion - Israel Institute of Technology; Shay Moran, Technion - Israel Institute of Technology; Amir Yehudayoff, Technion - Israel Institute of Technology

Generative Shape Models: Joint Text Recognition and Segmentation with Very Little Training Data Xinghua Lou*, Vicarious FPC Inc; Ken Kansky, ; Wolfgang Lehrach, ; CC Laan, ; Bhaskara Marthi, ; D. Scott Phoenix, ; Dileep George,

Image Restoration Using Very Deep Convolutional Encoder-Decoder Networks with Symmetric Skip Connections Xiao-Jiao Mao, Nanjing University; Chunhua Shen*, ; Yu-Bin Yang,

Object based Scene Representations using Fisher Scores of Local Subspace Projections Mandar Dixit*, UC San Diego; Nuno Vasconcelos,

Active Learning with Oracle Epiphany Tzu-Kuo Huang, Microsoft Research; Lihong Li, Microsoft Research; Ara Vartanian, University of Wisconsin-Madison; Saleema Amershi, Microsoft; Xiaojin Zhu*,

Statistical Inference for Pairwise Graphical Models Using Score Matching Ming Yu*, The University of Chicago; Mladen Kolar, ; Varun Gupta, University of Chicago

Improved Error Bounds for Tree Representations of Metric Spaces Samir Chowdhury*, The Ohio State University; Facundo Memoli, ; Zane Smith,

Can Peripheral Representations Improve Clutter Metrics on Complex Scenes? Arturo Deza*, UCSB; Miguel Eckstein, UCSB

On Multiplicative Integration with Recurrent Neural Networks Yuhuai Wu*, University of Toronto; Saizheng Zhang, University of Montreal; Ying Zhang, University of Montreal; Yoshua Bengio, U. Montreal; Ruslan Salakhutdinov, University of Toronto

Learning HMMs with Nonparametric Emissions via Spectral Decompositions of Continuous Matrices Kirthevasan Kandasamy*, CMU; Maruan Al-Shedivat, CMU; Eric Xing, Carnegie Mellon University

Regret Bounds for Non-decomposable Metrics with Missing Labels Nagarajan Natarajan*, Microsoft Research Bangalore; Prateek Jain, Microsoft Research

Robust k-means: a Theoretical Revisit Alexandros Georgogiannis*, Technical University of Crete

Bayesian optimization for automated model selection Gustavo Malkomes, Washington University; Charles Schaff, Washington University in St. Louis; Roman Garnett*,

A Probabilistic Model of Social Decision Making based on Reward Maximization Koosha Khalvati*, University of Washington; Seongmin Park, Cognitive Neuroscience Center; Jean-Claude Dreher, Centre de Neurosciences Cognitives; Rajesh Rao, University of Washington

Balancing Suspense and Surprise: Timely Decision Making with Endogenous Information Acquisition Ahmed Alaa*, UCLA; Mihaela Van Der Schaar,

Fast and Flexible Monotonic Functions with Ensembles of Lattices Mahdi Fard, ; Kevin Canini, ; Andy Cotter, ; Jan Pfeifer, Google; Maya Gupta*,

Conditional Generative Moment-Matching Networks Yong Ren, Tsinghua University; Jun Zhu*, ; Jialian Li, Tsinghua University; Yucen Luo,

Stochastic Gradient MCMC with Stale Gradients Changyou Chen*, ; Nan Ding, Google; Chunyuan Li, Duke; Yizhe Zhang, Duke University; Lawrence Carin,

Composing graphical models with neural networks for structured representations and fast inference Matthew Johnson, ; David Duvenaud*, ; Alex Wiltschko, Harvard University and Twitter; Ryan Adams, ; Sandeep Datta, Harvard Medical School

Noise-Tolerant Life-Long Matrix Completion via Adaptive Sampling Nina Balcan, ; Hongyang Zhang*, CMU

Combinatorial semi-bandit with known covariance Rémy Degenne*, Université Paris Diderot; Vianney Perchet,

Matrix Completion has No Spurious Local Minimum Rong Ge, ; Jason Lee, UC Berkeley; Tengyu Ma*, Princeton University
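
The setting analyzed here is the factored (nonconvex) form of symmetric matrix completion, f(U) = ½‖P_Ω(UUᵀ − M)‖_F²; the result is that, under standard conditions, this objective has no spurious local minima, so gradient descent from a random start should succeed. A toy numpy sketch with ad hoc step size and sampling rate, illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 30, 2
U_true = rng.standard_normal((n, r))
M = U_true @ U_true.T                        # ground-truth rank-r matrix
mask = rng.random((n, n)) < 0.5
mask = mask | mask.T                         # symmetric observed set Omega

U = rng.standard_normal((n, r))              # random initialization
for _ in range(3000):
    R = mask * (U @ U.T - M)                 # residual on observed entries
    U -= 0.002 * 2 * R @ U                   # gradient of 0.5*||P_Omega(UU^T - M)||_F^2
print(np.linalg.norm(mask * (U @ U.T - M)))  # residual shrinks toward 0
```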

The Multiscale Laplacian Graph Kernel Risi Kondor*, ; Horace Pan, UChicago

Adaptive Averaging in Accelerated Descent Dynamics Walid Krichene*, UC Berkeley; Alexandre Bayen, UC Berkeley; Peter Bartlett,

Sub-sampled Newton Methods with Non-uniform Sampling Peng Xu*, Stanford University; Jiyan Yang, Stanford University; Farbod Roosta-Khorasani, University of California Berkeley; Christopher Re, ; Michael Mahoney,

Stochastic Gradient Geodesic MCMC Methods Chang Liu*, Tsinghua University; Jun Zhu, ; Yang Song, Stanford University

Variational Bayes on Monte Carlo Steroids Aditya Grover*, Stanford University; Stefano Ermon,

Showing versus doing: Teaching by demonstration Mark Ho*, Brown University; Michael L. Littman, ; James MacGlashan, Brown University; Fiery Cushman, Harvard University; Joe Austerweil,

Combining Fully Convolutional and Recurrent Neural Networks for 3D Biomedical Image Segmentation Jianxu Chen*, University of Notre Dame; Lin Yang, University of Notre Dame; Yizhe Zhang, University of Notre Dame; Mark Alber, University of Notre Dame; Danny Chen, University of Notre Dame

Maximization of Approximately Submodular Functions Thibaut Horel*, Harvard University; Yaron Singer,

A Comprehensive Linear Speedup Analysis for Asynchronous Stochastic Parallel Optimization from Zeroth-Order to First-Order Xiangru Lian, University of Rochester; Huan Zhang, ; Cho-Jui Hsieh, ; Yijun Huang, ; Ji Liu*,

Learning Infinite RBMs with Frank-Wolfe Wei Ping*, UC Irvine; Qiang Liu, ; Alexander Ihler,

Estimating the Size of a Large Network and its Communities from a Random Sample Lin Chen*, Yale University; Amin Karbasi, ; Forrest Crawford, Yale University

Learning Sensor Multiplexing Design through Back-propagation Ayan Chakrabarti*,

On Robustness of Kernel Clustering Bowei Yan*, University of Texas at Austin; Purnamrita Sarkar, U.C. Berkeley

High resolution neural connectivity from incomplete tracing data using nonnegative spline regression Kameron Harris*, University of Washington; Stefan Mihalas, Allen Institute for Brain Science; Eric Shea-Brown, University of Washington

MoCap-guided Data Augmentation for 3D Pose Estimation in the Wild Gregory Rogez*, Inria; Cordelia Schmid,

A New Liftable Class for First-Order Probabilistic Inference Seyed Mehran Kazemi*, UBC; Angelika Kimmig, KU Leuven; Guy Van den Broeck, ; David Poole, UBC

The Parallel Knowledge Gradient Method for Batch Bayesian Optimization Jian Wu*, Cornell University; Peter I. Frazier,

Improved Regret Bounds for Oracle-Based Adversarial Contextual Bandits Vasilis Syrgkanis*, ; Haipeng Luo, Princeton University; Akshay Krishnamurthy, ; Robert Schapire,

Consistent Estimation of Functions of Data Missing Non-Monotonically and Not at Random Ilya Shpitser*,

Optimistic Gittins Indices Eli Gutin*, Massachusetts Institute of Technology; Vivek Farias,

Finite-Dimensional BFRY Priors and Variational Bayesian Inference for Power Law Models Juho Lee*, POSTECH; Lancelot James, HKUST; Seungjin Choi, POSTECH

Launch and Iterate: Reducing Prediction Churn Mahdi Fard, ; Quentin Cormier, Google; Kevin Canini, ; Maya Gupta*,

“Congruent” and “Opposite” Neurons: Sisters for Multisensory Integration and Segregation Wen-Hao Zhang*, Institute of Neuroscience, Chinese Academy of Sciences; He Wang, HKUST; K. Y. Michael Wong, HKUST; Si Wu,

Learning shape correspondence with anisotropic convolutional neural networks Davide Boscaini*, University of Lugano; Jonathan Masci, ; Emanuele Rodolà, University of Lugano; Michael Bronstein, University of Lugano

Pairwise Choice Markov Chains Stephen Ragain*, Stanford University; Johan Ugander,

NESTT: A Nonconvex Primal-Dual Splitting Method for Distributed and Stochastic Optimization Davood Hajinezhad*, Iowa State University; Mingyi Hong, ; Tuo Zhao, Johns Hopkins University; Zhaoran Wang, Princeton University

Clustering with Same-Cluster Queries Hassan Ashtiani, University of Waterloo; Shrinu Kushagra*, University of Waterloo; Shai Ben-David, U. Waterloo

Attend, Infer, Repeat: Fast Scene Understanding with Generative Models S. M. Ali Eslami*, Google DeepMind; Nicolas Heess, ; Theophane Weber, ; Yuval Tassa, Google DeepMind; David Szepesvari, Google DeepMind; Koray Kavukcuoglu, Google DeepMind; Geoffrey Hinton, Google

Parameter Learning for Log-supermodular Distributions Tatiana Shpakova*, Inria - ENS Paris; Francis Bach,

Deconvolving Feedback Loops in Recommender Systems Ayan Sinha*, Purdue; David Gleich, ; Karthik Ramani, Purdue University

Structured Matrix Recovery via the Generalized Dantzig Selector Sheng Chen*, University of Minnesota; Arindam Banerjee,

Confusions over Time: An Interpretable Bayesian Model to Characterize Trends in Decision Making Himabindu Lakkaraju*, Stanford University; Jure Leskovec,

Automatic Neuron Detection in Calcium Imaging Data Using Convolutional Networks Noah Apthorpe*, Princeton University; Alexander Riordan, Princeton University; Robert Aguilar, Princeton University; Jan Homann, Princeton University; Yi Gu, Princeton University; David Tank, Princeton University; H. Sebastian Seung, Princeton University

Designing smoothing functions for improved worst-case competitive ratio in online optimization Reza Eghbali*, University of Washington; Maryam Fazel, University of Washington

Convergence guarantees for kernel-based quadrature rules in misspecified settings Motonobu Kanagawa*, ; Bharath Sriperumbudur, ; Kenji Fukumizu,

Unsupervised Learning from Noisy Networks with Applications to Hi-C Data Bo Wang*, Stanford University; Junjie Zhu, Stanford University; Armin Pourshafeie, Stanford University

A non-generative theory for unsupervised learning and efficient improper dictionary learning Elad Hazan, ; Tengyu Ma*, Princeton University

Equality of Opportunity in Supervised Learning Moritz Hardt*, ; Eric Price, ; Nathan Srebro,

Scaled Least Squares Estimator for GLMs in Large-Scale Problems Murat Erdogdu*, Stanford University; Lee Dicker, ; Mohsen Bayati,

Interpretable Nonlinear Dynamic Modeling of Neural Trajectories Yuan Zhao*, Stony Brook University; Il Memming Park,

Search Improves Label for Active Learning Alina Beygelzimer, Yahoo Inc; Daniel Hsu, ; John Langford, ; Chicheng Zhang*, UCSD

Higher-Order Factorization Machines Mathieu Blondel*, NTT; Akinori Fujino, NTT; Naonori Ueda, ; Masakazu Ishihata, Hokkaido University
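
For reference, the second-order factorization machine that higher-order FMs generalize, using the standard O(nk) rewriting of the pairwise interaction term; this sketch is the degree-2 baseline, not the paper's higher-order algorithm.

```python
import numpy as np

def fm_predict(x, w0, w, V):
    """Second-order FM score: w0 + <w, x> + sum_{i<j} <v_i, v_j> x_i x_j,
    with the pairwise term computed in O(n k) via
    0.5 * sum_f [ (sum_i V_if x_i)^2 - sum_i V_if^2 x_i^2 ]."""
    s = V.T @ x                                            # (k,)
    pairwise = 0.5 * (s @ s - ((V ** 2).T @ (x ** 2)).sum())
    return w0 + w @ x + pairwise

rng = np.random.default_rng(0)
n, k = 6, 3
x = rng.random(n)
print(fm_predict(x, 0.1, rng.standard_normal(n), rng.standard_normal((n, k))))
```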

Exponential expressivity in deep neural networks through transient chaos Ben Poole*, Stanford University; Subhaneil Lahiri, Stanford University; Maithra Raghu, Cornell University; Jascha Sohl-Dickstein, ; Surya Ganguli, Stanford

Split LBI: An Iterative Regularization Path with Structural Sparsity Chendi Huang, Peking University; Xinwei Sun, ; Jiechao Xiong, Peking University; Yuan Yao*,

An equivalence between high dimensional Bayes optimal inference and M-estimation Madhu Advani*, Stanford University; Surya Ganguli, Stanford

Synthesizing the preferred inputs for neurons in neural networks via deep generator networks Anh Nguyen*, University of Wyoming; Alexey Dosovitskiy, ; Jason Yosinski, Cornell; Thomas Brox, University of Freiburg; Jeff Clune,

Deep Submodular Functions Brian Dolhansky*, University of Washington; Jeff Bilmes, University of Washington, Seattle

Discriminative Gaifman Models Mathias Niepert*,

Leveraging Sparsity for Efficient Submodular Data Summarization Erik Lindgren*, University of Texas at Austin; Shanshan Wu, UT Austin; Alexandros G. Dimakis,

Local Minimax Complexity of Stochastic Convex Optimization Sabyasachi Chatterjee, University of Chicago; John Duchi, ; John Lafferty, ; Yuancheng Zhu*, University of Chicago

Stochastic Optimization for Large-scale Optimal Transport Aude Genevay*, Université Paris Dauphine; Marco Cuturi, ; Gabriel Peyré, ; Francis Bach,

On Mixtures of Markov Chains Rishi Gupta*, Stanford; Ravi Kumar, ; Sergei Vassilvitskii, Google

Linear Contextual Bandits with Knapsacks Shipra Agrawal*, ; Nikhil Devanur, Microsoft Research

Reconstructing Parameters of Spreading Models from Partial Observations Andrey Lokhov*, Los Alamos National Laboratory

Spatiotemporal Residual Networks for Video Action Recognition Christoph Feichtenhofer*, Graz University of Technology; Axel Pinz, Graz University of Technology; Richard Wildes, York University Toronto

Path-Normalized Optimization of Recurrent Neural Networks with ReLU Activations Behnam Neyshabur*, TTI-Chicago; Yuhuai Wu, University of Toronto; Ruslan Salakhutdinov, University of Toronto; Nathan Srebro,

Strategic Attentive Writer for Learning Macro-Actions Alexander Vezhnevets*, Google DeepMind; Volodymyr Mnih, ; Simon Osindero, Google DeepMind; Alex Graves, ; Oriol Vinyals, ; John Agapiou, ; Koray Kavukcuoglu, Google DeepMind

The Limits of Learning with Missing Data Brian Bullins*, Princeton University; Elad Hazan, ; Tomer Koren, Technion---Israel Inst. of Technology

RETAIN: Interpretable Predictive Model in Healthcare using Reverse Time Attention Mechanism Edward Choi*, Georgia Institute of Technology; Mohammad Taha Bahadori, Gatech; Jimeng Sun,

Total Variation Classes Beyond 1d: Minimax Rates, and the Limitations of Linear Smoothers Yu-Xiang Wang*, Carnegie Mellon University; Veeranjaneyulu Sadhanala, Carnegie Mellon University; Ryan Tibshirani,

Community Detection on Evolving Graphs Stefano Leonardi*, Sapienza University of Rome; Aris Anagnostopoulos, Sapienza University of Rome; Jakub Łącki, Sapienza University of Rome; Silvio Lattanzi, Google; Mohammad Mahdian, Google Research, New York

Online and Differentially-Private Tensor Decomposition Yining Wang*, Carnegie Mellon University; Anima Anandkumar, UC Irvine

Dimension-Free Iteration Complexity of Finite Sum Optimization Problems Yossi Arjevani*, Weizmann Institute of Science; Ohad Shamir, Weizmann Institute of Science

Towards Conceptual Compression Karol Gregor*, ; Frederic Besse, Google DeepMind; Danilo Jimenez Rezende, ; Ivo Danihelka, ; Daan Wierstra, Google DeepMind

Exact Recovery of Hard Thresholding Pursuit Xiaotong Yuan*, Nanjing University of Information Science and Technology; Ping Li, ; Tong Zhang,

Data Programming: Creating Large Training Sets, Quickly Alexander Ratner*, Stanford University; Christopher De Sa, Stanford University; Sen Wu, Stanford University; Daniel Selsam, Stanford; Christopher Ré, Stanford University

Generalization of ERM in Stochastic Convex Optimization: The Dimension Strikes Back Vitaly Feldman*,

Dynamic matrix recovery from incomplete observations under an exact low-rank constraint Liangbei Xu*, Gatech; Mark Davenport,

Fast Distributed Submodular Cover: Public-Private Data Summarization Baharan Mirzasoleiman*, ETH Zurich; Morteza Zadimoghaddam, ; Amin Karbasi,

Estimating Nonlinear Neural Response Functions using GP Priors and Kronecker Methods Cristina Savin*, IST Austria; Gašper Tkačik, Institute of Science and Technology Austria

Lifelong Learning with Weighted Majority Votes Anastasia Pentina*, IST Austria; Ruth Urner, MPI Tuebingen

Scaling Memory-Augmented Neural Networks with Sparse Reads and Writes Jack Rae*, Google DeepMind; Jonathan Hunt, ; Ivo Danihelka, ; Tim Harley, Google DeepMind; Andrew Senior, ; Greg Wayne, ; Alex Graves, ; Timothy Lillicrap, Google DeepMind

Matching Networks for One Shot Learning Oriol Vinyals*, ; Charles Blundell, DeepMind; Timothy Lillicrap, Google DeepMind; Koray Kavukcuoglu, Google DeepMind; Daan Wierstra, Google DeepMind

Tight Complexity Bounds for Optimizing Composite Objectives Blake Woodworth*, Toyota Technological Institute; Nathan Srebro,

Graphical Time Warping for Joint Alignment of Multiple Curves Yizhi Wang, Virginia Tech; David Miller, The Pennsylvania State University; Kira Poskanzer, University of California, San Francisco; Yue Wang, Virginia Tech; Lin Tian, The University of California, Davis; Guoqiang Yu*,

Unsupervised Risk Estimation Using Only Conditional Independence Structure Jacob Steinhardt*, Stanford University; Percy Liang,

MetaGrad: Multiple Learning Rates in Online Learning Tim Van Erven*, ; Wouter M. Koolen,

Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation Tejas Kulkarni, MIT; Karthik Narasimhan*, MIT; Ardavan Saeedi, MIT; Joshua Tenenbaum,

High Dimensional Structured Superposition Models Qilong Gu*, University of Minnesota; Arindam Banerjee,

Joint quantile regression in vector-valued RKHSs Maxime Sangnier*, LTCI, CNRS, Télécom ParisTech; Olivier Fercoq, ; Florence d’Alché-Buc,

The Forget-me-not Process Kieran Milan, Google DeepMind; Joel Veness*, ; James Kirkpatrick, Google DeepMind; Michael Bowling, ; Anna Koop, University of Alberta; Demis Hassabis,

Wasserstein Training of Restricted Boltzmann Machines Gregoire Montavon*, ; Klaus-Robert Muller, ; Marco Cuturi,

Communication-Optimal Distributed Clustering Jiecao Chen, Indiana University Bloomington; He Sun*, The University of Bristol; David Woodruff, ; Qin Zhang,

Probing the Compositionality of Intuitive Functions Eric Schulz*, University College London; Joshua Tenenbaum, ; David Duvenaud, ; Maarten Speekenbrink, University College London; Sam Gershman,

Ladder Variational Autoencoders Casper Kaae Sønderby*, University of Copenhagen; Tapani Raiko, ; Lars Maaløe, Technical University of Denmark; Søren Sønderby, KU; Ole Winther, Technical University of Denmark

The Multiple Quantile Graphical Model Alnur Ali*, Carnegie Mellon University; Zico Kolter, ; Ryan Tibshirani,

Threshold Learning for Optimal Decision Making Nathan Lepora*, University of Bristol

Unsupervised Feature Extraction by Time-Contrastive Learning and Nonlinear ICA Aapo Hyvärinen*, ; Hiroshi Morioka, University of Helsinki

Can Active Memory Replace Attention? Łukasz Kaiser*, ; Samy Bengio,

Minimax Optimal Alternating Minimization for Kernel Nonparametric Tensor Learning Taiji Suzuki*, ; Heishiro Kanagawa, ; Hayato Kobayashi, ; Nobuyuki Shimizu, ; Yukihiro Tagami,

The Product Cut Thomas Laurent*, Loyola Marymount University; James Von Brecht, CSULB; Xavier Bresson, ; Arthur Szlam,

Learning Sparse Gaussian Graphical Models with Overlapping Blocks Mohammad Javad Hosseini*, University of Washington; Su-In Lee,

Yggdrasil: An Optimized System for Training Deep Decision Trees at Scale Firas Abuzaid*, MIT; Joseph Bradley, Databricks; Feynman Liang, Cambridge University Engineering Department; Andrew Feng, Yahoo!; Lee Yang, Yahoo!; Matei Zaharia, MIT; Ameet Talwalkar,

Average-case hardness of RIP certification Tengyao Wang, University of Cambridge; Quentin Berthet*, ; Yaniv Plan, University of British Columbia

Forward models at Purkinje synapses facilitate cerebellar anticipatory control Ivan Herreros-Alonso*, Universitat Pompeu Fabra; Xerxes Arsiwalla, ; Paul Verschure,

Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering Michaël Defferrard*, EPFL; Xavier Bresson, ; Pierre Vandergheynst, EPFL

Deep Unsupervised Exemplar Learning Miguel Bautista*, Heidelberg University; Artsiom Sanakoyeu, Heidelberg University; Ekaterina Tikhoncheva, Heidelberg University; Björn Ommer,

Large-Scale Price Optimization via Network Flow Shinji Ito*, NEC Corporation; Ryohei Fujimaki,

Online Pricing with Strategic and Patient Buyers Michal Feldman, TAU; Tomer Koren, Technion---Israel Inst. of Technology; Roi Livni*, HUJI; Yishay Mansour, Microsoft; Aviv Zohar, HUJI

Global Optimality of Local Search for Low Rank Matrix Recovery Srinadh Bhojanapalli*, TTI Chicago; Behnam Neyshabur, TTI-Chicago; Nathan Srebro,

Phased LSTM: Accelerating Recurrent Network Training for Long or Event-based Sequences Daniel Neil*, Institute of Neuroinformatics; Michael Pfeiffer, Institute of Neuroinformatics; Shih-Chii Liu,

Improving PAC Exploration Using the Median of Means Jason Pazis*, MIT; Ronald Parr, ; Jonathan How, MIT

Infinite Hidden Semi-Markov Modulated Interaction Point Process Matt Zhang*, NICTA; Peng Lin, Data61; Ting Guo, Data61; Yang Wang, Data61, CSIRO; Fang Chen, Data61, CSIRO

Cooperative Inverse Reinforcement Learning Dylan Hadfield-Menell*, UC Berkeley; Stuart Russell, UC Berkeley; Pieter Abbeel, ; Anca Dragan,

Spatio-Temporal Hilbert Maps for Continuous Occupancy Representation in Dynamic Environments Ransalu Senanayake*, The University of Sydney; Lionel Ott, The University of Sydney; Simon O'Callaghan, NICTA; Fabio Ramos, The University of Sydney

Select-and-Sample for Spike-and-Slab Sparse Coding Abdul-Saboor Sheikh, University of Oldenburg; Jörg Lücke*,

Tractable Operations for Arithmetic Circuits of Probabilistic Models Yujia Shen*, ; Arthur Choi, ; Adnan Darwiche,

Greedy Feature Construction Dino Oglic*, University of Bonn; Thomas Gaertner, The University of Nottingham

Mistake Bounds for Binary Matrix Completion Mark Herbster, ; Stephen Pasteris, UCL; Massimiliano Pontil*,

Data driven estimation of Laplace-Beltrami operator Frederic Chazal, INRIA; Ilaria Giulini, ; Bertrand Michel*,

Tracking the Best Expert in Non-stationary Stochastic Environments Chen-Yu Wei*, Academia Sinica; Yi-Te Hong, Academia Sinica; Chi-Jen Lu, Academia Sinica

Learning to learn by gradient descent by gradient descent Marcin Andrychowicz*, Google DeepMind; Misha Denil, ; Sergio Gomez, Google DeepMind; Matthew Hoffman, Google DeepMind; David Pfau, Google DeepMind; Tom Schaul, ; Nando de Freitas, Google
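
The core interface in this line of work: the optimizee's gradients are fed to an optimizer that emits parameter updates, and that optimizer is itself learned (the paper parameterizes it with an RNN trained by gradient descent). A sketch of the interface with plain SGD standing in for the learned optimizer; `meta_optimize` is an illustrative name, not from the paper.

```python
def meta_optimize(grad_fn, theta, optimizer, state, n_steps=100):
    """Run the optimizee loop: at each step the optimizer maps the
    current gradient (and its own state) to an additive update."""
    for _ in range(n_steps):
        g = grad_fn(theta)
        update, state = optimizer(g, state)
        theta = theta + update
    return theta

sgd = lambda g, state: (-0.1 * g, state)   # placeholder for the learned RNN
grad = lambda th: 2 * (th - 3.0)           # gradient of (th - 3)^2
print(meta_optimize(grad, 0.0, sgd, state=None))  # converges to ~3.0
```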

Kernel Observers: Systems-Theoretic Modeling and Inference of Spatiotemporally Evolving Processes Harshal Maske, UIUC; Girish Chowdhary*, UIUC; Hassan Kingravi, Pindrop Security Services

Quantum Perceptron Models Ashish Kapoor*, ; Nathan Wiebe, Microsoft Research; Krysta M. Svore,

Guided Policy Search as Approximate Mirror Descent William Montgomery*, University of Washington; Sergey Levine, University of Washington

The Power of Optimization from Samples Eric Balkanski*, Harvard University; Aviad Rubinstein, UC Berkeley; Yaron Singer,

Deep Exploration via Bootstrapped DQN Ian Osband*, DeepMind; Charles Blundell, DeepMind; Alexander Pritzel, ; Benjamin Van Roy,

A Multi-step Inertial Forward-Backward Splitting Method for Non-convex Optimization Jingwei Liang*, GREYC, ENSICAEN; Jalal Fadili, ; Gabriel Peyré,

Scaling Factorial Hidden Markov Models: Stochastic Variational Inference without Messages Yin Cheng Ng*, University College London; Pawel Chilinski, University College London; Ricardo Silva, University College London

Convolutional Neural Fabrics Shreyas Saxena*, INRIA; Jakob Verbeek,

A Neural Transducer Navdeep Jaitly*, ; Quoc Le, ; Oriol Vinyals, ; Ilya Sutskever, ; David Sussillo, Google; Samy Bengio,

Adaptive Newton Method for Empirical Risk Minimization to Statistical Accuracy Aryan Mokhtari*, University of Pennsylvania; Hadi Daneshmand, ETH Zurich; Aurelien Lucchi, ; Thomas Hofmann, ; Alejandro Ribeiro, University of Pennsylvania

A Sparse Interactive Model for Inductive Matrix Completion Jin Lu, University of Connecticut; Guannan Liang, University of Connecticut; Jiangwen Sun, University of Connecticut; Jinbo Bi*, University of Connecticut

Coresets for Scalable Bayesian Logistic Regression Jonathan Huggins*, MIT; Trevor Campbell, MIT; Tamara Broderick, MIT

Agnostic Estimation for Misspecified Phase Retrieval Models Matey Neykov*, Princeton University; Zhaoran Wang, Princeton University; Han Liu,

Linear Relaxations for Finding Diverse Elements in Metric Spaces Aditya Bhaskara*, University of Utah; Mehrdad Ghadiri, Sharif University of Technology; Vahab Mirrokni, Google; Ola Svensson, EPFL

Binarized Neural Networks Itay Hubara*, Technion; Matthieu Courbariaux, Université de Montréal; Daniel Soudry, Columbia University; Ran El-Yaniv, Technion; Yoshua Bengio, Université de Montréal
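
Two standard ingredients of binarized networks, sketched in numpy: deterministic sign binarization for the forward pass, and a straight-through estimator that passes gradients through the sign function while zeroing them where |w| > 1. A sketch of the usual recipe, not the paper's training code.

```python
import numpy as np

def binarize(w):
    """Forward pass: replace real-valued weights by sign(w) in {-1, +1}."""
    return np.where(w >= 0, 1.0, -1.0)

def ste_grad(w, grad_out):
    """Backward pass (straight-through estimator): let the gradient pass
    through the sign unchanged, clipped to zero where |w| > 1."""
    return grad_out * (np.abs(w) <= 1.0)

w = np.array([-1.5, -0.3, 0.2, 2.0])
print(binarize(w))                    # [-1. -1.  1.  1.]
print(ste_grad(w, np.ones_like(w)))   # [0. 1. 1. 0.]
```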

On Local Maxima in the Population Likelihood of Gaussian Mixture Models: Structural Results and Algorithmic Consequences Chi Jin*, UC Berkeley; Yuchen Zhang, ; Sivaraman Balakrishnan, CMU; Martin Wainwright, UC Berkeley; Michael Jordan,

Memory-Efficient Backpropagation Through Time Audrunas Gruslys*, Google DeepMind; Remi Munos, Google DeepMind; Ivo Danihelka, ; Marc Lanctot, Google DeepMind; Alex Graves,

Bayesian Optimization with Robust Bayesian Neural Networks Jost Tobias Springenberg*, University of Freiburg; Aaron Klein, University of Freiburg; Stefan Falkner, University of Freiburg; Frank Hutter, University of Freiburg

Learnable Visual Markers Oleg Grinchuk, Skolkovo Institute of Science and Technology; Vadim Lebedev, Skolkovo Institute of Science and Technology; Victor Lempitsky*,

Fast Algorithms for Robust PCA via Gradient Descent Xinyang Yi*, UT Austin; Dohyung Park, University of Texas at Austin; Yudong Chen, ; Constantine Caramanis,

One-vs-Each Approximation to Softmax for Scalable Estimation of Probabilities Michalis K. Titsias*,

Learning Deep Embeddings with Histogram Loss Evgeniya Ustinova, Skoltech; Victor Lempitsky*,

Spectral Learning of Dynamic Systems from Nonequilibrium Data Hao Wu*, Free University of Berlin; Frank Noe,

Markov Chain Sampling in Discrete Probabilistic Models with Constraints Chengtao Li*, MIT; Suvrit Sra, MIT; Stefanie Jegelka, MIT

Mapping Estimation for Discrete Optimal Transport Michael Perrot*, University of Saint-Etienne, Laboratoire Hubert Curien; Nicolas Courty, ; Rémi Flamary, ; Amaury Habrard, University of Saint-Etienne, Laboratoire Hubert Curien

BBO-DPPs: Batched Bayesian Optimization via Determinantal Point Processes Tarun Kathuria*, Microsoft Research; Amit Deshpande, ; Pushmeet Kohli,

Protein contact prediction from amino acid co-evolution using convolutional networks for graph-valued images Vladimir Golkov*, Technical University of Munich; Marcin Skwark, Vanderbilt University; Antonij Golkov, University of Augsburg; Alexey Dosovitskiy, ; Thomas Brox, University of Freiburg; Jens Meiler, Vanderbilt University; Daniel Cremers, Technical University of Munich

Linear Feature Encoding for Reinforcement Learning Zhao Song*, Duke University; Ronald Parr, ; Xuejun Liao, Duke University; Lawrence Carin,

A Minimax Approach to Supervised Learning Farzan Farnia*, Stanford University; David Tse, Stanford University

Edge-Exchangeable Graphs and Sparsity Diana Cai*, University of Chicago; Trevor Campbell, MIT; Tamara Broderick, MIT

A Locally Adaptive Normal Distribution Georgios Arvanitidis*, DTU; Lars Kai Hansen, ; Søren Hauberg,

Completely random measures for modelling block-structured sparse networks Tue Herlau*, ; Mikkel Schmidt, DTU; Morten Mørup, Technical University of Denmark

Sparse Support Recovery with Non-smooth Loss Functions Kévin Degraux*, Université catholique de Louvain; Gabriel Peyré, ; Jalal Fadili, ; Laurent Jacques, Université catholique de Louvain

Neurons Equipped with Intrinsic Plasticity Learn Stimulus Intensity Statistics Travis Monk*, University of Oldenburg; Cristina Savin, IST Austria; Jörg Lücke,

Learning values across many orders of magnitude Hado Van Hasselt*, ; Arthur Guez, ; Matteo Hessel, Google DeepMind; Volodymyr Mnih, ; David Silver,

Adaptive Smoothed Online Multi-Task Learning Keerthiram Murugesan*, Carnegie Mellon University; Hanxiao Liu, Carnegie Mellon University; Jaime Carbonell, CMU; Yiming Yang, CMU

Safe Exploration in Finite Markov Decision Processes with Gaussian Processes Matteo Turchetta, ETH Zurich; Felix Berkenkamp*, ETH Zurich; Andreas Krause,

Probabilistic Linear Multistep Methods Onur Teymur*, Imperial College London; Kostas Zygalakis, ; Ben Calderhead,

Stochastic Three-Composite Convex Minimization Alp Yurtsever*, EPFL; Bang Vu, ; Volkan Cevher,

Using Fast Weights to Attend to the Recent Past Jimmy Ba*, University of Toronto; Geoffrey Hinton, Google; Volodymyr Mnih, ; Joel Leibo, Google DeepMind; Catalin Ionescu, Google

Maximal Sparsity with Deep Networks? Bo Xin*, Peking University; Yizhou Wang, Peking University; Wen Gao, peking university; David Wipf,

Quantifying and Reducing Stereotypes in Word Embeddings Tolga Bolukbasi*, Boston University; Kai-Wei Chang, ; James Zou, ; Venkatesh Saligrama, ; Adam Kalai, Microsoft Research

beta-risk: a New Surrogate Risk for Learning from Weakly Labeled Data Valentina Zantedeschi*, UJM Saint-Etienne, France; Rémi Emonet, ; Marc Sebban,

Learning Additive Exponential Family Graphical Models via ℓ2,1-norm Regularized M-Estimation Xiaotong Yuan*, Nanjing University of Information Science and Technology; Ping Li, ; Tong Zhang, ; Qingshan Liu, ; Guangcan Liu, NUIST

Backprop KF: Learning Discriminative Deterministic State Estimators Tuomas Haarnoja*, UC Berkeley; Anurag Ajay, UC Berkeley; Sergey Levine, University of Washington; Pieter Abbeel,

2-Component Recurrent Neural Networks Xiang Li*, NJUST; Tao Qin, Microsoft; Jian Yang, ; Xiaolin Hu, ; Tie-Yan Liu, Microsoft Research

Fast recovery from a union of subspaces Chinmay Hegde, ; Piotr Indyk, MIT; Ludwig Schmidt*, MIT

Incremental Learning for Variational Sparse Gaussian Process Regression Ching-An Cheng*, Georgia Institute of Technology; Byron Boots,

A Consistent Regularization Approach for Structured Prediction Carlo Ciliberto*, MIT; Lorenzo Rosasco, ; Alessandro Rudi,

Clustering Signed Networks with the Geometric Mean of Laplacians Pedro Eduardo Mercado Lopez*, Saarland University; Francesco Tudisco, Saarland University; Matthias Hein, Saarland University

An urn model for majority voting in classification ensembles Víctor Soto, Columbia University; Alberto Suarez, ; Gonzalo Martínez-Muñoz*,

Avoiding Imposters and Delinquents: Adversarial Crowdsourcing and Peer Prediction Jacob Steinhardt*, Stanford University; Gregory Valiant, ; Moses Charikar, Stanford University

Fast and accurate spike sorting of high-channel count probes with KiloSort Marius Pachitariu*, ; Nick Steinmetz, UCL; Shabnam Kadir, ; Matteo Carandini, UCL; Kenneth Harris, UCL

Combining Adversarial Guarantees and Stochastic Fast Rates in Online Learning Wouter M. Koolen*, ; Peter Grunwald, CWI; Tim Van Erven,

Ancestral Causal Inference Sara Magliacane*, VU University Amsterdam; Tom Claassen, ; Joris Mooij, Radboud University Nijmegen

More Supervision, Less Computation: Statistical-Computational Tradeoffs in Weakly Supervised Learning Xinyang Yi, UT Austin; Zhaoran Wang, Princeton University; Zhuoran Yang, Princeton University; Constantine Caramanis, ; Han Liu*,

Tagger: Deep Unsupervised Perceptual Grouping Klaus Greff*, IDSIA; Antti Rasmus, The Curious AI Company; Mathias Berglund, The Curious AI Company; Tele Hao, The Curious AI Company; Harri Valpola, The Curious AI Company

Efficient Algorithm for Streaming Submodular Cover Ashkan Norouzi-Fard*, EPFL; Abbas Bazzi, EPFL; Ilija Bogunovic, EPFL Lausanne; Marwa El Halabi, EPFL; Ya-Ping Hsieh, ; Volkan Cevher,

Interaction Networks for Learning about Objects, Relations and Physics Peter Battaglia*, Google DeepMind; Razvan Pascanu, ; Matthew Lai, Google DeepMind; Danilo Jimenez Rezende, ; Koray Kavukcuoglu, Google DeepMind

Efficient state-space modularization for planning: theory, behavioral and neural signatures Daniel McNamee*, University of Cambridge; Daniel Wolpert, University of Cambridge; Máté Lengyel, University of Cambridge

Provable Efficient Online Matrix Completion via Non-convex Stochastic Gradient Descent Chi Jin*, UC Berkeley; Sham Kakade, ; Praneeth Netrapalli, Microsoft Research

Online Bayesian Moment Matching for Topic Modeling with Unknown Number of Topics Wei-Shou Hsu*, University of Waterloo; Pascal Poupart,

Computing and maximizing influence in linear threshold and triggering models Justin Khim*, University of Pennsylvania; Varun Jog, ; Po-Ling Loh, Berkeley

Coevolutionary Latent Feature Processes for Continuous-Time User-Item Interactions Yichen Wang*, Georgia Tech; Nan Du, ; Rakshit Trivedi, Georgia Institute of Technology; Le Song,

Learning Deep Parsimonious Representations Renjie Liao*, UofT; Alexander Schwing, ; Rich Zemel, ; Raquel Urtasun,

Optimal Learning for Multi-pass Stochastic Gradient Methods Junhong Lin*, Istituto Italiano di Tecnologia; Lorenzo Rosasco,

Generative Adversarial Imitation Learning Jonathan Ho*, Stanford; Stefano Ermon,

An End-to-End Approach for Natural Language to IFTTT Program Translation Chang Liu*, University of Maryland; Xinyun Chen, Shanghai Jiaotong University; Richard Shin, ; Mingcheng Chen, University of Illinois, Urbana-Champaign; Dawn Song, UC Berkeley

Dual Space Gradient Descent for Online Learning Trung Le*, University of Pedagogy Ho Chi Minh city; Tu Nguyen, Deakin University; Vu Nguyen, Deakin University; Dinh Phung, Deakin University

Fast stochastic optimization on Riemannian manifolds Hongyi Zhang*, MIT; Sashank Jakkam Reddi, Carnegie Mellon University; Suvrit Sra, MIT

Professor Forcing: A New Algorithm for Training Recurrent Networks Alex Lamb, Montreal; Anirudh Goyal*, University of Montreal; Ying Zhang, University of Montreal; Saizheng Zhang, University of Montreal; Aaron Courville, University of Montreal; Yoshua Bengio, U. Montreal

Learning brain regions via large-scale online structured sparse dictionary learning Elvis Dohmatob*, Inria; Arthur Mensch, Inria; Gaël Varoquaux, ; Bertrand Thirion,

Efficient Neural Codes under Metabolic Constraints Zhuo Wang*, University of Pennsylvania; Xue-Xin Wei, University of Pennsylvania; Alan Stocker, ; Dan Lee, University of Pennsylvania

Approximate maximum entropy principles via Goemans-Williamson with applications to provable variational methods Andrej Risteski*, Princeton University; Yuanzhi Li, Princeton University

Efficient High-Order Interaction-Aware Feature Selection Based on Conditional Mutual Information Alexander Shishkin, Yandex; Anastasia Bezzubtseva, Yandex; Alexey Drutsa*, Yandex; Ilia Shishkov, Yandex; Ekaterina Gladkikh, Yandex; Gleb Gusev, Yandex LLC; Pavel Serdyukov, Yandex

Bayesian Intermittent Demand Forecasting for Large Inventories Matthias Seeger*, Amazon; David Salinas, Amazon; Valentin Flunkert, Amazon

Visual Question Answering with Question Representation Update Ruiyu Li*, CUHK; Jiaya Jia, CUHK

Learning Parametric Sparse Models for Image Super-Resolution Yongbo Li, Xidian University; Weisheng Dong*, Xidian University; Guangming Shi, Xidian University; Xuemei Xie, Xidian University; Xin Li, WVU

Blazing the trails before beating the path: Sample-efficient Monte-Carlo planning Jean-Bastien Grill, Inria Lille - Nord Europe; Michal Valko*, Inria Lille - Nord Europe; Remi Munos, Google DeepMind

Asynchronous Parallel Greedy Coordinate Descent Yang You, UC Berkeley; Xiangru Lian, University of Rochester; Cho-Jui Hsieh*, ; Ji Liu, ; Hsiang-Fu Yu, University of Texas at Austin; Inderjit Dhillon, ; James Demmel, UC Berkeley

Iterative Refinement of the Approximate Posterior for Directed Belief Networks Rex Devon Hjelm*, University of New Mexico; Ruslan Salakhutdinov, University of Toronto; Kyunghyun Cho, University of Montreal; Nebojsa Jojic, Microsoft Research; Vince Calhoun, Mind Research Network; Junyoung Chung, University of Montreal

Assortment Optimization Under the Mallows model Antoine Desir*, Columbia University; Vineet Goyal, ; Srikanth Jagabathula, ; Danny Segev,

Disease Trajectory Maps Peter Schulam*, Johns Hopkins University; Raman Arora,

Multistage Campaigning in Social Networks Mehrdad Farajtabar*, Georgia Tech; Xiaojing Ye, Georgia State University; Sahar Harati, Emory University; Le Song, ; Hongyuan Zha, Georgia Institute of Technology

Learning in Games: Robustness of Fast Convergence Dylan Foster, Cornell University; Zhiyuan Li, Tsinghua University; Thodoris Lykouris*, Cornell University; Karthik Sridharan, Cornell University; Eva Tardos, Cornell University

Improving Variational Autoencoders with Inverse Autoregressive Flow Diederik Kingma*, ; Tim Salimans,

Algorithms and matching lower bounds for approximately-convex optimization Andrej Risteski*, Princeton University; Yuanzhi Li, Princeton University

Unified Methods for Exploiting Piecewise Structure in Convex Optimization Tyler Johnson*, University of Washington; Carlos Guestrin,

Kernel Bayesian Inference with Posterior Regularization Yang Song*, Stanford University; Jun Zhu, ; Yong Ren, Tsinghua University

Neural universal discrete denoiser Taesup Moon*, DGIST; Seonwoo Min, ; Byunghan Lee, ; Sungroh Yoon,

Optimal Architectures in a Solvable Model of Deep Networks Jonathan Kadmon*, Hebrew University; Haim Sompolinsky,

Conditional Image Generation with Pixel CNN Decoders Aaron Van den Oord*, Google Deepmind; Nal Kalchbrenner, ; Lasse Espeholt, ; Koray Kavukcuoglu, Google DeepMind; Oriol Vinyals, ; Alex Graves,

Supervised Learning with Tensor Networks Edwin Stoudenmire*, Univ of California Irvine; David Schwab, Northwestern University

Multi-step learning and underlying structure in statistical models Maia Fraser*, University of Ottawa

Blind Optimal Recovery of Signals Dmitry Ostrovsky*, Univ. Grenoble Alpes; Zaid Harchaoui, NYU, Courant Institute; Anatoli Juditsky, ; Arkadi Nemirovski, Georgia Institute of Technology

An Architecture for Deep, Hierarchical Generative Models Philip Bachman*,

Feature selection for classification of functional data using recursive maxima hunting José Torrecilla*, Universidad Autónoma de Madrid; Alberto Suarez,

Achieving budget-optimality with adaptive schemes in crowdsourcing Ashish Khetan, University of Illinois Urbana-Champaign; Sewoong Oh*,

Near-Optimal Smoothing of Structured Conditional Probability Matrices Moein Falahatgar, UCSD; Mesrob I. Ohannessian*, ; Alon Orlitsky,

Supervised Word Mover's Distance Gao Huang, ; Chuan Guo*, Cornell University; Matt Kusner, ; Yu Sun, ; Fei Sha, University of Southern California; Kilian Weinberger,

Exploiting Tradeoffs for Exact Recovery in Heterogeneous Stochastic Block Models Amin Jalali*, University of Washington; Qiyang Han, University of Washington; Ioana Dumitriu, University of Washington; Maryam Fazel, University of Washington

Full-Capacity Unitary Recurrent Neural Networks Scott Wisdom*, University of Washington; Thomas Powers, ; John Hershey, ; Jonathan LeRoux, ; Les Atlas,

Threshold Bandits, With and Without Censored Feedback Jacob Abernethy, ; Kareem Amin, ; Ruihao Zhu*, Massachusetts Institute of Technology

Understanding the Effective Receptive Field in Deep Convolutional Neural Networks Wenjie Luo*, University of Toronto; Yujia Li, University of Toronto; Raquel Urtasun, ; Rich Zemel,

Learning Supervised PageRank with Gradient-Based and Gradient-Free Optimization Methods Lev Bogolubsky, ; Pavel Dvurechensky*, Weierstrass Institute for Appl; Alexander Gasnikov, ; Gleb Gusev, Yandex LLC; Yurii Nesterov, ; Andrey Raigorodskii, ; Aleksey Tikhonov, ; Maksim Zhukovskii,

k*-Nearest Neighbors: From Global to Local Oren Anava, Technion; Kfir Levy, Technion
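
For contrast with the locally adaptive scheme in the title, a plain fixed-k nearest-neighbor classifier in numpy; choosing the neighborhood per query point, as the paper does, is exactly what this baseline lacks.

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    """Majority vote among the k nearest training points, with one
    global k; the paper's k* instead adapts k to each query point."""
    d = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(d)[:k]
    return np.bincount(y_train[nearest]).argmax()

X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y = np.array([0, 0, 1, 1])
print(knn_predict(X, y, np.array([0.95, 1.0])))  # -> 1
```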

Normalized Spectral Map Synchronization Yanyao Shen*, UT Austin; Qixing Huang, Toyota Technological Institute at Chicago; Nathan Srebro, ; Sujay Sanghavi,

Beyond Exchangeability: The Chinese Voting Process Moontae Lee*, Cornell University; Seok Hyun Jin, Cornell University; David Mimno, Cornell University

A posteriori error bounds for joint matrix decomposition problems Nicolo Colombo, Univ of Luxembourg; Nikos Vlassis*, Adobe Research

A Bayesian method for reducing bias in neural representational similarity analysis Ming Bo Cai*, Princeton University; Nicolas Schuck, Princeton Neuroscience Institute, Princeton University; Jonathan Pillow, ; Yael Niv,

Online ICA: Understanding Global Dynamics of Nonconvex Optimization via Diffusion Processes Chris Junchi Li, Princeton University; Zhaoran Wang*, Princeton University; Han Liu,

Following the Leader and Fast Rates in Linear Prediction: Curved Constraint Sets and Other Regularities Ruitong Huang*, University of Alberta; Tor Lattimore, ; András György, ; Csaba Szepesvari, U. Alberta

SDP Relaxation with Randomized Rounding for Energy Disaggregation Kiarash Shaloudegi, ; András György*, ; Csaba Szepesvari, U. Alberta; Wilsun Xu, University of Alberta

Recovery Guarantee of Non-negative Matrix Factorization via Alternating Updates Yuanzhi Li, Princeton University; Yingyu Liang*, ; Andrej Risteski, Princeton University

Unsupervised Learning of 3D Structure from Images Danilo Jimenez Rezende*, ; S. M. Ali Eslami, Google DeepMind; Shakir Mohamed, Google DeepMind; Peter Battaglia, Google DeepMind; Max Jaderberg, ; Nicolas Heess,

Poisson-Gamma dynamical systems Aaron Schein*, UMass Amherst; Hanna Wallach, Microsoft Research New England; Mingyuan Zhou,

Gaussian Processes for Survival Analysis Tamara Fernandez, Oxford; Nicolas Rivera*, King's College London; Yee-Whye Teh,

Dual Decomposed Learning with Factorwise Oracle for Structural SVM of Large Output Domain Ian En-Hsu Yen*, University of Texas at Austin; Xiangru Huang, University of Texas at Austin; Kai Zhong, University of Texas at Austin; Ruohan Zhang, University of Texas at Austin; Pradeep Ravikumar, ; Inderjit Dhillon,

Optimal Binary Classifier Aggregation for General Losses Akshay Balsubramani*, UC San Diego; Yoav Freund,

Disentangling factors of variation in deep representation using adversarial training Michael Mathieu, NYU; Junbo Zhao, NYU; Aditya Ramesh, NYU; Pablo Sprechmann*, ; Yann LeCun, NYU

A primal-dual method for constrained consensus optimization Necdet Aybat*, Penn State University; Erfan Yazdandoost Hamedani, Penn State University

Fundamental Limits of Budget-Fidelity Trade-off in Label Crowdsourcing Farshad Lahouti*, Caltech; Babak Hassibi, Caltech
