srossi93 / bdl-tutorial


Throughout the last decade, the practical advancements and the theoretical understanding of deep learning (DL) models and practices have arguably reached a level of maturity such that DL is the preferred choice for any practitioner seeking simple yet powerful solutions to machine learning problems. With this tutorial we aim to expose participants to novel trends in DL for scenarios where quantification of uncertainty matters, and we will discuss new and emerging trends in the Bayesian deep learning community.

Description of the tutorial

Decision-making processes are ubiquitous in social sciences and engineering, and a sound modeling of uncertainty is key to building reliable and trustworthy systems. Throughout the last decade, the practical advancements and the theoretical understanding of deep learning models and practices have arguably reached a level of maturity such that DL is the preferred choice for any practitioner seeking simple yet powerful solutions to machine learning problems.

The widespread adoption of DL raises the question of how much we can rely on these models' predictions, especially when accuracy is not the only important performance metric and sensible uncertainty quantification is a strict system requirement. With this tutorial we aim to expose participants to novel trends in DL for scenarios where quantification of uncertainty matters. We will discuss at length how a proper probabilistic treatment of such complex deep models is possible and feasible. We will also highlight new and emerging trends in the Bayesian deep learning community, and we will discuss some important computational aspects.

Overview of the content

The tutorial will last about 3h30m and will be divided into three main parts.

Part 1. Motivation for Bayesian inference in modern AI systems

The first part will be dedicated to motivating a probabilistic treatment of systems powered by deep learning models. We will then present some fundamental results from Bayesian theory, upon which the content of the next part builds.

  • Introduction of the speakers and summary of the tutorial

  • The need for reliable models

  • Limitations of loss-trained deep neural networks and the motivation for probabilistic modeling for calibration of uncertainty, detection of out-of-distribution data, and robustness to adversarial examples

  • Bayes' Theorem and the concept of likelihood and prior/posterior distributions
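
As a concrete anchor for this first part, recall Bayes' theorem: for network weights $w$ and a dataset $\mathcal{D}$,

$$p(w \mid \mathcal{D}) = \frac{p(\mathcal{D} \mid w)\, p(w)}{p(\mathcal{D})},$$

and predictions are obtained by averaging over the posterior, $p(y^{*} \mid x^{*}, \mathcal{D}) = \int p(y^{*} \mid x^{*}, w)\, p(w \mid \mathcal{D})\, \mathrm{d}w$. The intractability of both the posterior and this integral for deep networks is what motivates the approximate inference techniques covered in Part 2.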

Part 2. Bayesian neural networks: inference and modern trends

The second part will be entirely dedicated to the core of the tutorial: we will present some methodological results that allow us to perform tractable Bayesian inference on deep neural networks, namely variational inference, Markov chain Monte Carlo methods, and other approximations.

  • Optimization as a way to perform inference on Bayesian neural networks (BNNs): an introduction to variational inference

    • Monte-Carlo Dropout: the simplest way to obtain BNNs (a minimal prediction sketch follows this list)

    • Formalization of the variational objective (and its gradients); a minimal variational-layer sketch appears after this list

    • Parameterization of variational inference and recent advancements

  • Sampling from intractable distributions with MCMC

    • Introduction to Hamiltonian Monte Carlo (HMC)

    • Scaling HMC for Bayesian deep learning with stochastic gradients (see the SGHMC sketch after this list)

  • Ensembles and other approximations

    • Ensembles as a way to perform Bayesian inference on neural networks (see the ensemble sketch after this list)

    • Ensembles as a special case of variational inference

    • Bayesian model averaging on DNNs for scalable inference

    • Laplace approximation

  • Neural networks as approximations of Gaussian processes: some lessons to be learned
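
To make the Monte-Carlo Dropout item concrete, here is a minimal PyTorch sketch in the spirit of Gal and Ghahramani (2016): dropout is kept active at prediction time and the network is sampled several times. The architecture, layer sizes, and function names are illustrative assumptions, not code from the tutorial.

```python
import torch
import torch.nn as nn

# Hypothetical regression network with dropout after the hidden layer.
class MCDropoutNet(nn.Module):
    def __init__(self, d_in=1, d_hidden=64, d_out=1, p=0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_in, d_hidden), nn.ReLU(), nn.Dropout(p),
            nn.Linear(d_hidden, d_out),
        )

    def forward(self, x):
        return self.net(x)

@torch.no_grad()
def mc_dropout_predict(model, x, n_samples=100):
    model.train()  # keep dropout stochastic at prediction time
    samples = torch.stack([model(x) for _ in range(n_samples)])
    return samples.mean(0), samples.std(0)  # predictive mean and spread
```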
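
For the variational objective, here is a minimal mean-field variational layer in the spirit of Blundell et al. (2015), using the reparameterization trick and the closed-form Gaussian KL; again, all names and hyper-parameters are illustrative, and the bias term is omitted for brevity.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class BayesLinear(nn.Module):
    """Mean-field Gaussian posterior q(w) = N(mu, softplus(rho)^2) over weights."""
    def __init__(self, d_in, d_out, prior_std=1.0):
        super().__init__()
        self.mu = nn.Parameter(torch.zeros(d_out, d_in))
        self.rho = nn.Parameter(torch.full((d_out, d_in), -5.0))
        self.prior_std = prior_std

    def forward(self, x):
        std = F.softplus(self.rho)
        w = self.mu + std * torch.randn_like(std)  # reparameterization trick
        # Closed-form KL(q(w) || p(w)) against the zero-mean Gaussian prior
        self.kl = (math.log(self.prior_std) - torch.log(std)
                   + (std ** 2 + self.mu ** 2) / (2 * self.prior_std ** 2) - 0.5).sum()
        return x @ w.t()
```

Training then minimizes the negative ELBO: the data negative log-likelihood (scaled to the full dataset size when using mini-batches) plus the sum of the `kl` terms of all such layers.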
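
For scaling HMC with stochastic gradients, here is a sketch of the simplified SGHMC update of Chen et al. (2014), with identity mass matrix and the gradient-noise estimate set to zero; the function name and step sizes are illustrative.

```python
import torch

def sghmc_step(params, momenta, lr=1e-5, friction=0.05):
    # One SGHMC update; call after backward() on a minibatch estimate of
    # U(theta) = -log p(D | theta) - log p(theta), rescaled so that the
    # minibatch gradient estimates the full-data gradient.
    with torch.no_grad():
        for p, m in zip(params, momenta):
            noise = torch.randn_like(p) * (2.0 * friction * lr) ** 0.5
            m.mul_(1.0 - friction).add_(-lr * p.grad + noise)
            p.add_(m)
```

Here `momenta` is a list of zero-initialized tensors with the same shapes as `params`; collecting the iterates of `params` over many steps yields approximate posterior samples.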
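
Finally, a sketch of deep ensembles (Lakshminarayanan et al., 2017): several independently initialized copies are trained with standard methods and their predictive distributions are averaged. `make_model` and `train_fn` are placeholders for the user's own model constructor and training loop.

```python
import torch

def train_deep_ensemble(make_model, train_fn, n_members=5):
    members = []
    for _ in range(n_members):
        model = make_model()   # fresh random initialization
        train_fn(model)        # standard (e.g. MAP) training
        members.append(model)
    return members

@torch.no_grad()
def ensemble_predict(members, x):
    # Classification case: average the members' softmax probabilities.
    probs = torch.stack([m(x).softmax(-1) for m in members])
    return probs.mean(0)
```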

Part 3. Practical considerations and conclusions

Finally, the last part will be dedicated to some practical considerations (e.g. how to choose priors).

  • A problem for today is a solution for tomorrow: encoding prior knowledge for Bayesian DNNs

  • Calibration of uncertainty estimates for BNNs (a minimal ECE sketch follows this list)

  • Final remarks and take-away message
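
A common diagnostic for the calibration item above is the expected calibration error (ECE); below is a minimal NumPy sketch, where the number of bins and the equal-width binning scheme are illustrative choices.

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=15):
    """ECE: population-weighted average of |accuracy - confidence| per bin.
    `probs` is an (N, K) array of predicted class probabilities."""
    conf = probs.max(axis=1)
    correct = (probs.argmax(axis=1) == labels).astype(float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            ece += mask.mean() * abs(correct[mask].mean() - conf[mask].mean())
    return ece
```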

Material

Introduction -- Variational Inference -- Sampling with MCMC methods -- Laplace approximation and Ensembles -- Priors and practical considerations -- Conclusions

Recordings

Potential target audience

This tutorial targets practitioners and scientists interested in using deep learning for systems where sound uncertainty quantification is a requirement. We will assume that participants are comfortable with DL basics and with some concepts of optimization (such as mini-batch learning and back-propagation). Some experience with Bayesian inference is suggested but not required to successfully follow the tutorial: we will dedicate a good part of the introduction to making sure everyone is comfortable with the basic results of probability theory before diving into the core content.

Motivation and objectives

Combined with the availability of open-source libraries like TensorFlow and PyTorch, deep learning has quickly gained traction in other communities, from cosmology and experimental physics to neuroscience, and it has cross-fertilized other computer science fields, such as digital hardware design, data management systems, and materials science. Disconcertingly, naive implementations of DL models have been found to be unreliable in some scenarios. A recent analysis of deep CNNs for classification, for example, showed that the predictions are systematically over-confident. In practice, this means that there is no clear way to check whether the model is "sure" or not about a given prediction; as a consequence, making informed decisions based on the output of such models should be carefully considered and properly assessed to avoid misinterpreting the model's behavior. This is an interesting problem from a methodological research point of view, but it is also a concerning aspect for any deployment of DL-based systems, in which a model is usually trained just once and may then be interrogated with any kind of input data.

A Bayesian approach to deep learning has shown promising results when it comes to accurate quantification of uncertainty, without compromising on performance. The objective of this tutorial is to present a selection of these methodological advancements for applying Bayesian inference techniques to deep learning models.

Presenters

Simone Rossi has been a PhD candidate under the supervision of Prof. Maurizio Filippone at EURECOM since 2018. He holds an MSc in Computer Engineering from ENST Telecom Paris (France) and an MSc in Electronic Engineering from Politecnico di Torino (Italy). His research focuses on novel methods for applying Bayesian inference to deep models (including Gaussian processes and deep Gaussian processes), using approximate variational inference techniques and Monte-Carlo methods.

Maurizio Filippone has been an Associate Professor at EURECOM since 2015. Prior to that, he held postdoctoral positions in probabilistic machine learning in the UK (Sheffield, Glasgow, and UCL) and became an Assistant Professor at the University of Glasgow, UK, in 2011. Since 2011, he has been teaching postgraduate classes in probabilistic machine learning and artificial intelligence. His research interests are in the development of practical and scalable methods for Bayesian inference, Gaussian processes, and deep Gaussian processes. In the last few years, he has received a prestigious 7-year fellowship from the AXA Research Fund and a 3-year research grant from the Agence Nationale de la Recherche to develop novel probabilistic approaches to advance risk modeling in the life and environmental sciences.

References

Introduction to Variational Inference methods

  •    Jordan et al. (1999). *An Introduction to Variational Methods for Graphical Models*. Machine Learning
    
  •    Hoffman et al. (2013). *Stochastic Variational Inference*. JMLR
    
  •    Ranganath et al. (2014). *Black Box Variational Inference*. AISTATS
    
  •    Blei et al. (2017). *Variational Inference: A Review for Statisticians*. JASA
    

Monte-Carlo Dropout for Bayesian Neural Networks and follow-up

  •    Srivastava et al. (2014). *Dropout: A Simple Way to Prevent Neural Networks from Overfitting*, JMLR
    
  •    Kingma et al. (2015). *Variational Dropout and the Local Reparameterization Trick*. NeurIPS
    
  •    Gal (2016). *Uncertainty in Deep Learning*. University of Cambridge (PhD Thesis)
    
  •    Gal and Ghahramani (2016). *Bayesian Convolutional Neural Networks with Bernoulli Approximate Variational Inference*. ICLR Workshop
    
  •    Gal and Ghahramani (2016). *Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning*. ICML
    
  •    Kendall and Gal (2017). *What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision?*. NeurIPS
    
  •    Li and Gal (2017). *Dropout Inference in Bayesian Neural Networks with Alpha-divergences*. ICML
    
  •    Hron et al. (2017). *Variational Gaussian Dropout is not Bayesian*. NeurIPS Workshop
    
  •    Hron et al. (2018). *Variational Bayesian Dropout: Pitfalls and Fixes*. ICML
    

Variational Inference for Bayesian Neural Networks

  •    Graves (2011). *Practical Variational Inference for Neural Networks*. NeurIPS

  •    Rezende et al. (2014). *Stochastic Backpropagation and Approximate Inference in Deep Generative Models*. ICML

  •    Hernández-Lobato et al. (2015). *Probabilistic Backpropagation for Scalable Learning of Bayesian Neural Networks*. ICML

  •    Blundell et al. (2015). *Weight Uncertainty in Neural Networks*. ICML

  •    Rezende and Mohamed (2015). *Variational Inference with Normalizing Flows*. ICML

  •    Louizos and Welling (2016). *Structured and Efficient Variational Deep Learning with Matrix Gaussian Posteriors*. ICML

  •    Kingma et al. (2016). *Improving Variational Inference with Inverse Autoregressive Flow*. NeurIPS

  •    Liu and Wang (2016). *Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm*. NeurIPS

  •    Miller et al. (2016). *Variational Boosting: Iteratively Refining Posterior Approximations*. ICML

  •    Louizos and Welling (2017). *Multiplicative Normalizing Flows for Variational Bayesian Neural Networks*. ICML

  •    Sun et al. (2017). *Learning Structured Weight Uncertainty in Bayesian Neural Networks*. AISTATS

  •    Khan et al. (2018). *Fast and Scalable Bayesian Deep Learning by Weight-Perturbation in Adam*. ICML

  •    Rossi et al. (2019). *Good Initializations of Variational Bayes for Deep Models*. ICML

  •    Zhang et al. (2018). *Noisy Natural Gradient as Variational Inference*. ICML

  •    Ghosh et al. (2018). *Structured Variational Learning of Bayesian Neural Networks with Horseshoe Priors*. ICML

  •    Osawa et al. (2019). *Practical Deep Learning with Bayesian Principles*. NeurIPS

  •    Sun et al. (2019). *Functional Variational Bayesian Neural Networks*. ICLR

  •    Farquhar et al. (2020). *Liberty or Depth: Deep Bayesian Neural Nets Do Not Need Complex Weight Posterior Approximations*. NeurIPS

  •    Rossi et al. (2020). *Walsh-Hadamard Variational Inference for Bayesian Deep Learning*. NeurIPS

  •    Daxberger et al. (2021). *Bayesian Deep Learning via Subnetwork Inference*. ICML

Sampling from Bayesian neural network posteriors

  • MacKay (1992). *A Practical Bayesian Framework for Backpropagation Networks*. Neural Computation
    
  • Neal (1996). *Bayesian Learning for Neural Networks*. Springer
    
  • Neal (2011). *MCMC using Hamiltonian Dynamics*. Hand-book of Markov Chain Monte Carlo
    
  • Ahn et al. (2012). *Bayesian Posterior Sampling via Stochastic Gradient Fisher Scoring*. ICML
    
  • Chen et al. (2014). *Stochastic gradient Hamiltonian Monte Carlo*. ICML 
    
  • Betancourt (2015). *The Fundamental Incompatibility of Scalable Hamiltonian Monte Carlo and Naive Data Subsampling*. ICML
    
  • Chen et al. (2015). *On the Convergence of Stochastic Gradient MCMC Algorithms with High-Order Integrators*. NeurIPS 
    
  • Springenberg et al. (2016). *Bayesian Optimization with Robust Bayesian Neural Networks*. NeurIPS
    
  • Mandt et al. (2017). *Stochastic Gradient Descent as Approximate Bayesian Inference*. JMLR 
    
  • Zhang et al. (2020). *AMAGOLD: Amortized Metropolis Adjustment for Efficient Stochastic Gradient MCMC*. AISTATS
    
  • Zhang et al. (2020). *Cyclical Stochastic Gradient MCMC for Bayesian Deep Learning*. ICLR
    
  • Cobb et al. (2021). *Scaling Hamiltonian Monte Carlo Inference for Bayesian Neural Networks with Symmetric Splitting*. UAI
    
  • Franzese et al. (2021). *A Unified View of Stochastic Hamiltonian Sampling*. arXiv
    
  • Izmailov et al. (2021). *What Are Bayesian Neural Network Posteriors Really Like?* ICML
    

Laplace approximation

  •    MacKay (1991). *Bayesian Model Comparison and Backprop Nets*. NeurIPS
    
  •    MacKay (1992). *A Practical Bayesian Framework for Backpropagation Networks*. Neural Computation
    
  •    Williams and Barber (1998). *Bayesian classification with Gaussian processes*. IEEE PAMI
    
  •    MacKay (1998). *Choice of Basis for Laplace Approximation*. Machine Learning
    
  •    Schraudolph (2002). *Fast curvature matrix-vector products for second-order gradient descent*. Neural Comput.
    
  •    Kuss and Rasmussen (2005). *Assessing Approximate Inference for Binary Gaussian Process Classification*. JMLR
    
  •    Nickisch and Rasmussen (2008). *Approximations for Binary Gaussian Process Classification*. JMLR
    
  •    Martens et al. (2015). *Optimizing Neural Networks with Kronecker-factored Approximate Curvature*. ICML
    
  •    Botev et al. (2017). *Practical Gauss-Newton Optimisation for Deep Learning*. ICML
    
  •    Ritter et al. (2018). *A Scalable Laplace Approximation for Neural Networks*. ICLR
    
  •    Kunstner et al. (2019). *Limitations of the Empirical Fisher Approximation for Natural Gradient Descent*. NeurIPS
    
  •    Dangel et al. (2020). *BackPACK: Packing more into Backprop*. ICLR
    
  •    Immer et al. (2021). *Improving predictions of Bayesian neural nets via local linearization*. AISTATS
    
  •    Immer et al. (2021). *Scalable Marginal Likelihood Estimation for Model Selection in Deep Learning*. ICML
    
  •    Kristiadi et al. (2021). *Learnable Uncertainty under Laplace Approximations*. UAI
    

Ensemble methods

  •    Newton and Raftery (1994). *Approximate Bayesian Inference with the Weighted Likelihood Bootstrap*. JRSS - Series B
    
  •    Lakshminarayanan et al. (2017). *Simple and scalable predictive uncertainty estimation using deep ensembles*. NeurIPS
    
  •    Pearce et al. (2018). *Bayesian Inference with Anchored Ensembles of Neural Networks, and Application to Reinforcement Learning*. ICML Workshop
    
  •    Pearce et al. (2018). *Bayesian neural network ensembles*. NeurIPS Workshop
    
  •    Garipov et al. (2018). *Loss Surfaces, Mode Connectivity, and Fast Ensembling of DNNs*. NeurIPS
    
  •    Fort et al. (2019). *Deep Ensembles: A Loss Landscape Perspective*. NeurIPS BDL Workshop
    
  •    Milios et al. (2020). *Parametric Bootstrap Ensembles as Variational Inference*. AABI
    
  •    He et al. (2020). *Bayesian Deep Ensembles via the Neural Tangent Kernel*. NeurIPS
    

Infinite-limit Neural Networks

  •    Rasmussen and Williams (2006). *Gaussian Processes for Machine Learning*, MIT Press
    
  •    Damianou and Lawrence (2013). *Deep Gaussian Processes*. AISTATS
    
  •    Cutajar et al. (2017). *Random Features Expansions for Deep Gaussian Processes*. ICML
    
  •    Jacot et al. (2018). *Neural Tangent Kernel: Convergence and Generalization in Neural Networks*. NeurIPS
    
  •    Matthews et al. (2018). *Gaussian Process Behaviour in Wide Deep Neural Networks*. ICLR
    
  •    Lee et al. (2018). *Deep Neural Networks as Gaussian Processes*. ICLR
    
  •    Novak et al. (2019). *Bayesian Deep Convolutional Networks with Many Channels are Gaussian Processes*. ICLR
    
  •    Garriga-Alonso et al. (2019). *Deep Convolutional Networks as shallow Gaussian Processes*. ICLR
    
  •    Yang (2019). *Wide Feedforward or Recurrent Neural Networks of Any Architecture are Gaussian Processes*. NeurIPS
    
  •    Khan et al. (2019). *Approximate Inference Turns Deep Networks into Gaussian Processes*. NeurIPS
    
  •    Lee et al. (2019). *Wide Neural Networks of Any Depth Evolve as Linear Models under Gradient Descent*. NeurIPS
    
