Materials from StanCon

StanCon’s version of conference proceedings is a collection of contributed talks based on interactive notebooks. Every submission is peer reviewed by at least two reviewers. The reviewers are members of the Stan Conference Organizing Committee and the Stan Development Team. This repository contains all of the accepted notebooks as well as any supplementary materials required for building the notebooks. The slides presented at the conference are also included.

License: unless otherwise noted, the text in this repository is distributed under the CC BY 4.0 License and code is distributed under the New BSD License. Copyright belongs to the authors.

StanCon 2017 | January 21, Columbia University, New York

2017 Peer reviewed contributed talks

Jonathan Auerbach, Rob Trangucci: Twelve Cities: Does lowering speed limits save pedestrian lives?
- Presenter: Jonathan Auerbach is a PhD candidate in the Department of Statistics at Columbia University.
- Notebook and materials
- Slides
- Video
Milad Kharratzadeh: Hierarchical Bayesian Modeling of the English Premier League
- Presenter: Milad Kharratzadeh is a Postdoctoral Research Scientist in Statistics at Columbia University working with Andrew Gelman. He is jointly appointed at the Earth Institute and the Data Science Institute. His main focus is on developing new statistical methods and using hierarchical Bayesian models for analyzing social, environmental, and health data.
- Notebook and materials
- Slides
- Video
Victor Lei, Nathan Sanders, Abigail Dawson: Advertising Attribution Modeling in the Movie Industry
- Presenter: Victor Lei is a data scientist with the Applied Analytics group at Legendary Entertainment. He has a varied background, with experience in law, computer science and finance.
- Notebook and materials
- Slides
- Video
Woo-Young Ahn, Nate Haines, Lei Zhang: hBayesDM: Hierarchical Bayesian modeling of decision-making tasks
- Presenter: Woo-Young (Young) Ahn is an assistant professor in the Department of Psychology and Translational Data Analytics at the Ohio State University. He earned his Ph.D. in clinical science from Indiana University, Bloomington, S.M. in applied physics from Harvard University, and B.S. in materials science and engineering from Seoul National University.
- Notebook and materials
- Slides
- Video
Charles Margossian, Bill Gillespie: Differential Equation Based Models in Stan
- Presenter: Charles Margossian is a visiting scientist at Metrum Research Group, a biomedical lab that specializes in modeling and simulation. He joined Stan's development team this September to work on tools for differential equation based models.
- Notebook and materials
- Slides
- Video
Teddy Groves: How to Test IRT Models Using Simulated Data
- Presenter: Teddy Groves completed his PhD in inductive logic and philosophy of statistics at Kent University and now works for Football Radar, a football statistics company based in London. His interests include politics, Rudolf Carnap’s writing about probability and applying statistical methods to football.
- Notebook and materials
- Slides
- Video
Bruno Nicenboim, Shravan Vasishth: Models of Retrieval in Sentence Comprehension
- Presenter: Bruno Nicenboim is a PhD candidate at the Department of Linguistics of the University of Potsdam, Germany. His research focus is on cognitive models that link memory processes with sentence comprehension, taking into account individual differences. He currently teaches graduate and undergraduate level courses at the University of Potsdam.
- Notebook and materials
- Slides
- Video
Rob Trangucci: Hierarchical Gaussian Processes in Stan
- Presenter: Rob Trangucci is a statistician in NYC working with the Stan team.
- Notebook and materials
- Slides
- Video
Nathan Sanders, Victor Lei: Modeling the Rate of Public Mass Shootings with Gaussian Processes
- Presenter: Nathan Sanders is the Senior Director of Quantitative Analytics at Legendary Entertainment and has used Stan to model systems in astronomy, film, environmental policy, public health, and more.
- Notebook and materials
- Slides
- Video

StanCon 2018 | January 10-12, Asilomar, California

2018 Peer reviewed contributed talks

Does the New York City Police Department rely on quotas?

Authors: Jonathan Auerbach (Columbia University)

This submission investigates whether the New York City Police Department (NYPD) uses productivity targets or quotas to manage officers in contravention of New York State Law. The analysis is presented in three parts. First, the NYPD's employee evaluation system is introduced, and the criticism that it constitutes a quota is summarized. Second, a publicly available dataset of traffic tickets issued by NYPD officers in 2014 and 2015 is described. Finally, a generative model of how officers write traffic tickets is proposed. The fitted model is consistent with the criticism that police officers substantially alter their ticket writing to coincide with departmental targets. The submission concludes by discussing the implications of these findings and offering directions for further research.

Diagnosing Alzheimer’s the Bayesian way

Authors: Arya A. Pourzanjani, Benjamin B. Bales, Linda R. Petzold, Michael Harrington (UC Santa Barbara)

Alzheimer's Disease is one of the most debilitating diseases, but how do we diagnose it accurately? Researchers have been trying to answer this question by building generative models that describe how patient biomarkers, such as MRI scans, psychological tests, and lab tests, relate over time to the underlying brain deterioration present in Alzheimer's Disease. In this notebook we show how we translated these models to the Bayesian framework in Stan and how this allowed for several model improvements that can ultimately improve our understanding of Alzheimer's and help physicians with diagnosis. In particular, we describe how we hierarchically model patient disease trajectories to obtain stable estimates for patients who lack data. We describe how fitting in Stan yields uncertainties on these disease trajectories, and why that is important for weighing the pros and cons of risky treatment. Lastly, we describe a new method for Bayesian modeling of these monotonic disease trajectories in Stan using I-Splines.

Joint longitudinal and time-to-event models via Stan

Authors: Sam Brilleman, Michael Crowther, Margarita Moreno-Betancur, Jacqueline Buros Novik, Rory Wolfe (Monash University, Columbia University)

The joint modelling of longitudinal and time-to-event data has received much attention in the biostatistical literature in recent years. In this notebook (and talk), we describe the implementation of a shared parameter joint model for longitudinal and time-to-event data in Stan. The methods described in the notebook are a simplified version of those underpinning the stan_jm modeling function that has recently been contributed to the rstanarm R package.

A tutorial on Hidden Markov Models using Stan

Authors: Luis Damiano, Brian Peterson, Michael Weylandt

We implement a standard Hidden Markov Model (HMM) and the Input-Output Hidden Markov Model for unsupervised learning of time series dynamics in Stan. We begin by reviewing three commonly used algorithms for inference and parameter estimation, as well as a number of computational techniques and modeling strategies that make full Bayesian inference practical. For both models, we demonstrate the effectiveness of our proposed approach in simulations. Finally, we give an example of embedding an HMM within a larger model, drawn from the econometrics literature.
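Because Stan cannot sample discrete parameters, the hidden states of an HMM are marginalized out, typically with the forward algorithm (one of the standard HMM algorithms). A minimal sketch in Python, assuming a discrete-emission HMM with made-up parameter values; it is not the notebook's Stan code, and the Input-Output variant is not covered. A brute-force sum over all hidden paths is included only to check the result on a tiny example.

```python
import math
from itertools import product


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def hmm_log_likelihood(obs, init, trans, emit):
    """Forward algorithm in log space for a discrete-emission HMM.

    obs[t]      observed symbol index at time t
    init[k]     P(z_1 = k)
    trans[j][k] P(z_t = k | z_{t-1} = j)
    emit[k][o]  P(y_t = o | z_t = k)
    All probabilities are assumed strictly positive (math.log(0) raises).
    """
    n_states = len(init)
    # log alpha_1(k) = log P(z_1 = k) + log P(y_1 | z_1 = k)
    log_alpha = [math.log(init[k]) + math.log(emit[k][obs[0]]) for k in range(n_states)]
    for o in obs[1:]:
        # Marginalize over the previous state, then add the emission term.
        log_alpha = [
            log_sum_exp([log_alpha[j] + math.log(trans[j][k]) for j in range(n_states)])
            + math.log(emit[k][o])
            for k in range(n_states)
        ]
    # log p(y_1, ..., y_T) = log sum_k alpha_T(k)
    return log_sum_exp(log_alpha)


def brute_force_log_likelihood(obs, init, trans, emit):
    """Sum over every hidden path; exponential cost, only for checking small cases."""
    total = 0.0
    for path in product(range(len(init)), repeat=len(obs)):
        p = init[path[0]] * emit[path[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= trans[path[t - 1]][path[t]] * emit[path[t]][obs[t]]
        total += p
    return math.log(total)


# Hypothetical two-state, two-symbol example.
init = [0.6, 0.4]
trans = [[0.9, 0.1], [0.2, 0.8]]
emit = [[0.7, 0.3], [0.1, 0.9]]
obs = [0, 1, 1, 0, 1]
print(hmm_log_likelihood(obs, init, trans, emit))
print(brute_force_log_likelihood(obs, init, trans, emit))
```

The forward recursion costs O(T K^2) instead of the O(K^T) of the brute-force sum; in Stan the same recursion is usually written with the built-in `log_sum_exp`.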

Student Ornstein-Uhlenbeck models served three ways (with applications for population dynamics data)

Authors: Aaron Goodman (Stanford University)

Ornstein-Uhlenbeck (OU) processes are mean-reverting processes used to model dynamics in biology, physics, and finance. I fit an extension of the OU process that is driven by a Lévy process with Student-t marginals rather than by Brownian motion with Gaussian marginals, which allows for heavy-tailed increments. I implement four formulations of the Student-t OU-type model in Stan and compare their sampling performance on both real and simulated population dynamics data.
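For reference, the Gaussian OU process dX = theta (mu - X) dt + sigma dW has a closed-form transition: X(t + dt) given X(t) = x is normal with mean mu + (x - mu) exp(-theta dt) and variance sigma^2 (1 - exp(-2 theta dt)) / (2 theta). A minimal Python sketch of the exact likelihood for this Brownian-driven baseline, assuming illustrative names and data; the Student-t, Lévy-driven formulations from the talk replace this Gaussian transition and are not reproduced here.

```python
import math


def ou_transition(x, dt, theta, mu, sigma):
    """Mean and variance of X(t + dt) given X(t) = x for a Gaussian OU process."""
    decay = math.exp(-theta * dt)
    mean = mu + (x - mu) * decay
    var = sigma ** 2 * (1.0 - decay ** 2) / (2.0 * theta)
    return mean, var


def ou_log_likelihood(xs, times, theta, mu, sigma):
    """Exact log-likelihood of observations xs at (possibly uneven) times, conditional on xs[0]."""
    ll = 0.0
    for i in range(1, len(xs)):
        m, v = ou_transition(xs[i - 1], times[i] - times[i - 1], theta, mu, sigma)
        ll += -0.5 * (math.log(2.0 * math.pi * v) + (xs[i] - m) ** 2 / v)
    return ll


# Hypothetical log-abundance series observed at uneven times.
xs = [10.0, 8.5, 9.1, 7.2]
times = [0.0, 1.0, 2.5, 3.0]
print(ou_log_likelihood(xs, times, theta=0.5, mu=8.0, sigma=1.0))
```

Because each step uses the exact transition, unevenly spaced observation times need no special handling; as dt grows the transition approaches the stationary distribution, normal with mean mu and variance sigma^2 / (2 theta).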

Links:

Video (coming soon)
Notebook, code, slides
github.com/aaronjg/outype_t_process_stan
web.stanford.edu/~aaronjg

SlicStan: a blockless Stan-like language

Authors: Maria I. Gorinova, Andrew D. Gordon, Charles Sutton (University of Edinburgh)

We present SlicStan, a probabilistic programming language that compiles to Stan and uses static analysis techniques to allow for more abstract and flexible models. SlicStan is novel in two ways: (1) it allows variable declarations and statements to be automatically shredded into the different components needed for efficient Hamiltonian Monte Carlo inference, and (2) it introduces more flexible user-defined functions that allow new model parameters to be declared as local variables. This work demonstrates that efficient automatic inference can result from the joint efforts of the machine learning and programming languages communities.

Introducing idealstan, an R package for ideal point modeling with Stan

Authors: Robert Kubinec (University of Virginia)

idealstan implements item-response theory (IRT) ideal-point scaling/dimension reduction methods that incorporate additional response categories and missing/censored values, including absences and abstentions, for roll call voting data (or any other kind of binary or ordinal IRT data). Full and approximate Bayesian inference is done via Stan.

Computing steady states with Stan’s nonlinear algebraic solver

Authors: Charles C. Margossian (Metrum, Columbia University)

Stan’s numerical algebraic solver can be used to solve systems of nonlinear algebraic equations that have no closed-form solution. One of its key applications in scientific and engineering fields is computing equilibrium states (equivalently, steady states). This case study illustrates the algebraic solver by applying it to a problem in pharmacometrics. In particular, I show that the algebraic systems we solve can be quite complex and can embed, for instance, numerical solutions to ordinary differential equations. R and Stan code is provided, and a Bayesian model is fitted to simulated data.
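To illustrate an algebraic equation that embeds an ODE solution, consider a hypothetical one-compartment model with linear elimination (dx/dt = -k x) and a bolus dose D every tau hours. The pre-dose (trough) amount x is at steady state when adding the dose and integrating the ODE for tau hours returns x. The sketch below solves that equation in Python with an RK4 integrator and bisection, not with Stan's solver or Stan's ODE integrators; linear elimination is chosen only so the answer can be checked against the closed form x = D exp(-k tau) / (1 - exp(-k tau)).

```python
import math


def rk4_decay(x0, k, tau, n_steps=1000):
    """Integrate dx/dt = -k * x from t = 0 to t = tau with classic fourth-order Runge-Kutta."""
    h = tau / n_steps
    x = x0
    for _ in range(n_steps):
        k1 = -k * x
        k2 = -k * (x + 0.5 * h * k1)
        k3 = -k * (x + 0.5 * h * k2)
        k4 = -k * (x + h * k3)
        x += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return x


def steady_state_residual(x, dose, k, tau):
    """Zero when dosing at trough amount x and eliminating for tau hours returns x."""
    return rk4_decay(x + dose, k, tau) - x


def solve_steady_state(dose, k, tau, tol=1e-10):
    """Bisection on the residual, which is positive at x = 0 and decreasing in x."""
    lo, hi = 0.0, dose
    while steady_state_residual(hi, dose, k, tau) > 0.0:
        hi *= 2.0  # expand the bracket until the residual changes sign
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if steady_state_residual(mid, dose, k, tau) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


x_ss = solve_steady_state(dose=100.0, k=0.5, tau=2.0)
closed_form = 100.0 * math.exp(-1.0) / (1.0 - math.exp(-1.0))
print(x_ss, closed_form)
```

The numerical route matters when elimination is nonlinear (for example, saturable) and no closed form exists; only the right-hand side inside the integrator would change.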

Bayesian estimation of mechanical elastic constants

Authors: Ben Bales, Brent Goodlet, Tresa Pollock, Linda Petzold (UC Santa Barbara)

This notebook outlines a Bayesian approach to resonance ultrasound spectroscopy (RUS), a technique for estimating the elastic constants of a material from a sample's measured resonance modes. It includes an example of how to use custom automatic differentiation in specialized Stan models (for either numerical or efficiency reasons).

Aggregate random coefficients logit — a generative approach

Authors: Jim Savage (Lendable Marketplace), Shoshana Vasserman (Harvard University)

This notebook illustrates how to fit aggregate random coefficients logit models in Stan using Bayesian techniques. The approach is far easier to learn and implement than the standard BLP algorithm, is robust to mismeasurement of market shares, and gives finite-sample posterior uncertainty for all parameters (and demand shocks). The cost is that firms’ price-setting process, including how unobserved product-market demand shocks affect prices, must be modeled explicitly.
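In aggregate random coefficients logit models, observed market shares are compared with model-implied shares that integrate the logit choice probabilities over the distribution of consumer tastes. A minimal Python sketch of that integral by Monte Carlo, assuming a single product characteristic x_j with a normally distributed coefficient and an outside good whose utility is normalized to zero; the function and parameter names are hypothetical, and the notebook's models for demand shocks and prices are not included.

```python
import math
import random


def rc_logit_shares(delta, x, sigma, draws):
    """Simulated market shares for an aggregate random-coefficients logit.

    delta[j]  mean utility of product j
    x[j]      characteristic of product j that carries the random coefficient
    sigma     standard deviation of the random coefficient
    draws     standard-normal draws nu_s, one per simulated consumer
    """
    n_products = len(delta)
    shares = [0.0] * n_products
    for nu in draws:
        utilities = [delta[j] + sigma * nu * x[j] for j in range(n_products)]
        # Outside good has utility 0, contributing exp(0) = 1 to the denominator.
        denom = 1.0 + sum(math.exp(u) for u in utilities)
        for j in range(n_products):
            shares[j] += math.exp(utilities[j]) / denom
    return [s / len(draws) for s in shares]


rng = random.Random(42)
draws = [rng.gauss(0.0, 1.0) for _ in range(1000)]
delta = [1.0, 0.5, -0.2]
x = [2.0, 1.5, 1.0]
shares = rc_logit_shares(delta, x, 0.8, draws)
print(shares, 1.0 - sum(shares))  # inside shares and outside-good share
```

With sigma = 0 the simulated shares collapse to the plain logit formula exp(delta_j) / (1 + sum_k exp(delta_k)), which is a convenient sanity check.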

The threshold test: Testing for racial bias in vehicle searches by police

Authors: Camelia Simoiu, Sam Corbett-Davies, Sharad Goel, Emma Pierson (Stanford University)

We develop a new statistical test for detecting bias in decision making, the threshold test, which mitigates the problem of infra-marginality by jointly estimating decision thresholds and risk distributions.
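Infra-marginality can be seen with two observable quantities. If officers search whenever a driver's risk exceeds a threshold, the search rate is the share of drivers above the threshold and the hit rate is the average risk among those searched. A minimal Python sketch with made-up risk values, assuming the risks are known (in the threshold test they are inferred, together with the thresholds, from search and hit data): both groups face the same threshold, yet their hit rates differ, so comparing hit rates alone would suggest bias where this toy example has none.

```python
def search_and_hit_rates(risks, threshold):
    """Search rate and expected hit rate when everyone with risk above threshold is searched."""
    searched = [r for r in risks if r > threshold]
    search_rate = len(searched) / len(risks)
    # Expected hit rate: mean probability of finding contraband among those searched.
    hit_rate = sum(searched) / len(searched) if searched else float("nan")
    return search_rate, hit_rate


threshold = 0.3  # identical threshold for both groups: no bias by construction
group_a = [0.10, 0.20, 0.35, 0.90, 0.95]
group_b = [0.05, 0.31, 0.32, 0.33, 0.40]
print(search_and_hit_rates(group_a, threshold))
print(search_and_hit_rates(group_b, threshold))
```

Group B's risk mass sits just above the threshold, which drags its hit rate down even though the decision rule is the same; estimating the thresholds directly avoids this trap.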

Assessing the safety of Rosiglitazone for the treatment of type II diabetes

Authors: Konstantinos Vamvourellis, K. Kalogeropoulos, L. Phillips (London School of Economics and Political Science)

A Bayesian paradigm for making drug approval decisions, with a case study on the treatment of type 2 diabetes.

Causal inference with the g-formula in Stan

Authors: Leah Comment (Harvard University)

The potential outcomes framework often uses one or more parametric outcome models to learn about underlying causal processes. In Stan, parameter estimation using observed data takes place in the model block, while simulation-based estimation of causal parameters using the g-formula can be done separately with generated quantities. Bayesian estimation allows for data-driven sensitivity analysis regarding the assumption of no unmeasured confounding. This presentation shows some simple causal models, then outlines a basic sensitivity analysis using prior information derived from an external data source.
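In the single-time-point case, the g-formula estimate of E[Y^a] averages the outcome model's prediction E[Y | A = a, L] over the observed covariate distribution; computing it once per posterior draw, as one would in generated quantities, yields a posterior for the causal contrast. A minimal Python sketch, assuming a hypothetical linear outcome model with a treatment-covariate interaction and made-up posterior draws standing in for output from a fitted Stan model; it does not include the sensitivity analysis from the talk.

```python
def g_formula_mean(beta, covariates, a):
    """Standardized mean E[Y^a] for one posterior draw.

    Assumed outcome model: E[Y | A, L] = b0 + b_a*A + b_l*L + b_al*A*L,
    averaged over the empirical distribution of the covariate L.
    """
    b0, b_a, b_l, b_al = beta
    preds = [b0 + b_a * a + b_l * l + b_al * a * l for l in covariates]
    return sum(preds) / len(preds)


def ate_draws(beta_draws, covariates):
    """Average treatment effect E[Y^1] - E[Y^0], one value per posterior draw."""
    return [
        g_formula_mean(beta, covariates, 1) - g_formula_mean(beta, covariates, 0)
        for beta in beta_draws
    ]


# Hypothetical posterior draws of (b0, b_a, b_l, b_al) and observed covariate values.
beta_draws = [(1.0, 2.0, 0.5, 0.3), (1.1, 1.8, 0.4, 0.2), (0.9, 2.1, 0.6, 0.25)]
covariates = [0.0, 1.0, 2.0, 3.0]
effects = ate_draws(beta_draws, covariates)
print(effects)
print(sum(effects) / len(effects))  # posterior mean of the ATE
```

With this linear model each draw's contrast reduces to b_a + b_al * mean(L); for nonlinear outcome models such as logistic regression the averaging step is what makes the estimate a marginal causal effect.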

Bayesian estimation of ETAS models with RStan

Authors: Fausto Fabian Crespo Fernandez (Universidad San Francisco de Quito)

Earthquake modeling with Stan, applied to seismic recurrence in Ecuador in 2016.

2018 Invited talks

Predictive information criteria in hierarchical Bayesian models for clustered data

Presenters: Sophia Rabe-Hesketh, Daniel Furr (UC Berkeley)
Video
Slides and code
gse.berkeley.edu/people/sophia-rabe-hesketh, github.com/danielcfurr

ScalaStan

Presenter: Joe Wingbermuehle (Cibo Technologies)
Video
Slides
github.com/cibotech/ScalaStan

Stan applications in physics: Testing quantum mechanics and modeling neutrino masses

Presenter: Talia Weiss (MIT)
Slides
https://www.linkedin.com/in/talia-weiss-184753139

Forecasting at scale: How and why we developed Prophet for forecasting at Facebook

Presenters: Sean Taylor, Ben Letham (Facebook)
Video
research.fb.com/facebook-at-stancon-2018
facebook.github.io/prophet

Stan applications in human genetics: Prioritizing genetic mutations that protect individuals from human disease

Presenter: Manuel Rivas (Stanford University)
Video
Slides
med.stanford.edu/rivaslab

Statistics using geometry to show uncertainties and integrate graph information

Presenter: Susan Holmes (Stanford University)
Video
Slides
statweb.stanford.edu/~susan

A brief history of Stan

Presenter: Daniel Lee (Generable)
Video
Slides
github.com/syclik

Model assessment, model selection and inference after model selection

Presenter: Aki Vehtari (Aalto University)
Video
Notebook, code, slides
users.aalto.fi/~ave/

Spatial models in Stan: intrinsic auto-regressive models for areal data

Presenter: Mitzi Morris (Columbia University)
Video
Slides
Case study
github.com/mitzimorris

Some problems I'd like to solve in Stan, and what we'll need to do to get there

Presenter: Andrew Gelman (Columbia University)
Video
