Statistical Computation Benchmarks

Benchmark Models for Evaluating Algorithm Accuracy

This repository includes a very preliminary suite of models, specified as Stan programs, for evaluating the accuracy of Bayesian statistical computation algorithms.

Recall that the only role of a proper Bayesian computational algorithm is to estimate expectations with respect to a given posterior distribution. The means of each parameter, transformed paramamter, and generated quantity in each benchmark model has been computed to high accuracy for evaluating new algorithms. A given estimate, mu_est, passes only if (mu_est - mu_true) / sigma_true < 0.25, in other words if the algorithm can reasonably locate the mean to the center of the posterior. No validation of posterior variances is considered outside of those explicitly defined in the generated quantities block of some models.

Each Stan program in the benchmark, and any necessary data, are located in the benchmarks folder. Each Stan program is accompanied with a <model_name>.info file describing the model.

To test a new algorithm create the folder empirical_results/<algorithm_name> and add to it a text file called <model_name>.fit for each model you would like to evaluate. Each <model_name>.fit itself should consist of two columns separated with a space -- the first for the parameter name and the second for the estimated expectation value. Change line 5 of the shell script evaluate to algo=<algorithm_name> and run from the command line,

> ./ evaluate

The shell script will compare each parameter for each model included, listing the results as it goes as well as summary information at the end.

stan-dev / stat_comp_benchmarks

Statistical Computation Benchmarks

About

Languages