Zeng, Z., Li, M. and Vannucci, M. (2022+). Bayesian Image-on-Scalar Regression with a Spatial Global-Local Spike-and-Slab Prior. Bayesian Analysis, accepted. arXiv: 2209.08234
Run sampler.py to reproduce the results; data_gen.py and prior_settings.py will be called to generate data and load prior settings. Please see the annotations below for more details.
The main code is in the .py files:
- sampler.py
  The main script. When run, it:
  - loads the required modules (versions of the main ones):
    - numpy: 1.21.2
    - scipy: 1.6.1
    - torch: 1.9.0 + cu102
    - statsmodels: 0.11.0
    - pandas: 1.3.3
  - runs data_gen.py to generate data
  - runs prior_settings.py to load prior settings
  - defines a few functions for the Gibbs sampler
  - does the sampling and saves the results under the path ./samplers/*. The folders need to be created manually.
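Before launching sampler.py, the two manual prerequisites above (matching module versions and an existing output folder) can be checked with a small helper. This is only a sketch: `check_versions` and `ensure_output_dir` are hypothetical names that are not part of the repository, and the version table simply copies the numbers listed above.

```python
import os
from importlib import metadata

# Versions listed in this README for the main modules.
EXPECTED = {
    "numpy": "1.21.2",
    "scipy": "1.6.1",
    "torch": "1.9.0",
    "statsmodels": "0.11.0",
    "pandas": "1.3.3",
}


def check_versions(expected=EXPECTED):
    """Return {package: (installed_or_None, expected)} for every mismatch."""
    mismatches = {}
    for name, want in expected.items():
        try:
            have = metadata.version(name)
        except metadata.PackageNotFoundError:
            have = None
        # torch reports e.g. "1.9.0+cu102"; compare only the part before "+".
        if have is None or have.split("+")[0] != want:
            mismatches[name] = (have, want)
    return mismatches


def ensure_output_dir(path="./samplers"):
    """Create the output folder if it is missing; return True if it exists."""
    os.makedirs(path, exist_ok=True)
    return os.path.isdir(path)
```

Calling `ensure_output_dir()` once and printing `check_versions()` before the first run should be enough; a version mismatch is reported, not treated as an error, since newer versions may well work.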
- data_gen.py
  The data-generating script, which is called when running sampler.py. When run, it:
  - sets random seeds together with the settings:
    - m: the number of images observed
    - p: the image size (each image is $p \times p$)
    - p2: the number of pixels
    - q: the number of covariates
    - S: the grid on the image
  - generates parameters from a Gaussian process:
    - s2e: the $\sigma^2$ used for the covariance surface
    - s2b: the $\sigma^2$ used for the coefficient images
  - generates data from the settings
  - runs MUA on each pixel and saves both the generated data and the statistic maps.
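The steps above can be sketched roughly as follows. This is not the code in data_gen.py: the squared-exponential kernel, its length scale, the jitter, the default sizes, the intercept column, and the function names are all assumptions made for illustration, and MUA is read here as a separate least-squares fit at every pixel.

```python
import numpy as np


def sq_exp_kernel(S, variance, length_scale=0.2):
    """Squared-exponential covariance between all pairs of grid points in S."""
    d2 = ((S[:, None, :] - S[None, :, :]) ** 2).sum(axis=-1)
    return variance * np.exp(-0.5 * d2 / length_scale**2)


def generate_data(m=40, p=10, q=3, s2e=1.0, s2b=1.0, seed=0):
    """Simulate m images of size p-by-p driven by q covariates."""
    rng = np.random.default_rng(seed)
    p2 = p * p
    # S: p2-by-2 array of pixel coordinates on the unit square.
    g = np.linspace(0.0, 1.0, p)
    S = np.stack(np.meshgrid(g, g, indexing="ij"), axis=-1).reshape(p2, 2)
    jitter = 1e-6 * np.eye(p2)  # keeps the kernels numerically positive definite
    K_b = sq_exp_kernel(S, s2b) + jitter
    K_e = sq_exp_kernel(S, s2e) + jitter
    # Coefficient images: q Gaussian-process draws, each of length p2.
    B = rng.multivariate_normal(np.zeros(p2), K_b, size=q)
    # Design matrix: an intercept column (assumed) plus q - 1 random covariates.
    X = np.column_stack([np.ones(m), rng.standard_normal((m, q - 1))])
    # Spatially correlated noise, one Gaussian-process draw per image.
    E = rng.multivariate_normal(np.zeros(p2), K_e, size=m)
    Y = X @ B + E
    return S, X, B, Y


def mua(X, Y):
    """Pixel-wise least squares: coefficient and t-statistic maps (q-by-p2)."""
    m, q = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    B_hat = XtX_inv @ X.T @ Y
    resid = Y - X @ B_hat
    sigma2 = (resid**2).sum(axis=0) / (m - q)  # residual variance per pixel
    se = np.sqrt(np.outer(np.diag(XtX_inv), sigma2))
    return B_hat, B_hat / se
```

For example, `S, X, B, Y = generate_data(p=30)` gives images with 900 pixels, and `mua(X, Y)` returns one coefficient map and one t-statistic map per covariate.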
- prior_settings.py
  The prior-setting script, which is called when running sampler.py. When run, it:
  - obtains an optimized kernel for the prior of the Inverse-Wishart process
  - sets priors and initial values for the MCMC algorithm
  - adds a pad so that the variance used for the IW kernel stays invertible when real data are supplied
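The padding step can be pictured as adding a small constant to the diagonal of a covariance matrix that would otherwise be singular. The sketch below illustrates that idea only; the pad size `1e-6` and both function names are assumptions, not values or names taken from prior_settings.py.

```python
import numpy as np


def pad_covariance(cov, pad=1e-6):
    """Add pad * I to the diagonal so the matrix becomes positive definite."""
    return cov + pad * np.eye(cov.shape[0])


def is_positive_definite(mat):
    """True if a Cholesky factorization of mat succeeds."""
    try:
        np.linalg.cholesky(mat)
        return True
    except np.linalg.LinAlgError:
        return False
```

A rank-one matrix such as `np.ones((3, 3))` fails the Cholesky check, while `pad_covariance(np.ones((3, 3)))` passes it.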
Note: some warnings may appear because we manually force the intercept to be selected at every local point (sampler.py, lines 90-93), where $\pi_0 = 1$ is enforced for the intercept. As a result, log_theta on line 76 evaluates $\log(1-\pi_0) = \log 0$ and reports a warning. This does not cause trouble, since we set $\tau_0(s) = 1$ later.
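The warning can be reproduced in isolation. The snippet below assumes NumPy-style arithmetic and uses made-up variable names; it is not an excerpt from sampler.py. It shows that the result is simply $-\infty$ for the intercept entry, and that the warning can be silenced locally if desired.

```python
import warnings

import numpy as np

# pi[0] = 1 mimics the intercept that is forced to be always selected.
pi = np.array([1.0, 0.3, 0.7])

# Record the warning that log(1 - 1) = log(0) triggers.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    log_one_minus_pi = np.log(1.0 - pi)

# Same computation with the divide-by-zero warning suppressed locally.
with np.errstate(divide="ignore"):
    quiet = np.log(1.0 - pi)
```

Both computations return the same array; only the first one emits a `RuntimeWarning`.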
- The Jupyter notebook we used to check the results of a single simulation. The provided one is for the 'good separation plot' shown in the appendix. For the main revision, we re-ran everything over the 50 repeated simulations and no longer use this notebook. We include it because the recorded plots may offer some intuition about how we report the results.
- The code we used to generate the data sets and run the models. data_gen.py is used to generate the 50 data sets; sampler.py repeats the procedure described above for every generated data set.
  - It takes around 6 minutes per data set on our cluster and around 800GB to store all results for the 50 data sets, mainly because of the MCMC chains for the IWP samples, which have dimension 2000-by-900-by-900. If you would like to try it, please run it on a cluster or a machine with comparable storage.
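A quick back-of-the-envelope calculation shows why the IWP chains dominate the storage. The sketch assumes the samples are stored as 8-byte floats, which is our guess and not stated above; with 4-byte floats the figures halve. `chain_size_gb` is a hypothetical helper.

```python
def chain_size_gb(n_iter=2000, dim=900, bytes_per_value=8):
    """Size in GB (10^9 bytes) of an n_iter-by-dim-by-dim chain."""
    return n_iter * dim * dim * bytes_per_value / 1e9


per_dataset = chain_size_gb()    # about 12.96 GB for one data set
total = 50 * per_dataset         # about 648 GB for the 50 data sets
```

Under this assumption the IWP chains alone account for roughly 648GB of the 800GB or so reported above.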