zehsilva / prior-predictive-specification

Techniques for setting hyperparameters of hierarchical models using the prior predictive distribution of the model



Learning hyperparameters of Bayesian models by matching moments of the prior predictive distribution.
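For orientation, the moments being matched can be written in closed form for a basic Poisson Matrix Factorization. The derivation below is a sketch that assumes θ_ik ~ Gamma(a, b) and β_jk ~ Gamma(c, d) in shape-rate form with K latent factors; the paper's own notation and parametrization may differ. The variance follows from the law of total variance, using E[θ²] = a(a+1)/b² and E[β²] = c(c+1)/d².

```latex
% Assumed model: \theta_{ik} \sim \mathrm{Gamma}(a, b),\ \beta_{jk} \sim \mathrm{Gamma}(c, d)\ \text{(shape, rate)},
% Y_{ij} \mid \theta, \beta \sim \mathrm{Poisson}(\lambda_{ij}),\quad \lambda_{ij} = \sum_{k=1}^{K} \theta_{ik}\beta_{jk}
\begin{aligned}
\mathbb{E}[Y_{ij}] &= \mathbb{E}[\lambda_{ij}] = K\,\frac{a}{b}\,\frac{c}{d},\\
\operatorname{Var}[Y_{ij}] &= \mathbb{E}[\lambda_{ij}] + \operatorname{Var}[\lambda_{ij}]
  = K\,\frac{ac}{bd} + K\,\frac{ac\,(a+c+1)}{b^{2}d^{2}}.
\end{aligned}
```

Setting these two expressions equal to a requested mean and variance gives two equations in the four hyperparameters a, b, c, d, so in general many settings satisfy them.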

Eliezer de Souza da Silva, Tomasz Kuśmierczyk, Marcelo Hartmann, Arto Klami; Prior Specification for Bayesian Matrix Factorization via Prior Predictive Matching. Journal of Machine Learning Research. 24(67):1–51, 2023.

@article{JMLR:v24:21-0623,
  author  = {Eliezer de Souza da Silva and Tomasz Kuśmierczyk and Marcelo Hartmann and Arto Klami},
  title   = {Prior Specification for Bayesian Matrix Factorization via Prior Predictive Matching},
  journal = {Journal of Machine Learning Research},
  year    = {2023},
  volume  = {24},
  number  = {67},
  pages   = {1--51},
  url     = {http://jmlr.org/papers/v24/21-0623.html}
}

Pre-installation Requirements

The code was tested using Python 3.7.4 from Anaconda 2019.10 with TensorFlow 2.1 and TensorFlow Probability 0.9.0. It also uses numpy, pandas, seaborn, and matplotlib.

Data

The hetrec-lastfm dataset, along with its train-test split, can be found in the directory named data.

Code illustrating gradient-based optimization

The code illustrating gradient-based optimization can be found in the directory named gradient_optimization.

Main files

  • pmf_sgd_optimization.ipynb - Jupyter Notebook illustrating how SGD can find priors for the Poisson Matrix Factorization (PMF) model that match requested values of the prior predictive expectation and/or variance.
  • hpf_sgd_optimization.ipynb - Jupyter Notebook illustrating how SGD can find priors for the Hierarchical Poisson Matrix Factorization (HPF) model that match requested values of the prior predictive expectation and/or variance.
  • pmf_estimators_analysis.ipynb - Jupyter Notebook illustrating the bias and variance of the estimators used in pmf_sgd_optimization.ipynb for the PMF model.
  • pmf_surface_visualizations.ipynb - Jupyter Notebook illustrating 1D and 2D projections of the optimization space for the problem of matching the variance of the PMF prior predictive distribution (minimization of the discrepancy (Variance - 100)^2). Two parametrizations are considered: abcd vs. mu-sigma.
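The notebooks above use TensorFlow and SGD. As a dependency-light illustration of the same idea, the sketch below assumes the basic PMF with θ_ik ~ Gamma(a, rate b) and β_jk ~ Gamma(c, rate d), uses closed-form prior predictive moments instead of sampled estimators, and runs plain gradient descent with finite-difference gradients on log(a), log(b), log(c), log(d). The discrepancy (squared log-ratios of the moments), starting point, step size, and all function names are hypothetical choices, not taken from the notebooks.

```python
import numpy as np


def pmf_prior_moments(a, b, c, d, K):
    """Closed-form prior predictive mean and variance of one PMF entry Y_ij.

    Assumption: theta_ik ~ Gamma(a, rate=b), beta_jk ~ Gamma(c, rate=d),
    Y_ij ~ Poisson(sum_k theta_ik * beta_jk) with K latent factors.
    """
    mean = K * (a / b) * (c / d)
    rate_var = K * a * c * (a + c + 1) / (b**2 * d**2)  # Var of the Poisson rate
    return mean, mean + rate_var  # law of total variance


def discrepancy(log_params, K, target_mean, target_var):
    """Squared log-ratios between the model's moments and the requested ones."""
    a, b, c, d = np.exp(log_params)  # log-parametrization keeps a, b, c, d > 0
    mean, var = pmf_prior_moments(a, b, c, d, K)
    return np.log(mean / target_mean) ** 2 + np.log(var / target_var) ** 2


def finite_diff_grad(f, x, eps=1e-6):
    """Central-difference gradient, standing in for automatic differentiation."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        grad[i] = (f(x + step) - f(x - step)) / (2 * eps)
    return grad


def match_pmf_prior(K, target_mean, target_var, lr=0.03, n_steps=5000):
    """Plain gradient descent on log(a), log(b), log(c), log(d) from a = b = c = d = 1."""
    x = np.zeros(4)

    def objective(p):
        return discrepancy(p, K, target_mean, target_var)

    for _ in range(n_steps):
        x -= lr * finite_diff_grad(objective, x)
    return np.exp(x)


a, b, c, d = match_pmf_prior(K=10, target_mean=1.0, target_var=100.0)
print(a, b, c, d)
print(pmf_prior_moments(a, b, c, d, K=10))  # compare with the requested (1.0, 100.0)
```

Because four hyperparameters are fitted to two moments, many settings match equally well; the one returned here simply depends on the starting point. Optimizing in log space keeps every hyperparameter positive without constrained optimization.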

Sampling code

  • pmf_model.py - Methods calculating E[Y] and E[Y^2] (and therefore also Var[Y]) over the prior predictive distribution for Poisson Matrix Factorization.
  • hpf_model.py - Methods calculating E[Y] and E[Y^2] (and therefore also Var[Y]) over the prior predictive distribution for Hierarchical Poisson Matrix Factorization.
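A hypothetical Monte Carlo counterpart for the PMF case (not the code in pmf_model.py): draw from the prior predictive, then average Y and Y^2 and take Var[Y] = E[Y^2] - E[Y]^2. It assumes shape-rate Gamma priors; note that NumPy's gamma sampler takes a scale, i.e. 1/rate.

```python
import numpy as np


def pmf_prior_predictive_moments_mc(a, b, c, d, K, n_samples=200_000, seed=0):
    """Monte Carlo estimates of E[Y], E[Y^2] and Var[Y] for one PMF entry.

    Hypothetical helper. Assumes theta_k ~ Gamma(a, rate=b),
    beta_k ~ Gamma(c, rate=d), Y ~ Poisson(sum_k theta_k * beta_k).
    """
    rng = np.random.default_rng(seed)
    # NumPy parametrizes Gamma by shape and scale, so scale = 1 / rate.
    theta = rng.gamma(shape=a, scale=1.0 / b, size=(n_samples, K))
    beta = rng.gamma(shape=c, scale=1.0 / d, size=(n_samples, K))
    rate = np.sum(theta * beta, axis=1)  # Poisson rate of a single entry
    y = rng.poisson(rate)
    ey = y.mean()
    ey2 = np.mean(y.astype(float) ** 2)
    return ey, ey2, ey2 - ey**2


ey, ey2, var_y = pmf_prior_predictive_moments_mc(a=2.0, b=1.0, c=3.0, d=2.0, K=5)
# Closed form under the same assumptions: E[Y] = 15, Var[Y] = 15 + 45 = 60.
print(ey, var_y)
```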

Additional files

  • aux.py, aux_plt.py, boplotting/* - Auxiliary functions for tensor processing and plotting.

Experiments illustrating convergence of gradient-based optimization

The code can be found in gradient_optimization_experiments. It contains two subfolders: PMF_Convergence and HPF_Convergence.

Visualization of PMF posterior quality using PSIS-LOO

The code computing PSIS-LOO on the test subset for the fitted PMF can be found in posterior_visualization. The scripts pmf_precompute_objectives_posterior.py and pmf_precompute_objectives_posterior2.py precompute a set of configurations specified inside those files and write the results to pmf_precompute_objectives_posterior.py.csv and pmf_precompute_objectives_posterior2.py.csv, respectively. The outputs can then be previewed with VISUALIZATION.ipynb and VISUALIZATION2.ipynb. VISUALIZATION_K.ipynb plots PSIS-LOO on the test subset for various K with a, b, c, d set to their prior-optimal values.
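As background for the quantity those scripts report, the sketch below evaluates raw importance-sampling LOO from a matrix of pointwise log-likelihoods. It is deliberately simpler than PSIS-LOO: the Pareto smoothing of the largest importance ratios is omitted, so it is a hypothetical illustration rather than the scripts' implementation. The held-out counts and posterior rate draws in the usage lines are synthetic.

```python
import math

import numpy as np


def is_loo(log_lik):
    """Raw importance-sampling LOO: elpd_i = -log mean_s exp(-log_lik[s, i]).

    log_lik has shape (S, N): log p(y_i | theta_s) for S posterior draws and
    N entries. PSIS-LOO additionally replaces the largest importance ratios
    1 / p(y_i | theta_s) using a fitted generalized Pareto tail; that
    smoothing step is omitted here.
    """
    log_lik = np.asarray(log_lik, dtype=float)
    n_draws = log_lik.shape[0]
    neg = -log_lik
    shift = neg.max(axis=0)  # log-sum-exp trick for numerical stability
    log_mean_ratio = shift + np.log(np.exp(neg - shift).sum(axis=0)) - np.log(n_draws)
    pointwise = -log_mean_ratio
    return pointwise.sum(), pointwise


def poisson_log_pmf(y, rate):
    """log Poisson(y | rate), written out to avoid a SciPy dependency."""
    y = np.asarray(y, dtype=float)
    return y * np.log(rate) - rate - np.vectorize(math.lgamma)(y + 1.0)


rng = np.random.default_rng(0)
y_test = np.array([0, 1, 3, 2])  # hypothetical held-out counts
rates = rng.gamma(2.0, 1.0, size=(1000, y_test.size))  # hypothetical posterior rate draws
elpd, pointwise = is_loo(poisson_log_pmf(y_test, rates))
print(elpd, pointwise)
```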

Comparison of Bayesian optimization of PSIS-LOO

The code can be found in bo_optimization. To run the experiment, use RUN_EXPERIMENT_BO.sh. It requires RoBO, a Robust Bayesian Optimization framework (https://github.com/automl/RoBO), to be preinstalled. Results can be displayed using the Jupyter Notebook VISUALIZATION_BO.ipynb.

Sensitivity to Model Mismatch

The code can be found in the sensitivity analysis folder. To visualize the experiment results, open the Jupyter notebook sensitivity_analysis.ipynb. To re-run the experiments, run the Python scripts:

  • python poisson_prior_exp_negbin_async.py - experiment sampling from a Negative Binomial.
  • python poisson_prior_exp_binomial_async.py - experiment sampling from a PMF, but with a probability of randomly zeroing each entry of the matrix.

Both experiments generate csv files with the results. The files with the suffix final_*.csv can be analyzed in the sensitivity_analysis.ipynb notebook simply by adding new cells that keep the code from the previous cells and only adjust the name of the loaded file.
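The two kinds of mismatched data can be sketched as follows. This is a hypothetical illustration, not the code of poisson_prior_exp_negbin_async.py or poisson_prior_exp_binomial_async.py: the Negative Binomial is parametrized here to hit a requested mean and variance, and the zeroing variant reuses an assumed shape-rate Gamma PMF. All names and parameter values are illustrative.

```python
import numpy as np


def sample_negbin(mean, var, size, rng):
    """Counts with a requested mean and variance (requires var > mean)."""
    if var <= mean:
        raise ValueError("a Negative Binomial needs var > mean (overdispersion)")
    # NumPy counts failures before n successes: mean = n(1-p)/p, var = n(1-p)/p^2
    p = mean / var
    n = mean**2 / (var - mean)  # need not be an integer
    return rng.negative_binomial(n, p, size=size)


def sample_pmf_with_zeroing(a, b, c, d, K, n_rows, n_cols, zero_prob, rng):
    """One PMF prior predictive matrix with each entry zeroed with prob. zero_prob."""
    theta = rng.gamma(a, 1.0 / b, size=(n_rows, K))  # scale = 1 / rate
    beta = rng.gamma(c, 1.0 / d, size=(n_cols, K))
    y = rng.poisson(theta @ beta.T)
    y[rng.random(size=y.shape) < zero_prob] = 0  # random zeroing mask
    return y


rng = np.random.default_rng(0)
counts = sample_negbin(mean=5.0, var=20.0, size=200_000, rng=rng)
print(counts.mean(), counts.var())
y = sample_pmf_with_zeroing(1.0, 1.0, 1.0, 1.0, K=3, n_rows=50, n_cols=40, zero_prob=0.3, rng=rng)
print((y == 0).mean())
```

Solving mean = n(1-p)/p and var = n(1-p)/p^2 for n and p gives the two assignments in sample_negbin; zeroing entries at random lowers the mean by the factor (1 - zero_prob), which is the kind of mismatch a Poisson-based prior cannot express.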

License

MIT License

