eugeniovaretti / PM10_BAYESIAN

Clustering spatial time series via Bayesian nonparametrics. Bayesian Statistics, MSc Mathematical Engineering, PoliMi, a.y. 2022/2023

Clustering spatial time series via Bayesian nonparametrics

This project contains (almost) all the work done for the Bayesian Statistics course (a.y. 2022/2023) of the MSc in Mathematical Engineering, Politecnico di Milano.

Installation

For end users

The repository includes the bayesmix library as a Git submodule. bayesmix is a C++ library for running MCMC simulations in Bayesian mixture models.

Prerequisites: to build bayesmix you will need git, pkg-config and a recent version of cmake.

On Debian/Ubuntu-based Linux machines, it is sufficient to run:

 sudo apt-get -y update
 sudo apt-get -y install git
 sudo apt-get -y install python3-pip
 sudo python3 -m pip install cmake
 sudo apt-get install -yq pkg-config

On macOS, after installing Homebrew, replace 'sudo apt-get -y' with 'brew' (e.g. 'brew install git pkg-config cmake').
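Before building, it can help to confirm that the prerequisites are actually on your PATH. A minimal sketch using the shell's 'command -v' builtin; the helper name missing_tools is hypothetical and not part of the repository.

```shell
#!/usr/bin/env bash
# Print the names of the given tools that cannot be found on PATH
# (empty output means every tool is available).
missing_tools() {
  local tool absent=()
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || absent+=("$tool")
  done
  echo "${absent[*]}"
}

# Check the three build prerequisites listed above.
missing=$(missing_tools git pkg-config cmake)
if [ -n "$missing" ]; then
  echo "Missing prerequisites: $missing" >&2
else
  echo "All prerequisites found."
fi
```

The check works the same on Linux and macOS, so it can be run after either set of installation commands.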

To install the repository, 'cd' to the folder where you want to install it and clone it with one of the following commands:

git clone --recursive https://github.com/eugeniovaretti/PM10_BAYESIAN

or

git clone --recursive git@github.com:eugeniovaretti/PM10_BAYESIAN

The SSH variant requires an SSH key registered with your GitHub account.

How to build bayesmix

bayesmix must be built before the code can run properly.
To build the executable for the main file run_mcmc.cc, run the following commands from the repository root:

git submodule update --init
cd bayesmix
mkdir build
cd build
cmake .. -DDISABLE_TESTS=ON
make run_mcmc
cd ..
cd ..

Reproducibility

This section is intended for users who want to reproduce our results, analyze results under different hyperparameter values (the code is automated to test grids of total mass (M) and distance (a) values), or apply the same model to their own data.
The repository is structured as follows:

  • bayesmix : contains the submodule that performs the MCMC simulations.
  • input_data : contains all the input data (time series and covariates for the model).
  • output_plot : empty folder that collects the results produced when main.Rmd and the algorithm are run.
  • python_implementation : contains a vanilla Python implementation of the model. It makes the algorithm and the model implementation easier to understand, and serves as a baseline for comparing performance against the (much faster) C++ implementation of the same algorithm.
  • utils : contains the utilities developed for the main script.
  • main.Rmd : notebook that guides you through data preparation and the interpretation of the MCMC output. After the data-preparation sections, the MCMC Algorithm section prompts you to run the C++ code with the run.sh script; the final Result section walks you through the interpretation of the results.
  • run.sh : bash script that runs the C++ algorithm on the files produced by the first sections of main.Rmd. The two parameters a and M can be passed as arguments; the defaults are a=250 and M=0.567.
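Since run.sh accepts a and M and the analysis is meant to explore grids of these values, a small loop can enumerate the runs. This is a minimal dry-run sketch under a loud assumption: the README does not state the argument order, so it is assumed (hypothetically) that a is the first positional argument and M the second; check run.sh before relying on it. The grid values and the helper name grid_commands are also hypothetical, apart from the defaults a=250 and M=0.567.

```shell
#!/usr/bin/env bash
# Dry run: print one run.sh invocation per (a, M) pair instead of executing it.
# ASSUMPTION (not stated in the README): run.sh takes a first, then M.
grid_commands() {
  local a m
  for a in "${A_VALUES[@]}"; do
    for m in "${M_VALUES[@]}"; do
      echo "./run.sh $a $m"
    done
  done
}

A_VALUES=(200 250 300)    # hypothetical distance (a) values; 250 is the default
M_VALUES=(0.3 0.567 1.0)  # hypothetical total mass (M) values; 0.567 is the default
grid_commands
```

Printing the commands first lets you review the grid before launching potentially long MCMC runs; once the argument order is confirmed, 'grid_commands | bash' executes them one after another.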

Authors

Tutor: Matteo Gianella (@TeoGiane)

Only for interested users: updating bayesmix

To download bayesmix's updates, run:

cd bayesmix
git pull origin master
cd ..

If there are updates, first check with 'git status' that your working tree is clean (apart from the updated bayesmix submodule), then run:

git add bayesmix
git commit -m "Downloaded bayesmix updates"
git push
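The cleanliness check before committing can be scripted. A minimal sketch, assuming it is run from the repository root: it uses Git's exclude pathspec (':!bayesmix') so that the expected change to the submodule pointer is ignored, and only commits if nothing else is modified. Both function names are hypothetical helpers, not part of the repository.

```shell
#!/usr/bin/env bash
# List uncommitted changes outside the bayesmix submodule (empty output = clean).
changes_outside_bayesmix() {
  # ':!bayesmix' excludes the submodule path from the status report.
  git status --porcelain -- . ':!bayesmix'
}

# Commit and push the submodule update only if nothing else has changed.
commit_bayesmix_update() {
  local pending
  pending=$(changes_outside_bayesmix)
  if [ -n "$pending" ]; then
    echo "Working tree not clean; commit or stash these changes first:" >&2
    echo "$pending" >&2
    return 1
  fi
  git add bayesmix &&
    git commit -m "Downloaded bayesmix updates" &&
    git push
}
```

After pulling the bayesmix updates, calling commit_bayesmix_update performs the three commands above, but refuses to run them while unrelated changes are pending.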


Languages

TeX 63.5%, Jupyter Notebook 13.8%, R 13.8%, Python 8.5%, Shell 0.4%