hrluo / ShardedBayesianAdditiveRegressionTrees

Sharded Bayesian Additive Regression Trees (SBT)

ShardedBayesianAdditiveRegressionTrees

Content

This is the code repository for the research publication "Sharded Bayesian Additive Regression Trees" (abbreviated SBT) by Hengrui Luo and Matthew T. Pratola. The manuscript of this paper can be accessed at https://arxiv.org/abs/2306.00361.

  • In the experiment folder, we provide the R and bash code that reproduces the BART and SBT Branin/Friedman synthetic-dataset experiments for large datasets. The root folder contains the openBT software distribution we used to conduct the experiments, including compilable C++ source code. Please refer to README_openBT.md to set up the distribution. Our environment is Ubuntu 22.04.
  • In the experiment/redshift folder, we provide the actual redshift dataset we used to test scalability in our paper.

Abstract

In this paper we develop the randomized Sharded Bayesian Additive Regression Trees (SBT) model.

We introduce a randomization auxiliary variable and a sharding tree to decide partitioning of data, and fit each partition component to a sub-model using Bayesian Additive Regression Tree (BART). By observing that the optimal design of a sharding tree can determine optimal sharding for sub-models on a product space, we introduce an intersection tree structure to completely specify both the sharding and modeling using only tree structures. In addition to experiments, we also derive the theoretical optimal weights for minimizing posterior contractions and prove the worst-case complexity of SBT.
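The sharding idea described in the abstract can be illustrated with a small, self-contained Python sketch. This is not the repository's R/C++ implementation and does not call openBT. It assumes a uniform auxiliary variable u, a hypothetical set of cut points on u standing in for the sharding tree, a one-split regression stump standing in for each BART sub-model, and weights proportional to shard size rather than the optimal weights derived in the paper. All function names are made up for illustration.

```python
import bisect
import random
from statistics import mean


def assign_shards(n_points, cut_points, rng):
    """Draw an auxiliary u ~ Uniform(0, 1) per observation and map it to a shard.

    cut_points are sorted split values on u (a stand-in for the sharding
    tree); K cut points define K + 1 shards.
    """
    return [bisect.bisect_right(cut_points, rng.random()) for _ in range(n_points)]


def fit_stump(xs, ys):
    """Stand-in sub-model (NOT BART): one split at the median of x."""
    split = sorted(xs)[len(xs) // 2]
    left = [y for x, y in zip(xs, ys) if x < split]
    right = [y for x, y in zip(xs, ys) if x >= split]
    overall = mean(ys)
    left_mean = mean(left) if left else overall
    right_mean = mean(right) if right else overall
    return lambda x: left_mean if x < split else right_mean


def fit_sharded(xs, ys, cut_points, seed=0):
    """Partition the data by the auxiliary variable and fit one sub-model per shard."""
    rng = random.Random(seed)
    shard_ids = assign_shards(len(xs), cut_points, rng)
    models, sizes = [], []
    for k in range(len(cut_points) + 1):
        idx = [i for i, s in enumerate(shard_ids) if s == k]
        if not idx:
            continue  # skip empty shards
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
        sizes.append(len(idx))
    total = sum(sizes)
    # Assumption: size-proportional weights, not the paper's optimal weights.
    weights = [s / total for s in sizes]

    def predict(x):
        return sum(w * m(x) for w, m in zip(weights, models))

    return predict, shard_ids, weights


# Toy usage: a noisy step function split across four shards.
data_rng = random.Random(1)
xs = [data_rng.uniform(0, 1) for _ in range(400)]
ys = [(1.0 if x >= 0.5 else 0.0) + data_rng.gauss(0, 0.1) for x in xs]
predict, shard_ids, weights = fit_sharded(xs, ys, cut_points=[0.25, 0.5, 0.75])
```

Because each observation is sent to a shard by the random draw u rather than by its input x, every shard sees a random subsample of the whole input space, and the sub-models can be fitted independently before their predictions are combined.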

Citation

We provide R production code for reproducibility and experimentation under LICENSE. Please cite our paper using the following BibTeX entry:

@article{luopratola2023shard,
title={Sharded Bayesian Additive Regression Trees},
author={Hengrui Luo and Matthew T. Pratola},
year={2023},
eprint={2306.00361},
archivePrefix={arXiv},
primaryClass={stat.ML}
}

Thank you for your interest, and please reach out if you have further questions.

License: MIT License

Languages

Shell 51.9%, C++ 33.6%, Makefile 6.4%, R 5.8%, M4 2.2%, C 0.1%