XGBoostLSS - An extension of XGBoost to probabilistic forecasting

We propose a new framework of XGBoost that predicts the entire conditional distribution of univariate and multivariate responses. In particular, XGBoostLSS models all parameters of a parametric distribution, i.e., location, scale and shape (LSS), instead of the conditional mean only. Choosing from a wide range of continuous, discrete, and mixed discrete-continuous distributions, modelling and predicting the entire conditional distribution greatly enhances the flexibility of XGBoost, as it allows the creation of probabilistic forecasts from which prediction intervals and quantiles of interest can be derived.
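To make the last point concrete, here is a minimal sketch, in plain NumPy/SciPy rather than the XGBoostLSS API, of how a prediction interval follows from a predicted conditional distribution; the Gaussian parameters mu and sigma are made up for illustration.

# A minimal sketch (not the XGBoostLSS API): once the conditional distribution
# is predicted, quantiles and prediction intervals follow directly from it.
import numpy as np
from scipy.stats import norm

mu = np.array([10.0, 12.5, 9.8])    # illustrative predicted conditional means
sigma = np.array([1.2, 0.9, 2.1])   # illustrative predicted conditional std. deviations

# 90% prediction interval from the 5% and 95% quantiles
lower = norm.ppf(0.05, loc=mu, scale=sigma)
upper = norm.ppf(0.95, loc=mu, scale=sigma)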


Supported Python Versions: 3.8, 3.9

Installation

pip install xgboostlss

Or from GitHub: to ensure a proper installation of XGBoostLSS, follow the installation order below and avoid installing it into a directory or conda/venv environment that already contains "xgboost", "xgboostlss", or any other XGBoost-related package. This precaution is necessary because the current dependency, https://github.com/dsgibbons/shap.git, may not disable CUDA building in its setup() call, which can lead to installation issues.

# First, install the patched shap dependency
pip install git+https://github.com/dsgibbons/shap.git

# Now install XGBoostLSS 
pip install git+https://github.com/StatMixedML/XGBoostLSS.git

Features

βœ… Simultaneous estimation of all distributional parameters.
βœ… Multi-target regression allows modelling of multivariate responses and their dependencies.
βœ… Automatic derivation of Gradients and Hessians of all distributional parameters using PyTorch.
βœ… Automated hyper-parameter search, including pruning, is done via Optuna.
βœ… The output of XGBoostLSS is explained using SHapley Additive exPlanations.
βœ… XGBoostLSS is available in Python.

News

πŸ’₯ [2023-05-18] Release of v0.2.0. See the release notes for an overview.
πŸ’₯ [2022-10-14] XGBoostLSS now supports multi-target regression. (Currently available via Py-BoostLSS).
πŸ’₯ [2022-01-03] XGBoostLSS now supports estimation of the Gamma distribution.
πŸ’₯ [2021-12-22] XGBoostLSS now supports estimating the full predictive distribution via Expectile Regression.
πŸ’₯ [2021-12-20] XGBoostLSS is initialized with suitable starting values to improve convergence of estimation.
πŸ’₯ [2021-12-04] XGBoostLSS now supports automatic derivation of Gradients and Hessians.
πŸ’₯ [2021-12-02] XGBoostLSS now supports pruning during hyperparameter optimization.
πŸ’₯ [2021-11-14] XGBoostLSS v0.1.0 is released!

How to use

We refer to the examples section for example notebooks.
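For orientation, here is a minimal sketch of a typical workflow. The module paths and method signatures (xgboostlss.model.XGBoostLSS, xgboostlss.distributions.Gaussian, train, predict) are assumptions based on the example notebooks; consult them for authoritative usage.

# A minimal workflow sketch; module paths and signatures are assumptions
# based on the example notebooks - see them for exact usage.
import numpy as np
import xgboost as xgb
from xgboostlss.model import XGBoostLSS
from xgboostlss.distributions.Gaussian import Gaussian

# Toy heteroscedastic data
rng = np.random.default_rng(123)
X = rng.uniform(size=(500, 4))
y = 10 * X[:, 0] + rng.normal(scale=1 + 2 * X[:, 1])
dtrain = xgb.DMatrix(X, label=y)

# Model the response with a two-parameter Gaussian (location and scale)
xgblss = XGBoostLSS(Gaussian())
xgblss.train({"eta": 0.1, "max_depth": 3}, dtrain, num_boost_round=100)

# Quantiles of the predicted conditional distribution, e.g., a 90% interval
pred = xgblss.predict(dtrain, pred_type="quantiles", quantiles=[0.05, 0.95])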

Available Distributions

XGBoostLSS currently supports the following PyTorch distributions.

| Distribution | Usage | Type | Support | Number of Parameters |
| --- | --- | --- | --- | --- |
| Beta | Beta() | Continuous (Univariate) | $y \in (0, 1)$ | 2 |
| Expectile | Expectile() | Continuous (Univariate) | $y \in (-\infty, \infty)$ | Number of expectiles |
| Gamma | Gamma() | Continuous (Univariate) | $y \in (0, \infty)$ | 2 |
| Gaussian | Gaussian() | Continuous (Univariate) | $y \in (-\infty, \infty)$ | 2 |
| Gumbel | Gumbel() | Continuous (Univariate) | $y \in (-\infty, \infty)$ | 2 |
| Laplace | Laplace() | Continuous (Univariate) | $y \in (-\infty, \infty)$ | 2 |
| Negative Binomial | NegativeBinomial() | Discrete Count (Univariate) | $y \in \{0, 1, 2, 3, ...\}$ | 2 |
| Poisson | Poisson() | Discrete Count (Univariate) | $y \in \{0, 1, 2, 3, ...\}$ | 1 |
| Student-T | StudentT() | Continuous (Univariate) | $y \in (-\infty, \infty)$ | 3 |
| Weibull | Weibull() | Continuous (Univariate) | $y \in [0, \infty)$ | 2 |
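Any of the distribution classes above can be passed to the model in place of another. A hedged illustration, assuming constructor defaults and the same module-path pattern as for Gaussian:

# Swapping the response distribution; the module path and constructor
# defaults are assumptions - see the documentation for exact arguments.
from xgboostlss.model import XGBoostLSS
from xgboostlss.distributions.NegativeBinomial import NegativeBinomial

# For a discrete count response y in {0, 1, 2, ...}
xgblss_counts = XGBoostLSS(NegativeBinomial())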

Some Notes

Stabilization

Since XGBoostLSS updates the parameter estimates by optimizing Gradients and Hessians, it is important that these are comparable in magnitude across all distributional parameters. Because the parameters can differ considerably in range, the estimation of Gradients and Hessians may become unstable, so that XGBoostLSS does not converge or converges very slowly. To mitigate these effects, we have implemented a stabilization of Gradients and Hessians.

For improved convergence, an alternative approach is to standardize the (continuous) response variable, e.g., by dividing it by 100 (y/100). This is especially valuable when the range of the response differs significantly from that of the Gradients and Hessians. Nevertheless, both the built-in stabilization and response standardization should be evaluated carefully for the specific dataset at hand.
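A short sketch of this standardization workaround; the factor of 100 mirrors the y/100 example above and is dataset-dependent.

# Response standardization as described above; the factor 100 is
# illustrative and dataset-dependent.
import numpy as np

y = np.array([1250.0, 980.0, 2310.0])  # response on a large scale
scale = 100.0
y_scaled = y / scale  # train XGBoostLSS on y_scaled instead of y

# Predicted quantiles/samples then refer to the rescaled response and must be
# mapped back to the original units, e.g., y_pred = y_pred_scaled * scale.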

Runtime

Since XGBoostLSS updates all distributional parameters simultaneously, it requires training [number of iterations] * [number of distributional parameters] trees. For example, fitting a two-parameter Gaussian for 100 iterations trains 200 trees. Hence, the runtime of XGBoostLSS is generally somewhat higher than that of XGBoost, which requires training [number of iterations] trees only.

Work in Progress

🚧 Functions that facilitate the choice and evaluation of a candidate distribution (e.g., quantile residual plots, ...).
🚧 Estimation of full predictive distribution without relying on a distributional assumption.

Feedback

We encourage you to provide feedback on how to enhance XGBoostLSS or request the implementation of additional distributions by opening a new issue.

Reference Paper

MΓ€rz, Alexander (2022): Multi-Target XGBoostLSS Regression.
MΓ€rz, Alexander and Kneib, Thomas (2022): Distributional Gradient Boosting Machines.
MΓ€rz, Alexander (2019): XGBoostLSS - An extension of XGBoost to probabilistic forecasting.

Local development

Poetry is used for virtual env management.

For local development, clone the repository, install Poetry, and run

poetry install

Check if everything worked with

poetry run pytest -v

After adding a new feature, don't forget to increase the version number using bump2version.

poetry run bump2version patch

Packaging and publishing to PyPI

poetry build

Publishing to PyPI is automated using a GitHub Action.

The following steps are required:

  • Update the version number in the setup.py file and lss_xgboost/__init__.py.
  • Pushes to master trigger a release to Test PyPI.
  • Creating a tagged release triggers a release to PyPI.

License

Apache License 2.0