LightGBMLSS - An extension of LightGBM to probabilistic forecasting

We propose a new framework of LightGBM that predicts the entire conditional distribution of a univariate response variable. In particular, LightGBMLSS models all moments of a parametric distribution, i.e., mean, location, scale and shape (LSS), instead of the conditional mean only. Choosing from a wide range of continuous, discrete, and mixed discrete-continuous distributions, modelling and predicting the entire conditional distribution greatly enhances the flexibility of LightGBM, as it makes it possible to create probabilistic forecasts from which prediction intervals and quantiles of interest can be derived.
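As a minimal sketch of what such a probabilistic forecast yields, assume the model has predicted the location and scale of a Gaussian for each test observation (the parameter values and the Gaussian choice are placeholders for illustration, not the LightGBMLSS interface); prediction intervals then follow directly from the distribution's quantile function:

```python
import numpy as np
from scipy.stats import norm

# Hypothetical per-observation parameters as a LightGBMLSS-style model
# would predict them: one location and one scale per test point.
mu = np.array([10.2, 11.5, 9.8])
sigma = np.array([1.1, 0.9, 1.4])

# A 90% prediction interval from the Gaussian quantile function.
lower = norm.ppf(0.05, loc=mu, scale=sigma)
upper = norm.ppf(0.95, loc=mu, scale=sigma)

for lo, hi in zip(lower, upper):
    print(f"90% prediction interval: [{lo:.2f}, {hi:.2f}]")
```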

News

💥 [2022-01-05] LightGBMLSS now supports estimating the full predictive distribution via Expectile Regression.
💥 [2022-01-05] LightGBMLSS now supports automatic derivation of Gradients and Hessians.
💥 [2022-01-04] LightGBMLSS is initialized with suitable starting values to improve convergence of estimation.
💥 [2022-01-04] LightGBMLSS v0.1.0 is released!

Features

✅ Simultaneous updating of all distributional parameters.
✅ Automatic derivation of Gradients and Hessians of all distributional parameters using PyTorch (see the sketch after this list).
✅ Automated hyper-parameter search, including pruning, via Optuna.
✅ The output of LightGBMLSS is explained using SHapley Additive exPlanations.
✅ LightGBMLSS is available in Python.
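To illustrate the automatic-differentiation idea behind the second feature (a standalone sketch using plain PyTorch, not the LightGBMLSS internals), the gradient and Hessian of a Gaussian negative log-likelihood with respect to a distributional parameter can be obtained as follows:

```python
import torch

# Observed response and current parameter estimates for a single Gaussian.
y = torch.tensor([10.2, 11.5, 9.8])
mu = torch.tensor(10.0, requires_grad=True)
log_sigma = torch.tensor(0.0, requires_grad=True)  # scale on the log scale

# Negative log-likelihood of the Gaussian.
dist = torch.distributions.Normal(loc=mu, scale=log_sigma.exp())
nll = -dist.log_prob(y).sum()

# First derivative (Gradient) w.r.t. mu, keeping the graph for a second pass.
grad_mu, = torch.autograd.grad(nll, mu, create_graph=True)

# Second derivative (Hessian entry) w.r.t. mu.
hess_mu, = torch.autograd.grad(grad_mu, mu)

print(f"gradient: {grad_mu.item():.4f}, hessian: {hess_mu.item():.4f}")
```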

Work in Progress

🚧 Functions that facilitate the choice and evaluation of a candidate distribution (e.g., quantile residual plots, ...).
🚧 Calling LightGBMLSS from R via the reticulate package.
🚧 Estimation of the full predictive distribution without relying on a distributional assumption.

Available Distributions

Currently, LightGBMLSS supports the following distributions. More continuous distributions, as well as discrete, mixed discrete-continuous and zero-inflated distributions, will follow soon.

Some Notes

Stabilization

Since LightGBMLSS updates the parameter estimates by optimizing Gradients and Hessians, it is important that these are comparable in magnitude across all distributional parameters. Because the parameters can live on very different scales, the estimation of Gradients and Hessians may become unstable, so that LightGBMLSS does not converge or converges very slowly. To mitigate these effects, we have implemented a stabilization of Gradients and Hessians.

An additional option to improve convergence is to standardize the (continuous) response variable, e.g., y/100. This is especially useful if the range of the response differs strongly from the range of the Gradients and Hessians. Both the built-in stabilization and the standardization of the response need to be carefully considered given the data at hand.
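As a small sketch of the response standardization (the scaling factor 100 is taken from the example above; the fitted model and its predictions are placeholders, not the LightGBMLSS API):

```python
import numpy as np

y = np.array([1250.0, 980.0, 1430.0])  # raw response on a large scale

scale = 100.0
y_scaled = y / scale  # train the model on y_scaled instead of y

# ... fit LightGBMLSS on y_scaled ...

# Predicted quantities that live on the response scale (e.g., the
# location parameter or derived quantiles) must be transformed back.
pred_scaled = np.array([12.1, 10.2, 14.0])  # placeholder predictions
pred = pred_scaled * scale
print(pred)
```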

Runtime

Since LightGBMLSS updates all distributional parameters simultaneously, it requires training [number of iterations] * [number of distributional parameters] trees. For example, 100 boosting iterations with a two-parameter Gaussian require 200 trees. Hence, the runtime of LightGBMLSS is generally somewhat higher than that of LightGBM, which requires training [number of iterations] trees only.

Feedback

Please provide feedback on how to improve LightGBMLSS, or request additional distributions to be implemented, by opening a new issue.

Installation

$ pip install git+https://github.com/StatMixedML/LightGBMLSS.git

How to use

We refer to the examples section for example notebooks.

Reference Paper

März, Alexander (2019): "XGBoostLSS - An extension of XGBoost to probabilistic forecasting", arXiv preprint arXiv:1907.03178.

License

Apache License 2.0

