stanfordmlgroup / ngboost

Natural Gradient Boosting for Probabilistic Prediction

paid work- optimise NGBoost

Geethen opened this issue

The company I work for is looking for someone to reduce the time NGBoost takes to train and run inference.
We are willing to pay someone to refactor the algorithm (rates are negotiable). All work done will be made openly available. Depending on the speedups achieved, a research paper could stem from your work (if that is something you are interested in).

Deliverable: a version of NGBoost that runs at speeds close to LightGBM's (or faster), achieved by leveraging Numba, C++, GPU acceleration, or any other relevant approach (possibly incorporating techniques from LightGBM).

Link to the company website:

If interested and you have the time available, please email me to discuss options, timelines and payment: gsingh@naturalstate.org

@Geethen You might be interested in using LightGBMLSS which is an extension of LightGBM to probabilistic forecasting.

Also, if all you want are prediction intervals and you don't need the full conditional density you should use conformal inference instead: https://cdsamii.github.io/cds-demos/conformal/conformal-tutorial.html.
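For readers unfamiliar with conformal inference, here is a minimal split-conformal sketch (my illustration, not part of the thread); the synthetic data, the choice of `GradientBoostingRegressor`, and the 95% level are all arbitrary:

```python
# Split-conformal prediction intervals: fit on one split, compute conformity
# scores (absolute residuals) on a held-out calibration split, then widen
# point predictions by a finite-sample-corrected quantile of those scores.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 3))
y = X[:, 0] + 0.5 * rng.normal(size=2000)

# The calibration set must be untouched during model fitting.
X_train, X_cal, y_train, y_cal = train_test_split(X, y, test_size=0.5, random_state=0)

model = GradientBoostingRegressor().fit(X_train, y_train)

# Conformity scores on the calibration set.
scores = np.abs(y_cal - model.predict(X_cal))

# Finite-sample-corrected 95% quantile of the scores.
n = len(scores)
q = np.quantile(scores, min(1.0, np.ceil(0.95 * (n + 1)) / n))

X_test = rng.normal(size=(5, 3))
pred = model.predict(X_test)
lower, upper = pred - q, pred + q  # intervals with ~95% marginal coverage
```

Note that this gives marginal coverage guarantees without any distributional assumptions, which is exactly why it is attractive when you only need intervals.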

@StatMixedML Thank you very much- this seems very useful. I will give it a test drive.

@alejandroschuler - I was interested in NGBoost because it has been significantly outperforming LightGBM, XGBoost, etc., on the regression problems I am working on. Nevertheless, thank you for your suggestion.

> Also, if all you want are prediction intervals and you don't need the full conditional density you should use conformal inference instead: https://cdsamii.github.io/cds-demos/conformal/conformal-tutorial.html.

Hey @alejandroschuler ! What do you mean by the full conditional density?


> @alejandroschuler - I was interested in NGBoost because it has been significantly outperforming LightGBM, XGBoost, etc., on the regression problems I am working on. Nevertheless, thank you for your suggestion.

That seems suspicious... If you are just doing standard regression point prediction of the response (i.e. you don't care about the distribution of $Y|X$) then ngboost should pretty much never outperform other boosting algorithms. Match them, sure. Beat them now and again by a small margin, maybe. But significantly outperform... pretty much never.


> Hey @alejandroschuler ! What do you mean by the full conditional density?

The data are assumed to be draws from some unknown distribution $X_i, Y_i \overset{IID}{\sim} \mathcal D$. Therefore at each value of the covariates $X=x$ you can define the conditional distribution of $Y|X=x$. This is a (continuous) random variable so it has a density as a function of $y$. NGBoost estimates this conditional density $p_{Y|X}(y,x)$. From a trained model, you can evaluate it with `np.exp(ngb.pred_dist(x).logpdf(y))`.

The conditional density fully describes the conditional distribution so you can build a prediction interval from it (i.e. for each $x$ of interest, find a region of values $y \in \mathcal Y$ such that $P\{Y \in \mathcal Y | X=x\} = 0.95$).
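As a concrete illustration (mine, not from the thread), here is how that region is computed when the predicted conditional distribution is Normal, which is NGBoost's default output. scipy's `norm` stands in for the per-$x$ distribution a fitted model would return, and the `loc`/`scale` values are made up:

```python
# Turning a conditional density into a 95% prediction interval (a sketch).
# scipy's Normal stands in for the distribution NGBoost predicts at some x.
from scipy.stats import norm

loc, scale = 2.0, 0.5          # hypothetical predicted parameters at some x
dist = norm(loc=loc, scale=scale)

# Central 95% region: the 2.5% and 97.5% quantiles of the conditional density.
lower, upper = dist.ppf(0.025), dist.ppf(0.975)

# Sanity check: the interval carries 95% of the conditional probability mass.
coverage = dist.cdf(upper) - dist.cdf(lower)  # 0.95
```

The same two-quantile recipe works for any predicted distribution with an invertible CDF, which is what makes the full conditional density strictly more informative than an interval.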

Conformal predictive distributions also estimate the full conditional distribution. Tutorials and further materials are collected here: https://github.com/valeman/awesome-conformal-prediction

https://proceedings.mlr.press/v91/vovk18a.html

https://www.youtube.com/watch?v=FUi5jklGvvo&t=3s

The crepes library in Python, based on conformal predictive systems, builds the complete predictive CDF for each test object while providing guarantees that the distribution is well calibrated and valid, including being located in the right place: https://github.com/henrikbostrom/crepes
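To make the idea concrete, here is a from-scratch sketch of a conformal predictive distribution in the spirit of crepes (it does not use the crepes API itself; the data, the `LinearRegression` underlying model, and the function name `predictive_cdf` are all my own illustration):

```python
# A bare-bones conformal predictive distribution: the predictive CDF at a test
# point is the empirical distribution of (point prediction + calibration residuals).
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 2))
y = 3 * X[:, 0] + rng.normal(size=1000)

# Proper split: fit on one half, calibrate on the other.
X_train, X_cal = X[:500], X[500:]
y_train, y_cal = y[:500], y[500:]

model = LinearRegression().fit(X_train, y_train)
residuals = np.sort(y_cal - model.predict(X_cal))  # sorted calibration residuals

def predictive_cdf(x, y_grid):
    """Empirical predictive CDF of Y at covariates x, evaluated on y_grid."""
    point = model.predict(x.reshape(1, -1))[0]
    # Fraction of shifted calibration residuals at or below each y value.
    return np.searchsorted(point + residuals, y_grid, side="right") / len(residuals)

x_new = np.array([1.0, 0.0])
cdf_vals = predictive_cdf(x_new, np.linspace(-2.0, 8.0, 5))
```

Quantiles of this empirical CDF give calibrated prediction intervals at any level, which is the sense in which a conformal predictive system is "valid".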