resibots / limbo

A lightweight framework for Gaussian processes and Bayesian optimization of black-box functions (C++11)

Home Page: http://www.resibots.eu/limbo

computation of sigma in GP

bossdm opened this issue · comments

There seems to be an additional noise term: it is added independently to sigma, and also subtracted as part of the k^T K^{-1} k term. Apparently limbo is using the procedure from an older arXiv version of the paper (https://arxiv.org/pdf/1407.3501v3.pdf), while the procedure in the newer version (https://arxiv.org/pdf/1407.3501.pdf) does not have this. v3 also seems to add the noise to the prior.
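For reference, the two quantities under discussion can be written side by side (notation follows Rasmussen & Williams; sigma_n^2 is the observation-noise variance, and K already carries the noise on its diagonal):

```latex
% Noise-free predictive variance (Rasmussen & Williams, Eq. 2.26):
\mathbb{V}[f_*] = k(x_*, x_*) - k_*^\top \left(K + \sigma_n^2 I\right)^{-1} k_*
% The variant limbo computes adds the noise once more, i.e. it predicts
% the variance of noisy targets y_* rather than of the latent f_*:
\mathbb{V}[y_*] = \mathbb{V}[f_*] + \sigma_n^2
```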

See l. 159-167 of https://github.com/resibots/limbo/blob/master/src/limbo/model/gp.hpp
for the single noise term added independently to sigma.

If you look at _sigma (l. 620), it computes the k^T K^{-1} k term via _matrixL, which is itself computed with the noise added in the case i == j (see l. 81-84 of https://github.com/resibots/limbo/blob/master/src/limbo/kernel/kernel.hpp).

Is this a bug or not? Would the algorithm be adversely affected when the noise is kept in K but the independent addition of noise to sigma is removed?

> Would the algorithm be adversely affected when the noise is kept in K but the independent addition of noise to sigma is removed?

There should be no real difference for IT&E or any other algorithm using GPs, whether the noise is kept or not. The impact is very small because (1) the noise is usually not optimized and is set to a small value, and (2) even when it is optimized, the algorithm usually converges to small noise values, so the impact should be tiny. In the term computed for the Cholesky decomposition, however, the noise is important, because otherwise the Cholesky is unstable.

> Is this a bug or not?

To be honest, I do not remember why we are adding the noise one more time. Given the math (see the book by Rasmussen & Williams), we should not have it. So I would say that this is a bug, but I need to dig into why we did this (maybe it's accidental).

Just as a thought: if the noise is set too high (higher than the data warrant, though we don't always know that), could the subtraction result in negative values of sigma? Perhaps the addition avoids that?

> In the term computed for the Cholesky decomposition, however, the noise is important, because otherwise the Cholesky is unstable.

Is the noise just a numerical trick for the Cholesky decomposition, or does it have a deeper meaning as the noise/variability of the observations?

I also see on l. 83 of https://github.com/resibots/limbo/blob/master/src/limbo/kernel/kernel.hpp
that an additional small number is added. Perhaps this small number is added on top of the noise to avoid the instability in case there is no user-defined noise?

On p. 19 of the book it says: "The algorithm returns the predictive mean and variance for noise-free test data; to compute the predictive distribution for noisy test data y_*, simply add the noise variance σ_n^2 to the predictive variance of f_*." Perhaps when we assume the data are noisy, we should add this term.

> The algorithm returns the predictive mean and variance for noise-free test data; to compute the predictive distribution for noisy test data y_*, simply add the noise variance σ_n^2 to the predictive variance of f_*.

That is why I am adding this value. So this is not a bug; it's the intended behavior! Thanks for finding this.

Yes, some books consider that as soon as you compute k(x,x), it should be k(x,x) + noise, both when k(x,x) appears in the K matrix and when it is computed independently in sigma. This is why we use it.
However, when using UCB it makes no difference at all, as we are just adding a constant in a maximisation task.

I see, thanks for the quick responses on this.

Closing this. @bossdm in case you have any other issue/question, do not hesitate to ping. Thanks for using our library!