GPflow / GPflowOpt

Bayesian Optimization using GPflow

Acquisition Function stability

mathDR opened this issue

This is a general question regarding both Probability of Improvement (PI) and Expected Improvement (EI) Acquisition Functions:

How does one recover viable values for PI and EI when the candidate_var value is vanishingly small?

I note that this usually only happens with very smooth functions and an RBF kernel, but for a vanishingly small candidate_var the Normal CDF evaluates to zero, which yields zero PI and zero EI (since the Normal PDF term is scaled by the candidate standard deviation).

Perhaps this isn't an issue, since most functions optimized with Bayesian Optimization are not smooth? Or perhaps candidate points simply aren't close enough together?

I was wondering/hoping someone has seen this phenomenon and has a good solution for it.
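For concreteness, here is a small NumPy/SciPy snippet (illustrative only, not GPflowOpt code; the names are made up for the example) showing how PI and EI underflow to exactly zero as candidate_var shrinks:

```python
# Minimal illustration: for a candidate whose predicted mean is slightly worse
# than the incumbent, PI and EI underflow to exactly zero once the predictive
# variance becomes vanishingly small.
import numpy as np
from scipy.stats import norm

def pi_ei(y_best, mu, var):
    sigma = np.sqrt(var)
    z = (y_best - mu) / sigma                                # minimization convention
    pi = norm.cdf(z)                                         # Probability of Improvement
    ei = (y_best - mu) * norm.cdf(z) + sigma * norm.pdf(z)   # Expected Improvement
    return pi, ei

y_best, mu = 0.0, 0.1                  # candidate mean just above the incumbent
for var in [1e-2, 1e-6, 1e-12, 1e-300]:
    print(var, pi_ei(y_best, mu, var))
# As var shrinks, z -> -inf, so norm.cdf(z) and norm.pdf(z) both underflow to 0,
# and PI and EI collapse to exactly zero.
```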

Hi @mathDR

As you mention, low variances occur near existing data points: this is what typically drives the exploration behavior in Bayesian optimization. Usually there is some other under-explored region that is more attractive (due to its higher variance). From a sampling point of view, being near a previous evaluation isn't interesting anyway while there are still large gaps between data points. As the process continues, the expected variance over the domain decreases, and at some point the smaller values near data points actually produce higher scores than other locations: sampling then moves closer to previously evaluated points (more in-depth exploitation). So in general this isn't an issue: the low variances drive exploration until it makes sense to refine an optimum further, because there is only a low probability that a better option is available elsewhere.

At least, that's the mathematical story. Numerically, this process eventually becomes unstable. This is why we enforce a lower bound of 1e-6 on the variances in the code.
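For illustration, a minimal sketch of such a variance floor in NumPy/SciPy (the function and constant names are assumptions for the example; GPflowOpt applies the bound inside its TensorFlow acquisition code rather than exactly like this):

```python
# Sketch of a variance floor before evaluating an acquisition function.
import numpy as np
from scipy.stats import norm

VARIANCE_FLOOR = 1e-6  # the lower bound mentioned above

def expected_improvement(y_best, mu, var):
    var = np.maximum(var, VARIANCE_FLOOR)  # clamp before taking sqrt / dividing
    sigma = np.sqrt(var)
    z = (y_best - mu) / sigma
    return (y_best - mu) * norm.cdf(z) + sigma * norm.pdf(z)
```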

PS: note that for a perfectly deterministic function (i.e., zero likelihood variance), the predictive variance is 0 at the data points. This implies PI and EI are undefined at the data points themselves because of a division by zero.

Thanks for the comment @javdrher. This was exactly my issue: the numerical instability in the CDF and PDF of the acquisition functions. I like the hard lower bound of 1e-6 on the variance; this should solve most issues.

I'd guess that, as an extension, one could augment both PI and EI for the zero-sigma case (assuming minimization of a function) as:

PI(x)  = CDF(ybest,mu(x),sigma^2(x)) for sigma(x) > 0
       = 1 if sigma(x) = 0 and mu(x) < ybest
       = 0 if sigma(x) = 0 and mu(x) >= ybest 

and

EI(x) = (ybest-mu(x)) * CDF(ybest,mu(x),sigma^2(x)) + sigma^2(x) * PDF(ybest,mu(x),sigma^2(x)) for sigma(x) > 0
      = (ybest-mu(x)) if sigma(x) = 0 and mu(x) < ybest
      = 0             if sigma(x) = 0 and mu(x) >= ybest

but with the hard lower bound on the variance, this would never be needed (a sketch of the piecewise definition follows below)...
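For reference, a minimal sketch of these piecewise definitions (minimization convention; the function names are illustrative and not part of GPflowOpt's API). It uses the standard-normal form sigma(x)*phi(z), which is equivalent to sigma^2(x)*PDF(ybest,mu(x),sigma^2(x)) above:

```python
# Piecewise PI/EI that handle sigma == 0 explicitly instead of relying on a
# variance floor.
from scipy.stats import norm

def pi_zero_safe(y_best, mu, sigma):
    if sigma > 0.0:
        return norm.cdf((y_best - mu) / sigma)
    return 1.0 if mu < y_best else 0.0            # sigma == 0 limit

def ei_zero_safe(y_best, mu, sigma):
    if sigma > 0.0:
        z = (y_best - mu) / sigma
        return (y_best - mu) * norm.cdf(z) + sigma * norm.pdf(z)
    return max(y_best - mu, 0.0)                  # sigma == 0 limit
```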

This is more or less how we did it in our old code, except that we checked for sigma(x) < jitter, and since we had likelihood.variance = 0 we did not have separate cases depending on ybest.

In general I think the current approach does the same thing while making the code much cleaner.