GPflow / GPflowOpt

Bayesian Optimization using GPflow

Constrained optimization

icouckuy opened this issue · comments

This issue keeps track of the support for constraints in GPflowOpt.

Support for expensive constraints will initially be added using the Probability of Feasibility (PoF), with the acquisition function defined as gamma(x) = EI(x) * PoF(x).

Paying attention to:

  • fmin should be based only on the feasible data (similarly for noisy objectives)
  • however, if the initial data set contains no feasible points, we should have an efficient way to obtain at least one feasible point before continuing, for instance by optimizing the PoF alone and afterwards switching to gamma.
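As a concrete sketch of the gamma(x) = EI * PoF acquisition with fmin restricted to the feasible data, and a PoF-only fallback when no feasible point exists yet (the function name and signature here are illustrative, not GPflowOpt API):

```python
# Sketch: EI * PoF from GP predictive means/variances. Assumes a single
# constraint c(x) that is feasible when c(x) <= 0.
import numpy as np
from scipy.stats import norm

def ei_times_pof(mu_f, var_f, mu_c, var_c, y, feasible):
    """gamma(x) = EI(x) * PoF(x).

    mu_f, var_f: GP predictive mean/variance of the objective at x
    mu_c, var_c: GP predictive mean/variance of the constraint at x
    y: observed objective values; feasible: boolean mask over y
    """
    pof = norm.cdf(-mu_c / np.sqrt(var_c))
    if not feasible.any():
        # No feasible data yet: fall back to PoF alone to locate
        # at least one feasible point first.
        return pof
    fmin = y[feasible].min()          # best *feasible* observation only
    s = np.sqrt(var_f)
    z = (fmin - mu_f) / s
    ei = (fmin - mu_f) * norm.cdf(z) + s * norm.pdf(z)
    return ei * pof
```

The feasibility mask makes the first bullet explicit: infeasible observations never influence fmin.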

Other constraints, which can be cheaply evaluated, are passed through to the optimizer, although I'm not sure about the support (and performance) of constraints in SciPy. I believe the TF optimizers have no direct support for constraints; of course, adding a penalty to the objective function (or loss) is always possible.
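For reference, SciPy does support constraints in some of its methods (e.g. SLSQP, COBYLA). A minimal sketch of passing a cheap inequality constraint straight to scipy.optimize.minimize, with a toy objective:

```python
# Sketch: a cheap (white-box) inequality constraint handed to SciPy
# directly, instead of penalizing the objective.
import numpy as np
from scipy.optimize import minimize

objective = lambda x: (x[0] - 1.0) ** 2 + (x[1] - 2.0) ** 2

# Standard SciPy convention: 'ineq' means fun(x) >= 0,
# so this dict encodes x0 + x1 <= 1.
cons = [{'type': 'ineq', 'fun': lambda x: 1.0 - x[0] - x[1]}]

res = minimize(objective, x0=np.zeros(2), method='SLSQP', constraints=cons)
# The unconstrained minimum (1, 2) is infeasible; SLSQP returns the
# projection onto the constraint boundary, approximately (0, 1).
```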

@javdrher should we make a distinction between equality and inequality constraints? Equality constraints might not work well with PoF, if at all. At first sight, expected violation is an option, but I'll have to check the literature again for the standard approaches. It is not needed for version 0.1.0.

The first point is indeed an issue; the current behaviour is wrong.

I disagree that we should automatically detect the absence of feasible points and switch to PoF only; I'd leave that out of the library. It's possible to first use a BayesianOptimizer with PoF, then run one with gamma. It would be interesting, though, to support some sort of stopping criterion, which would be helpful in this scenario.

I specifically included only SciPy optimization instead of the TF optimizers, and I don't see a specific reason to use penalty functions, although I am open to being convinced otherwise. For white-box problems, equality constraints can be avoided by transforming the domain first. PoF works quite badly with equality constraints, so better options are welcome there.

It might be a dumb question, but how are TensorFlow's gradient tensors passed to SciPy for optimization purposes?

https://github.com/GPflow/GPflow/blob/master/GPflow/model.py
The graph that calculates the function value and its gradients is built in the _compile function, and an objective function is generated which returns both the value and the gradients.

In _optimize_np the objective function is passed to result = minimize(...) with jac=True, so the Jacobian is assumed to be returned together with the objective value.
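A minimal standalone sketch of this pattern, outside GPflow and with a toy objective in place of the TensorFlow graph:

```python
# Sketch of the GPflow pattern: one callable returns (value, gradient),
# and jac=True tells SciPy to unpack that pair.
import numpy as np
from scipy.optimize import minimize

def objective(x):
    f = np.sum(x ** 2)   # function value
    g = 2.0 * x          # gradient (in GPflow this comes from TensorFlow)
    return f, g

res = minimize(objective, x0=np.array([3.0, -4.0]), jac=True,
               method='L-BFGS-B')
# res.x converges to the minimum at the origin.
```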

@nknudde Thank you, I will take a more careful look into GPflow.

Ah, I see. It's because evaluate() is decorated with AutoFlow, so it's automatically compiled.

Not sure if you were referring to the SciPy optimization of the model likelihood (as described by @nknudde) or the optimization of the acquisition function, which applies a similar principle. The actual construction of the gradient of the acquisition function occurs in _build_acquisition_wrapper (called by evaluate and evaluate_with_gradients). However, I added a mechanism in the actual optimizers to discard the gradients in case of gradient-free optimization (such as the MCOptimizer).
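The discard mechanism can be sketched as a simple wrapper around a (value, gradient) objective; the names below are illustrative, not the actual GPflowOpt code:

```python
# Sketch: adapt a (value, gradient) objective for a gradient-free
# optimizer by dropping the gradient half of the pair.
def value_only(objective_with_grad):
    def wrapped(x):
        f, _ = objective_with_grad(x)   # discard the gradient
        return f
    return wrapped

# Example: a toy objective returning value and gradient together.
obj = lambda x: (x * x, 2 * x)
f = value_only(obj)
```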

Thank you for pointing this out; it's something I should mention in the documentation.

> I disagree that we should automatically detect the absence of feasible points and switch to PoF only; I'd leave that out of the library. It's possible to first use a BayesianOptimizer with PoF, then run one with gamma. It would be interesting, though, to support some sort of stopping criterion, which would be helpful in this scenario.

Makes sense. I think it should be well documented that in BayesianOptimizer the acquisition function (and the underlying GPflow model) is updated with new data, so users know they can reuse their own reference to the acquisition function in a next sampling stage (possibly with a new hybrid acquisition function).

> I specifically included only SciPy optimization instead of the TF optimizers, and I don't see a specific reason to use penalty functions.

Agreed, let's just rely on the SciPy optimizers.

Would you mind providing a small example of inequality constraints as they currently stand, please?

My domain is invariant to position, e.g. [1, 2, 3] and [2, 3, 1] are the same solution. Is there a better way to model this than a > b and b > c in BO?

The problem is that, unlike GP hyperparameters, real-world input points are often highly irregular in their structure. Examples are:

  • like in @mccajm's example, inputs that are invariant to their positions.
  • complex equality constraints, such as x1^2 + x2^2 = 1
  • simple high-dimensional equality constraints, such as weights that sum to one, x1 + x2 + ... + x(i) = 1, which is best handled by optimizing over x1..x(i-1) while substituting x(i) = 1 - sum(x1..x(i-1)).
  • .. and so on.
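The substitution in the third bullet can be sketched as follows (the helper names are mine, not from any library):

```python
# Sketch: eliminate the equality constraint x1 + ... + xd = 1 by
# optimizing over the first d-1 coordinates and substituting the last.
import numpy as np

def lift(z):
    """Map a (d-1)-vector z to the d-vector x with sum(x) = 1."""
    return np.append(z, 1.0 - np.sum(z))

def wrap(objective):
    """Return an equivalent objective over the reduced (d-1)-dim domain."""
    return lambda z: objective(lift(z))

# Example: f(x) = sum(x**2) restricted to x1 + x2 + x3 = 1
f = lambda x: np.sum(x ** 2)
g = wrap(f)
x = lift(np.array([0.2, 0.3]))   # the point [0.2, 0.3, 0.5], sums to 1
```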

I think it would be better to separate the package into a lower level and a higher level, so that advanced users can bypass Domain (in the higher level) and specify constraints and optimization methods explicitly (in the lower level).

I've created a gist to illustrate my point:
https://gist.github.com/yangnw/241ecf33f16a5212b3a433e631ab2117
This is certainly a premature design that uses mixins, but it illustrates the point.

In particular, the high-level functionality can still be used even if the low-level functionality is extended (e.g. KB). This is somewhat similar to TensorFlow's design.

Hi @mccajm, thank you for your interest in GPflowOpt. With regard to constraints, there are two distinct use cases:

  • black-box constraints, which are part of the expensive evaluation and must be learned. Some support for this is currently present (e.g. PoF).
  • white-box constraints, which restrict the domain and can be handled directly by reducing the domain searched during optimization. No work has been done on this so far; as @yangnw correctly points out, it involves a lot of consideration and can quickly turn into a mess. Some support is on the roadmap, most likely for a second release (we are currently working towards a first).

Regarding your position-invariant domain, your solution would definitely be valid. Although no simple way exists yet, you can probably already achieve this by implementing your own optimizer (e.g., based on the SciPyOptimizer), which you then use to optimize the acquisition function in BayesianOptimizer. A different solution: since your domain is symmetric, you can probably come up with a transform to a domain of lower dimensionality which covers all your function values (see figure 1 in https://arxiv.org/pdf/1310.6740.pdf, which is related to your problem but in a black-box setting). You can then optimize on this reduced domain.
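One concrete way to exploit the permutation symmetry is a canonical-representative transform: sorting maps every permutation of a point to the same ordered point, so the region a >= b >= c covers all function values. A sketch (illustrative code, not GPflowOpt API):

```python
# Sketch: for a permutation-invariant objective, sort inputs into a
# canonical order so [1, 2, 3] and [2, 3, 1] map to the same point.
import numpy as np

def canonical(x):
    """Map any permutation of x to its sorted representative (descending)."""
    return np.sort(x)[::-1]

# Any permutation-invariant objective is unchanged by this mapping.
f = lambda x: np.prod(x) + np.sum(x)
```

An optimizer restricted to the ordered region then never wastes evaluations on equivalent permutations.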

Some thoughts on the implementation of white-box constraints in GPflowOpt:

  • For basic constraints I'd like to add a constraint object which can be evaluated (perhaps along with its derivatives?) and then added to a domain. Optimizers pick them up and, for instance, configure scipy.optimize.minimize, perform rejection sampling, etc.
  • I agree that we should not implement everything at the high level, and should keep support for expert users to operate at a lower level. In fact, the lower-level support is more or less available. I'd focus on some higher-level use cases and add a tutorial notebook to the docs to illustrate use of the lower level. By keeping interfaces clean, adding your own optimizer (with integrated constraint support) should remain simple.
  • As you may have noticed, some support was added for domain transforms (#12). For an equality constraint we could add functionality that generates a transform between a d-dimensional domain plus the equality constraint and a (d-1)-dimensional domain. By wrapping the objective function with this transform, the BO can run on the reduced domain.
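The rejection-sampling idea from the first bullet could look like this; all names are hypothetical, not GPflowOpt API:

```python
# Sketch: constraint handles g(x) <= 0 attached to a box domain, used
# to rejection-sample feasible candidate points.
import numpy as np

rng = np.random.default_rng(0)

def sample_feasible(lower, upper, constraints, n):
    """Draw n points uniformly from the box [lower, upper], keeping
    only those that satisfy every constraint g(x) <= 0."""
    points = []
    while len(points) < n:
        x = rng.uniform(lower, upper)
        if all(g(x) <= 0 for g in constraints):
            points.append(x)
    return np.array(points)

# Example: the unit disc inside the box [-1, 1]^2
disc = lambda x: np.sum(x ** 2) - 1.0
X = sample_feasible(np.array([-1.0, -1.0]), np.array([1.0, 1.0]), [disc], 100)
```

Rejection sampling is only practical when the feasible fraction of the box is not too small; for tight constraints a transform-based approach is preferable.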

For expensive constraints we can take a look at section 2.6 of Gelbart's dissertation:
https://dash.harvard.edu/bitstream/handle/1/17467236/GELBART-DISSERTATION-2015.pdf?sequence=1

It summarizes a number of constrained acquisition functions proposed over the years. At the moment we have only implemented the probability of feasibility.

Recall that we do not intend to support a wide range of methods; I'd only include others if there is a specific reason for it.

@yangnw The current plan is to store the (cheap) constraints in the Domain object. These can include coefficient matrices (for white-box linear constraints) or function handles (for cheap black-box constraints).

The Domain object is passed to an Optimizer, so the constraints are available there. I think this is still low-level enough. Of course, for really specific constraints (e.g. symmetry) a custom Optimizer class must be implemented.
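A sketch of what that plan could look like; the Domain and optimize_on names are hypothetical, and only the scipy.optimize.minimize interface is real:

```python
# Sketch: a Domain carrying linear constraints A x <= b and cheap
# black-box handles g(x) <= 0, handed to an optimizer that translates
# them into SciPy's constraint dicts.
import numpy as np
from scipy.optimize import minimize

class Domain:
    def __init__(self, lower, upper, A=None, b=None, handles=()):
        self.lower, self.upper = lower, upper
        self.A, self.b = A, b       # white-box linear constraints A x <= b
        self.handles = handles      # cheap black-box handles, g(x) <= 0

def optimize_on(domain, objective, x0):
    cons = []
    if domain.A is not None:
        A, b = domain.A, domain.b
        # SciPy 'ineq' means fun(x) >= 0, so A x <= b becomes b - A x >= 0
        cons.append({'type': 'ineq', 'fun': lambda x: b - A @ x})
    cons += [{'type': 'ineq', 'fun': lambda x, g=g: -g(x)}
             for g in domain.handles]
    bounds = list(zip(domain.lower, domain.upper))
    return minimize(objective, x0, method='SLSQP',
                    bounds=bounds, constraints=cons)

# Example: minimize (x0-1)^2 + (x1-2)^2 subject to x0 + x1 <= 1
d = Domain(np.array([-5.0, -5.0]), np.array([5.0, 5.0]),
           A=np.array([[1.0, 1.0]]), b=np.array([1.0]))
res = optimize_on(d, lambda x: (x[0] - 1.0) ** 2 + (x[1] - 2.0) ** 2,
                  np.zeros(2))
```

Note the `g=g` default-argument trick, which pins each handle to its own lambda instead of letting all lambdas close over the loop variable.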