GPflow / GPflowOpt

Bayesian Optimization using GPflow

Add discrete variables

gpfins opened this issue · comments

A next (important, imho) step would be to support ordinal variables. It would be useful to have a discussion here about the design.

A few issues would be:

  1. The introduction of an OrdinalParameter for ordinal discrete values. There would be two types of ordinal parameters: one defined only by bounds a, b (yielding {a, a+1, ..., b-1}) and one holding an explicit (ordered) list of discrete values ({a, b, c, d}). Do we treat these the same and leave it up to the user to define the first case as range(a, b)?

  2. The optimization of the acquisition function. For purely discrete domains it would probably be easiest (and reasonably fast) to simply evaluate all possible combinations. For a domain mixing continuous and discrete parameters, we could apply the naive approach of optimizing the acquisition function over the continuous parameters for every combination of the discrete ones. In both cases no gradient information is used for the discrete parameters. One alternative would be to treat the acquisition function as if all its inputs were continuous, but add penalizers so the same discrete value is not sampled twice. (A sketch of both points follows below.)
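To make this concrete, here is a minimal sketch of what the two parameter types from point 1 and the exhaustive evaluation from point 2 could look like. All class and function names here are hypothetical, not existing GPflowOpt API:

```python
import itertools
import numpy as np

# Hypothetical sketch -- none of these classes exist in GPflowOpt yet.
class OrdinalParameter(object):
    """Discrete parameter defined by an explicit, ordered list of values."""
    def __init__(self, label, values):
        self.label = label
        self.values = list(values)

class OrdinalRangeParameter(OrdinalParameter):
    """Convenience wrapper for the bounded case {a, a+1, ..., b-1}."""
    def __init__(self, label, a, b):
        super(OrdinalRangeParameter, self).__init__(label, range(a, b))

def optimize_discrete(acquisition, parameters):
    """Naive optimization of a purely discrete domain: evaluate the
    acquisition on every combination of values and return the best point.
    Assumes the acquisition accepts a 2D array of candidate points."""
    grid = np.array(list(itertools.product(*[p.values for p in parameters])))
    return grid[np.argmax(acquisition(grid))]
```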

  1. I would go for the same; for convenience we can derive an OrdinalRangeParameter from the standard OrdinalParameter. In both cases the resulting domain is a grid anyway.

  2. After defining the new parameters, I'd focus on representing the domains accurately. Sums of discrete parameters create grids, sums of continuous parameters hypercubes, and combined they become a joint domain with both a discrete and a continuous part. Then we have to make the optimizers a little picky so they reject domains they can't handle (sketched below). The penalization approach sounds nice but might be a lot harder to achieve design-wise, and I am not sure we'd gain much: the acquisition function is cheap to evaluate.
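A minimal sketch of these composition rules, reusing the OrdinalParameter sketch above. Again, every name here (Hypercube, Grid, MixedDomain, GradientOptimizer) is hypothetical, not GPflowOpt API:

```python
# Hypothetical sketch of the composition rules; not GPflowOpt API.
class ContinuousParameter(object):
    def __init__(self, label, lb, ub):
        self.label, self.lb, self.ub = label, lb, ub

class Domain(object):
    def __init__(self, parameters):
        self.parameters = list(parameters)

    def __add__(self, other):
        params = self.parameters + other.parameters
        if all(isinstance(p, ContinuousParameter) for p in params):
            return Hypercube(params)   # all continuous -> hypercube
        if all(isinstance(p, OrdinalParameter) for p in params):
            return Grid(params)        # all discrete -> grid
        return MixedDomain(params)     # otherwise a joint domain

class Hypercube(Domain): pass
class Grid(Domain): pass
class MixedDomain(Domain): pass

class GradientOptimizer(object):
    """A 'picky' optimizer that rejects domains it cannot handle."""
    def __init__(self, domain):
        if not isinstance(domain, Hypercube):
            raise ValueError('gradient-based optimization needs a hypercube')
        self.domain = domain

# Usage sketch:
# d = Domain([ContinuousParameter('x', 0., 1.)]) + Domain([OrdinalParameter('n', [1, 2, 3])])
# isinstance(d, MixedDomain)  # True -> GradientOptimizer(d) raises
```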

From this I derive the following requirements:

  • Being able to split up a domain by selecting a subset of its parameters and regaining the properties of a more specific domain (e.g., all continuous -> hypercube)
  • Quickly obtain a wrapper for the objective function which disables the parameters that weren't selected and plugs in fixed values for them. This could also be achieved by introducing a .fixed syntax like in GPflow (a sketch follows below).
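The second requirement could look roughly like the following wrapper. `fix_parameters` is a hypothetical helper, not an existing API:

```python
import numpy as np

def fix_parameters(objective, n_params, fixed):
    """Wrap an objective so that the parameters in `fixed` (a dict mapping
    index -> value) are held constant; the wrapper only accepts the
    remaining free parameters. Hypothetical helper, not GPflowOpt API."""
    free = [i for i in range(n_params) if i not in fixed]

    def wrapper(X_free):
        X_free = np.atleast_2d(X_free)
        X = np.empty((X_free.shape[0], n_params))
        X[:, free] = X_free
        for i, v in fixed.items():
            X[:, i] = v
        return objective(X)

    return wrapper

# Usage: optimize over x0 only, with x1 pinned to the discrete value 3.
# f = fix_parameters(objective, n_params=2, fixed={1: 3})
```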
  1. I would also go for one OrdinalRangeParameter to hold either a range(7) or an arbitrary list of integers. Maybe we should call it IntegerParameter?

> Being able to split up a domain by selecting a subset of its parameters and regaining the properties of a more specific domain (e.g., all continuous -> hypercube)

With the new domain indexing it is now easy to extract a new domain holding only a subset of the parameters. At the moment, however, there is no way to automatically select only parameters of a certain type, and the resulting domain is a generic domain class rather than a 'Hypercube' (see the sketch below).
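Selecting by type could then be a small helper on top of the existing indexing. This is a sketch under the assumption that iterating a domain yields its parameters and that parameters can be summed back into a domain (as GPflowOpt's `+` composition suggests); `select_continuous` itself is hypothetical:

```python
from functools import reduce
from gpflowopt.domain import ContinuousParameter

def select_continuous(domain):
    # Assumes iterating a domain yields its parameters; if not, replace
    # this with the appropriate attribute access.
    params = [p for p in domain if isinstance(p, ContinuousParameter)]
    # Rebuild a sub-domain by summing the selected parameters back together.
    return reduce(lambda a, b: a + b, params)
```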

We have given this some thought so far. With regard to optimizing over a discrete domain, there are currently two approaches we could follow:

  • Evaluate the discrete part on a (sub)grid and start a gradient descent on the continuous part of the domain. This option can be computationally extremely demanding: e.g., optimizing 3 layers of a neural network with between 20 and 200 neurons each results in serious optimization effort. We could evaluate only on a sub-grid, but I doubt that solution will scale appropriately.
  • Round off continuous values (as used in Spearmint) and complement this with https://arxiv.org/abs/1706.03673 . This approach would also allow us to keep using gradients and traditional optimizers (a sketch of the rounding step follows below). There is some concern about selecting the same point multiple times, though.
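A rough sketch of the rounding step at the acquisition-input level. Note the paper goes further and applies the rounding inside the GP/kernel, so the model itself is constant within each discrete cell; both helpers below are hypothetical:

```python
import numpy as np

def round_discrete(X, discrete_dims, valid_values):
    """Snap the discrete dimensions of candidate points to their nearest
    valid value (the rounding step used by Spearmint and in the paper above).
    `valid_values` maps a dimension index to its list of allowed values."""
    X = np.atleast_2d(X).copy()
    for d in discrete_dims:
        vals = np.asarray(valid_values[d])
        # nearest valid value for every candidate in dimension d
        idx = np.abs(X[:, d:d+1] - vals[None, :]).argmin(axis=1)
        X[:, d] = vals[idx]
    return X

def rounded_acquisition(acquisition, discrete_dims, valid_values):
    """Evaluate the acquisition on rounded inputs, so gradient-based
    optimizers can keep operating on a continuous relaxation."""
    def wrapper(X):
        return acquisition(round_discrete(X, discrete_dims, valid_values))
    return wrapper
```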