GPflow / GPflowOpt

Bayesian Optimization using GPflow

GPR works, VGP doesn't

mccajm opened this issue

To improve the speed of optimisation, I replaced GPR with VGP as follows:

domain = np.sum([GPflowOpt.domain.ContinuousParameter(f'mux{i}', mm[i], mx[i]) for i in range(7)])
domain += np.sum([GPflowOpt.domain.ContinuousParameter(f'muy{i}', mm[i+7], mx[i+7]) for i in range(7)])
domain += np.sum([GPflowOpt.domain.ContinuousParameter(f'sigmax{i}', 1e-7, 1.) for i in range(7)])
domain += np.sum([GPflowOpt.domain.ContinuousParameter(f'sigmay{i}', 1e-7, 1.) for i in range(7)])
domain += GPflowOpt.domain.ContinuousParameter('offset', endo * 0.7, endo * 1.3)
design = GPflowOpt.design.RandomDesign(500, domain)
X = design.generate()
Y = np.vstack([obj(x.reshape(1, -1)) for x in X])
model = GPflow.vgp.VGP(X, Y, GPflow.kernels.RBF(29, lengthscales=X.std(axis=0)), likelihood=GPflow.likelihoods.Gaussian())
acquisition = GPflowOpt.acquisition.ExpectedImprovement(model)
opt = GPflowOpt.optim.StagedOptimizer([GPflowOpt.optim.MCOptimizer(domain, 500),
                                       GPflowOpt.optim.SciPyOptimizer(domain)])
optimizer = GPflowOpt.BayesianOptimizer(domain, acquisition, optimizer=opt)
optimizer.optimize(obj, n_iter=500)

GPR works, but with VGP I receive the following error:

2017-07-18 23:03:28.798171: W tensorflow/core/framework/op_kernel.cc:1158] Invalid argument: Incompatible shapes: [501,1] vs. [500,1]
[[Node: gradients/unnamed._models.model_datascaler.model.build_likelihood/unnamed._models.model_datascaler.model.likelihood.variational_expectations/sub_1_grad/BroadcastGradientArgs = BroadcastGradientArgs[T=DT_INT32, _device="/job:localhost/replica:0/task:0/gpu:0"](gradients/unnamed._models.model_datascaler.model.build_likelihood/unnamed._models.model_datascaler.model.likelihood.variational_expectations/sub_1_grad/Shape, gradients/unnamed._models.model_datascaler.model.build_likelihood/unnamed._models.model_datascaler.model.likelihood.variational_expectations/sub_1_grad/Shape_1)]]
Warning: optimization restart 1/5 failed
2017-07-18 23:03:28.898935: W tensorflow/core/framework/op_kernel.cc:1158] Invalid argument: Incompatible shapes: [500,1] vs. [501,1]
[[Node: gradients/unnamed._models.model_datascaler.model.build_likelihood/add_1_grad/BroadcastGradientArgs = BroadcastGradientArgs[T=DT_INT32, _device="/job:localhost/replica:0/task:0/gpu:0"](gradients/unnamed._models.model_datascaler.model.build_likelihood/add_1_grad/Shape, gradients/unnamed._models.model_datascaler.model.build_likelihood/add_1_grad/Shape_1)]]
(the same incompatible-shapes warning repeats for each failed restart)
Warning: optimization restart 2/5 failed

I'm using master GPflow and GPflowOpt on TensorFlow 1.2 and Python 3.6.

Thanks.

Hi @mccajm ,

I have only looked at your problem briefly so far, as I'm not in Belgium at the moment. The problem is that some Params in the VGP class (q_mu and q_sqrt) are initialized with the number of data points in X as passed to the constructor. Setting the X/Y data holders does not update them, so the model crashes when a new data point is added.
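To make the mismatch concrete, here's a minimal sketch of the staleness (untested, and assuming the current GPflow 0.x API):

import numpy as np
import GPflow

X = np.random.rand(500, 29)
Y = np.random.rand(500, 1)
m = GPflow.vgp.VGP(X, Y, GPflow.kernels.RBF(29), likelihood=GPflow.likelihoods.Gaussian())
print(m.q_mu.value.shape)  # (500, 1): sized from X at construction time

# The BayesianOptimizer later assigns the augmented data to the data holders:
m.X = np.vstack([X, np.random.rand(1, 29)])  # now 501 points
m.Y = np.vstack([Y, np.random.rand(1, 1)])
# q_mu/q_sqrt still have 500 rows, hence the [501,1] vs. [500,1] error above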

There are a few ways I can think of to solve this, but none of them are very simple; most will require some effort:

  • The hyperparameter callback I suggested in #7 could be used to update those fields. However, it requires a recompile, and as I reported in GPflow/GPflow#442, recompilation is currently broken.
  • Initially I also considered specifying some "accessor" object to sort out X/Y updates, but I suspect it would end up equivalent to the callback.
  • The VGP model could be modified so that the required fields are updated when X/Y change. This still requires fixing the recompilation.

So far I have started a branch to fix the recompilation; it should be finished soon and I'll open a PR on the GPflow project. Meanwhile I'll think about the most elegant way to ensure models remain consistent when X/Y are updated; I'm open to input on this. For now you could use SVGP, though it might not gain you a lot of speed:

model = GPflow.svgp.SVGP(X, Y, GPflow.kernels.RBF(29, lengthscales=X.std(axis=0)), Z=X, likelihood=GPflow.likelihoods.Gaussian())

seems to run. Note, however, that while X/Y will be updated, you'll keep the original 500 inducing points, as these are not updated.
Next week I'm back and will investigate this further.

Hm, I overlooked some code in VGP which takes care of updating those Param objects, which means a solution isn't far off. The problem is that it isn't recompiling due to the issue I referred to, so once I finish that fix and it is merged, this problem should be resolved.

I started a PR at GPflow to resolve the recompilation issues (GPflow/GPflow#456): once that is finalized and included this should be solved.

If you'd like to try VGP in the meantime, you can change the following routine in the Acquisition class (I haven't tested this, though):

    def _optimize_models(self):
        if self._optimize_restarts == 0:
            return

        for model, hypers in zip(self.models, self._default_params):
            runs = []
            for i in range(self._optimize_restarts):
                model.randomize() if i > 0 else model.set_state(hypers)
                try:
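                    # force a full recompile so stale Params (q_mu/q_sqrt) are rebuilt for the new data size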
                    model._needs_recompile = True
                    result = model.optimize()
                    runs.append(result)
                except tf.errors.InvalidArgumentError:  # pragma: no cover
                    print("Warning: optimization restart {0}/{1} failed".format(i + 1, self._optimize_restarts))
            best_idx = np.argmin([r.fun for r in runs])
            model.set_state(runs[best_idx].x)

Note this temporary fix forces all models to recompile, which might cause a performance loss if you apply it with other models.
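If you'd rather not edit the installed package, one (equally untested) way to apply it is to monkey-patch the method onto the Acquisition base class. This assumes you paste the patched _optimize_models(self) above at module level, with numpy and tensorflow imported:

import GPflowOpt

# all acquisition instances (including ExpectedImprovement) pick up the patched method
GPflowOpt.acquisition.Acquisition._optimize_models = _optimize_models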

This should be fixed by #72, although it isn't a permanent fix yet. For now, though, VGP should be usable.

Thank you