tpapp / DynamicHMC.jl

Implementation of robust dynamic Hamiltonian Monte Carlo methods (NUTS) in Julia.

Alternative step-size heuristic

Red-Portal opened this issue · comments

Hi, sorry for not following the issue template.
I think the current step-size heuristic (Brent's method, if I understand correctly?) works well in general,
but not for some ill-posed likelihoods that are unavoidable.
Specifically, I'm trying to sample the hyperparameters of Gaussian processes.
The problem is that, because of the positive-definiteness requirement of Gaussian processes,
numerical errors are very easy to stumble upon.
In those cases, the simplest treatment is to return a log-likelihood of -Inf.
Because of this, the likelihood becomes non-differentiable at many points, and Brent's method's initial steps fail miserably.
I think adding the original heuristic from [1] should be considered for these kinds of cases?

[1] Hoffman, Matthew D., and Andrew Gelman. "The No-U-Turn sampler: adaptively setting path lengths in Hamiltonian Monte Carlo." Journal of Machine Learning Research 15.1 (2014): 1593-1623.
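For concreteness, the failure mode described above can be sketched in a few lines of Julia (a hypothetical GP marginal log-likelihood; the function name and the -Inf guard are illustrative, not code from this package):

```julia
using LinearAlgebra

# Sketch of the problem: a GP log-likelihood that returns -Inf whenever the
# kernel matrix K is not numerically positive definite, which makes the
# objective non-differentiable on the set of infeasible hyperparameters.
function gp_loglik_or_neginf(K::AbstractMatrix, y::AbstractVector)
    C = cholesky(Symmetric(K); check = false)
    issuccess(C) || return -Inf            # Cholesky failed: infeasible point
    α = C \ y
    -(dot(y, α) + logdet(C) + length(y) * log(2π)) / 2
end

gp_loglik_or_neginf(Matrix(1.0I, 2, 2), [1.0, 2.0])   # finite log-likelihood
gp_loglik_or_neginf(-Matrix(1.0I, 2, 2), [1.0, 2.0])  # -Inf: Cholesky failed
```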

I don't think it would make a difference: your problem is most likely the initial point being infeasible (and returning -Inf). The heuristic makes a small difference when likelihoods are finite, but neither this nor the original NUTS heuristic can cope with a -Inf starting point.

The solution is to start the sampler with a given initial position q, eg

mcmc_with_warmup(rng, ℓ, N; initialization = (q = q, ))

Note that q is a vector of real numbers.

Generally, the best solution is to transform so that the model is defined for all parameters. Eg PSD matrices can be obtained from a Cholesky factor.
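As an illustration of that suggestion, here is a minimal sketch (a hypothetical helper, not part of DynamicHMC's API) of building a PSD matrix from unconstrained parameters via its Cholesky factor:

```julia
using LinearAlgebra

# Hypothetical helper: map n(n+1)÷2 unconstrained parameters θ to an n×n
# positive-definite matrix via its Cholesky factor. Exponentiating the
# diagonal keeps L valid for every θ, so Σ = L*L' is positive definite by
# construction and a later cholesky(Σ) cannot fail.
function psd_from_unconstrained(θ::AbstractVector, n::Integer)
    @assert length(θ) == n * (n + 1) ÷ 2
    L = zeros(eltype(θ), n, n)
    k = 1
    for j in 1:n, i in j:n                    # fill the lower triangle
        L[i, j] = i == j ? exp(θ[k]) : θ[k]   # positive diagonal entries
        k += 1
    end
    Symmetric(L * L', :L)
end

Σ = psd_from_unconstrained(randn(6), 3)
isposdef(Σ)   # true for any θ
```

With this parametrization the sampler can propose any unconstrained θ without ever hitting a failed Cholesky, so no -Inf guard is needed.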

The problem is that it is not apparent which initial points are well defined (at least in my case). Also, the Cholesky decomposition is performed on a matrix built from that same set of parameters, so the infeasible points are exactly those where the Cholesky fails. I don't think there is really a way around it. After the dual-averaging steps, everything works just fine, though.

My point is, I would like to use an alternative heuristic (or bypass it entirely) while still being able to use the dual-averaging step.

You should be able to bypass it, eg with

mcmc_with_warmup(rng, ℓ, N;
                 warmup_stages = default_warmup_stages(; stepsize_search = nothing))

That said, my suggestion was not to perform a Cholesky decomposition, but to parametrize in terms of one (transformed into the unconstrained coordinates).

If you have an MWE you can post here, I may be able to provide more specific help.

I actually tried feeding nothing to stepsize_search, but that didn't work.
Please see the error below.

[ Info: finding initial optimum
ERROR: MethodError: no method matching isless(::Int64, ::Nothing)
Closest candidates are:
  isless(::Missing, ::Any) at missing.jl:66
  isless(::PyCall.PyObject, ::Any) at /home/msca8h/.julia/packages/PyCall/ttONZ/src/pyoperators.jl:75
  isless(::Real, ::AbstractFloat) at operators.jl:157
  ...
Stacktrace:
 [1] macro expansion at /home/msca8h/.julia/packages/ArgCheck/xX4DA/src/checks.jl:162 [inlined]
 [2] initial_adaptation_state(::AutoBO.DynamicHMC.DualAveraging{Float64}, ::Nothing) at /home/msca8h/.julia/DynamicHMC/.../src/stepsize.jl:237

Also, I didn't quite get what you meant about the Cholesky.
Can you provide a link to a relevant example?

Well, if you skip the stepsize search, you have to provide one manually. I added an example in #104, see

https://github.com/tpapp/DynamicHMC.jl/pull/104/files#diff-93cb0535c382976d57ac524722ce6344R565

The Stan manual (eg Stan User’s Guide, section 2.13. Multivariate Priors for Hierarchical Models, subsection Optimization through Cholesky Factorization, search for cholesky_factor_corr) describes the transformation technique in detail.

(I am not sure if you realize this, but it is difficult to help without an MWE I can try to run).

Sorry, the code I'm currently working on is difficult to reduce to an MWE.
I would really like to show you one, but I don't think I'll have the time right now.
Since you kindly addressed the original issue anyway, I'll close this.
Thanks for the help!