ValueError: Logp function returned error Initialization of first point failed

ricardoV94 opened this issue a year ago · comments

import numpy as np
import nutpie
import pymc as pm

with pm.Model() as m:
    proposed_pay = pm.MutableData("proposed_pay", np.array([50_000, 200_000], dtype="float64"))
    accepted = pm.MutableData("accepted", np.array([0, 1], dtype="int64"))
    
    mean = pm.Normal("mu", mu=100_000, sigma=25_000)
    std = pm.Gamma("std", mu=25_000, sigma=5_000)
    
    p_accept = 1 - pm.logcdf(pm.Normal.dist(mean, std), proposed_pay).exp()
    p_accept = pm.Deterministic("p_accept", p_accept)
    
    llike = pm.Bernoulli("llike", p=p_accept, observed=accepted)
    
cm = nutpie.compile_pymc_model(m)
nutpie.sample(cm)

I get these weird warnings from PyTensor/Numba:

/tmp/tmps3yx0tdg:1: NumbaWarning: Cannot cache compiled function "numba_funcified_fgraph" as it uses dynamic globals (such as ctypes pointers and large global arrays)
  def numba_funcified_fgraph(scalar_variable, scalar_variable_1, scalar_variable_7, scalar_variable_11, scalar_variable_3, scalar_variable_5, scalar_variable_15, scalar_variable_19):
site-packages/nutpie/compile_pymc.py:362: NumbaWarning: Cannot cache compiled function "numba_funcified_fgraph" as it uses dynamic globals (such as ctypes pointers and large global arrays)
  return inner(x, *_shared_tuple)

And then a ValueError:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [43], in <cell line: 17>()
     14     llike = pm.Bernoulli("llike", p=p_accept, observed=accepted)
     16 cm = nutpie.compile_pymc_model(m)
---> 17 nutpie.sample(cm)

site-packages/nutpie/sample.py:101, in sample(compiled_model, draws, tune, chains, seed, num_try_init, save_warmup, store_divergences, progress_bar, init_mean, store_unconstrained, **kwargs)
     98 if init_mean is None:
     99     init_mean = np.zeros(compiled_model.n_dim)
--> 101 sampler = lib.PyParallelSampler(
    102     compiled_model.logp_func_maker,
    103     init_mean,
    104     settings,
    105     n_chains=chains,
    106     n_draws=draws,
    107     seed=seed,
    108     n_try_init=num_try_init,
    109 )
    111 expand_draw = compiled_model.expand_draw_fn
    113 def do_sample():

ValueError: Logp function returned error Initialization of first point failed

Adrian Seyboldt · Answer 1 · Sat May 06 2023 08:13:47 GMT+0800 (China Standard Time)

I think this is a consequence of how nutpie chooses the initial point. It currently tries to initialize all parameters in the transformed space within (-1, 1), but in that range we just can't find any valid points. The problem goes away if we change the scaling of the variables (i.e. we measure everything in multiples of 10_000):

with pm.Model() as m:
    proposed_pay = pm.MutableData("proposed_pay", np.array([5.0, 20.0], dtype="float64"))
    accepted = pm.MutableData("accepted", np.array([0, 1], dtype="int64"))

    mean = pm.Normal("mu", mu=10.0, sigma=2.5)
    std = pm.Gamma("std", mu=2.5, sigma=.5)

    p_accept = 1 - pm.logcdf(pm.Normal.dist(mean, std), proposed_pay).exp()
    p_accept = pm.Deterministic("p_accept", p_accept)

    llike = pm.Bernoulli("llike", p=p_accept, observed=accepted)

cm = nutpie.compile_pymc_model(m)
tr = nutpie.sample(cm)

It would be better to initialize at draws from the prior though...
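
To illustrate why no valid point exists in that range, here is a minimal pure-Python sketch. It is not nutpie or PyMC code: the helper names (`normal_cdf`, `safe_log`, `bernoulli_loglike`) are hypothetical, and it assumes `std` is log-transformed on the unconstrained space. It evaluates the Bernoulli log-likelihood of both models in float64, mirroring the model's `1 - cdf` form.

```python
import math

def normal_cdf(x, mu, std):
    """Normal CDF evaluated in float64 via the complementary error function."""
    return 0.5 * math.erfc(-(x - mu) / (std * math.sqrt(2.0)))

def safe_log(p):
    """log(p) that returns -inf at p == 0 instead of raising."""
    return math.log(p) if p > 0.0 else -math.inf

def bernoulli_loglike(mu, log_std, proposed_pay, accepted):
    """Log-likelihood of the accept/reject data at an unconstrained point."""
    std = math.exp(log_std)  # assumption: log transform for the positive `std`
    total = 0.0
    for pay, y in zip(proposed_pay, accepted):
        p_accept = 1.0 - normal_cdf(pay, mu, std)
        total += safe_log(p_accept) if y == 1 else safe_log(1.0 - p_accept)
    return total

# Original scale: check the corners of the (-1, 1) box in unconstrained space.
corners = [(m, s) for m in (-1.0, 1.0) for s in (-1.0, 1.0)]
original = [
    bernoulli_loglike(m, s, [50_000.0, 200_000.0], [0, 1]) for m, s in corners
]

# Rescaled model (units of 10_000): one illustrative point inside the box.
rescaled = bernoulli_loglike(0.9, 0.9, [5.0, 20.0], [0, 1])

print(original)
print(rescaled)
```

On the original scale the proposed pay sits tens of thousands of standard deviations above `mu`, so the CDF rounds to exactly 1, `p_accept` becomes 0, and the accepted observation contributes `log(0)`. After rescaling, some points in the box leave a small but nonzero `p_accept`.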

Ricardo Vieira · Answer 2 · Sat May 06 2023 13:50:28 GMT+0800 (China Standard Time)

So the difference from PyMC is that there we do transformed(moment) +- 1 and here we do 0 +- 1?

Could we use the PyMC initial point/moment logic if it's more stable?

Adrian Seyboldt · Answer 3 · Sat May 06 2023 23:19:13 GMT+0800 (China Standard Time)

I think so. There already is an option to set a mean, we'd only have to plug in the moment there.

Michael Osthege · Answer 4 · Mon May 22 2023 20:10:58 GMT+0800 (China Standard Time)

A workaround at the PyMC level can be as short as 4 lines, but the latest nutpie release predates the init_mean kwarg.

To fix this at the nutpie level, the init_means could be added as another attribute on the compiled_model object.

Today I'll try to work around this with local hotfixes, but I should be able to make a PR for this by the end of the week.

Michael Osthege · Answer 5 · Mon May 22 2023 22:51:50 GMT+0800 (China Standard Time)

This is the workaround at the PyMC level; however, it didn't help for my model 🤔

compiled_model = nutpie.compile_pymc_model(model)
# Pass transformed, concatenated initial values until nutpie does it itself
initial_point = model.initial_point()
initial_means = np.concatenate(
    [initial_point[model.rvs_to_values[var].name].flatten() for var in model.free_RVs]
)
idata = nutpie.sample(
    compiled_model,
    draws=draws,
    tune=tune,
    chains=chains,
    target_accept=target_accept,
    init_mean=initial_means,
    seed=_get_seeds_per_chain(random_seed, 1)[0],
    progress_bar=progressbar,
    **nuts_sampler_kwargs,
)

Adrian Seyboldt · Answer 6 · Tue May 23 2023 00:45:11 GMT+0800 (China Standard Time)

The point must be on the unconstrained space, but I think this will instead use values on the constrained space?

Michael Osthege · Answer 7 · Tue May 23 2023 05:29:29 GMT+0800 (China Standard Time)

> The point must be on the unconstrained space, but I think this will instead use values on the constrained space?

The pmodel.initial_point() dictionary contains only the unconstrained values, e.g. noise_log__.

So unless I'm mixing up the definitions these should be right, no?
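
As a small illustration of the two spaces, here is a sketch in plain Python. It assumes the default log transform that PyMC applies to positive variables (hence the `_log__` suffix); the helper names and the starting value of 0.05 are hypothetical, chosen only for the example. A value stored under a key like `noise_log__` is the logarithm of the constrained value, and jitter added there maps back to a positive number.

```python
import math
import random

def to_unconstrained(value):
    """Forward log transform: positive value -> real line."""
    return math.log(value)

def to_constrained(value_log):
    """Backward transform: real line -> positive value."""
    return math.exp(value_log)

noise_start = 0.05  # hypothetical constrained starting value
noise_log = to_unconstrained(noise_start)  # what a `noise_log__` entry would hold

# Jitter of +-1 applied on the unconstrained space, as discussed above.
rng = random.Random(0)
jittered_log = [noise_log + rng.uniform(-1.0, 1.0) for _ in range(5)]
jittered = [to_constrained(v) for v in jittered_log]

print(noise_log)
print(jittered)
```

Passing the constrained value (0.05) where the unconstrained one (about -3.0) is expected would start the sampler at a different point, which is the mix-up the question above is checking for.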

Chris Fonnesbeck · Answer 8 · Tue Jun 13 2023 04:45:16 GMT+0800 (China Standard Time)

Can confirm that this fix does not work, in general. Also tried with random jitter around initial point. Perhaps this is not the issue?

Michael Osthege · Answer 9 · Sat Jun 17 2023 21:59:54 GMT+0800 (China Standard Time)

Here's a rather minimal example to reproduce the issue:

import io

import numpy as np
import pandas as pd
import pymc as pm
import pytensor.tensor as pt


def build_model(df_data, *, hsgp: bool):
    with pm.Model(
        coords={
            "records": np.arange(len(df_data)),
        }
    ) as pmodel:
        # Store data
        X = pm.ConstantData("X", df_data.x.to_numpy(), dims="records")
        Y = pm.ConstantData("Y", df_data.y.to_numpy(), dims="records")
        Y_std = pm.ConstantData("Y_std", pt.std(Y).eval())
        Y_mean = pm.ConstantData("Y_mean", pt.mean(Y).eval())

        # Model the (normalized) latent trend
        ls = pm.LogNormal("ls", mu=np.log(0.5), sigma=0.2)
        noise = pm.HalfNormal("noise", sigma=0.05)
        cov = noise**2 * pm.gp.cov.ExpQuad(1, ls=ls)
        mean = pm.gp.mean.Constant(Y_mean)
        
        if hsgp:
            gp = pm.gp.HSGP(m=[30], c=4.0, cov_func=cov, mean_func=mean)
        else:
            gp = pm.gp.Latent(cov_func=cov, mean_func=mean)
        ylatent = gp.prior("ylatent", X[:, None], dims="records")
        
        # Connect to observations
        pm.Normal("L", mu=ylatent, sigma=Y_std / 3, observed=Y, dims="records")
        
    # Keep a handle on the GP
    pmodel.gp = gp
    return pmodel


def analyze(df_data, *, build_kwargs, sample_kwargs):
    pmodel = build_model(df_data, **build_kwargs)
    with pmodel:
        idata = pm.sample(
            chains=4, tune=2000,
            target_accept=0.9, random_seed=1234,
            **sample_kwargs,
        )
    return idata


df_data = pd.read_csv(io.StringIO("""
,x,y
0,6.5,0.03847670954287112
1,7.0,0.040795546149772384
2,7.5,0.04005530626829538
3,6.5,0.03800804967481005
4,7.0,0.042606645754122346
5,7.5,0.03962986979767001
6,6.5,0.03975987684954445
7,7.0,0.042854077804484525
8,7.5,0.0427959406500711
9,6.5,0.0376618863654496
10,7.0,0.043800640042141875
11,7.5,0.04280278855102723
"""), index_col=0)

analyze(
    df_data,
    build_kwargs=dict(hsgp=True),
    sample_kwargs=dict(nuts_sampler="nutpie"),
)

Adrian Seyboldt · Answer 10 · Wed Jun 21 2023 04:59:13 GMT+0800 (China Standard Time)

@michaelosthege This does run on my machine. Maybe this was fixed with pymc-devs/pytensor#343?
I do see some divergences, but that happens with both nutpie and the default sampler.

Michael Osthege · Answer 11 · Sun Jun 25 2023 02:48:24 GMT+0800 (China Standard Time)

Confirmed that a new env with PyMC 5.5.0 and PyTensor 2.12.3 fixed this MRE on my machine too.

I'll re-run my benchmarking notebooks next week to see if it fixed all instances I had run into.

Shall we close this and re-open if needed?

Adrian Seyboldt · Answer 12 · Tue Jul 04 2023 03:00:05 GMT+0800 (China Standard Time)

I guess I'll close this issue then, but feel free to reopen (or open a new one) if this comes up again...