pymc-devs / nutpie

Python wrapper for nuts-rs

ValueError: Logp function returned error Initialization of first point failed

ricardoV94 opened this issue · comments

import numpy as np
import nutpie
import pymc as pm

with pm.Model() as m:
    proposed_pay = pm.MutableData("proposed_pay", np.array([50_000, 200_000], dtype="float64"))
    accepted = pm.MutableData("accepted", np.array([0, 1], dtype="int64"))
    
    mean = pm.Normal("mu", mu=100_000, sigma=25_000)
    std = pm.Gamma("std", mu=25_000, sigma=5_000)
    
    p_accept = 1 - pm.logcdf(pm.Normal.dist(mean, std), proposed_pay).exp()
    p_accept = pm.Deterministic("p_accept", p_accept)
    
    llike = pm.Bernoulli("llike", p=p_accept, observed=accepted)
    
cm = nutpie.compile_pymc_model(m)
nutpie.sample(cm)

I get these weird warnings from pytensor/numba:

/tmp/tmps3yx0tdg:1: NumbaWarning: Cannot cache compiled function "numba_funcified_fgraph" as it uses dynamic globals (such as ctypes pointers and large global arrays)
  def numba_funcified_fgraph(scalar_variable, scalar_variable_1, scalar_variable_7, scalar_variable_11, scalar_variable_3, scalar_variable_5, scalar_variable_15, scalar_variable_19):
site-packages/nutpie/compile_pymc.py:362: NumbaWarning: Cannot cache compiled function "numba_funcified_fgraph" as it uses dynamic globals (such as ctypes pointers and large global arrays)
  return inner(x, *_shared_tuple)

And then a ValueError:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [43], in <cell line: 17>()
     14     llike = pm.Bernoulli("llike", p=p_accept, observed=accepted)
     16 cm = nutpie.compile_pymc_model(m)
---> 17 nutpie.sample(cm)

site-packages/nutpie/sample.py:101, in sample(compiled_model, draws, tune, chains, seed, num_try_init, save_warmup, store_divergences, progress_bar, init_mean, store_unconstrained, **kwargs)
     98 if init_mean is None:
     99     init_mean = np.zeros(compiled_model.n_dim)
--> 101 sampler = lib.PyParallelSampler(
    102     compiled_model.logp_func_maker,
    103     init_mean,
    104     settings,
    105     n_chains=chains,
    106     n_draws=draws,
    107     seed=seed,
    108     n_try_init=num_try_init,
    109 )
    111 expand_draw = compiled_model.expand_draw_fn
    113 def do_sample():

ValueError: Logp function returned error Initialization of first point failed

I think this is a consequence of the initial point choice in nutpie. It currently tries to initialize all parameters in the transformed space in (-1, 1), but in that range we just can't find any points with a finite logp. The problem goes away if we change the scaling of the variables (i.e. we measure everything in multiples of 10_000):

with pm.Model() as m:
    proposed_pay = pm.MutableData("proposed_pay", np.array([5.0, 20.0], dtype="float64"))
    accepted = pm.MutableData("accepted", np.array([0, 1], dtype="int64"))

    mean = pm.Normal("mu", mu=10.0, sigma=2.5)
    std = pm.Gamma("std", mu=2.5, sigma=.5)

    p_accept = 1 - pm.logcdf(pm.Normal.dist(mean, std), proposed_pay).exp()
    p_accept = pm.Deterministic("p_accept", p_accept)

    llike = pm.Bernoulli("llike", p=p_accept, observed=accepted)

cm = nutpie.compile_pymc_model(m)
tr = nutpie.sample(cm)

It would be better to initialize at draws from the prior though...
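
As a rough illustration of the point above, here is a standalone sketch (not nutpie's actual initialization code) that evaluates only the Bernoulli likelihood term of the original model at points drawn uniformly from (-1, 1) in the transformed space. It assumes `std` is log-transformed, and the helper names `normal_cdf`, `safe_log` and `loglike` are made up for this example. With the original scaling, `1 - cdf` underflows to exactly 0 for every such point, so the likelihood is `-inf`, whereas a point near the prior means gives a finite value.

```python
import math
import random

# Data from the original (unscaled) model.
proposed_pay = [50_000.0, 200_000.0]
accepted = [0, 1]


def normal_cdf(x, mu, std):
    """Normal CDF written with erfc from the standard library."""
    return 0.5 * math.erfc(-(x - mu) / (std * math.sqrt(2.0)))


def safe_log(p):
    """log(p), returning -inf instead of raising when p is 0."""
    return math.log(p) if p > 0.0 else -math.inf


def loglike(mu, log_std):
    """Bernoulli log-likelihood as a function of the unconstrained parameters.

    Assumption: std lives on the log scale in the transformed space.
    """
    std = math.exp(log_std)
    total = 0.0
    for pay, acc in zip(proposed_pay, accepted):
        p_accept = 1.0 - normal_cdf(pay, mu, std)
        total += safe_log(p_accept) if acc == 1 else safe_log(1.0 - p_accept)
    return total


rng = random.Random(0)

# Candidate starting points in (-1, 1) for both transformed parameters.
near_zero = [loglike(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(1000)]

# A starting point at the prior means (mu=100_000, std=25_000).
near_prior = loglike(100_000.0, math.log(25_000.0))

print("all starts in (-1, 1) give -inf:", all(v == -math.inf for v in near_zero))
print("log-likelihood at the prior means:", near_prior)
```

The full logp also includes the prior terms, but those cannot rescue a likelihood that is already `-inf`, which would be consistent with the "Initialization of first point failed" error.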

So the difference from PyMC is that there we do transformed(moment) ± 1, and here we do 0 ± 1?

Could we use the PyMC initial point /moment logic if it's more stable?

I think so. There already is an option to set a mean, we'd only have to plug in the moment there.

A workaround at the PyMC level can be as short as 4 lines, but the latest nutpie release predates the init_mean kwarg.

To fix this at the nutpie level, the init_means could be added as another attribute on the compiled_model object.

Today I'll try to work around this with local hotfixes, but I should be able to make a PR for this by the end of the week.

This is the workaround at the PyMC level; however, it didn't help for my model 🤔

compiled_model = nutpie.compile_pymc_model(model)
# Pass transformed, concatenated initial values until nutpie does it itself
initial_point = model.initial_point()
initial_means = np.concatenate([initial_point[model.rvs_to_values[var].name].flatten() for var in model.free_RVs])
idata = nutpie.sample(
    compiled_model,
    draws=draws,
    tune=tune,
    chains=chains,
    target_accept=target_accept,
    init_mean=initial_means,
    seed=_get_seeds_per_chain(random_seed, 1)[0],
    progress_bar=progressbar,
    **nuts_sampler_kwargs,
)

The point must be on the unconstrained space, but I think this will instead use values on the constrained space?

The pmodel.initial_point() dictionary has only the unconstrained ones, e.g. noise_log__

So unless I'm mixing up the definitions these should be right, no?
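
To make the constrained/unconstrained distinction concrete, here is a small standalone sketch. It assumes a log transform for positive variables (as the `noise_log__` name suggests); `to_unconstrained` and `to_constrained` are hypothetical helpers, not PyMC or nutpie functions. It shows where the prior mean of `std` from the first model sits on the unconstrained scale, which constrained values the (-1, 1) window covers, and what would happen if a constrained value were passed by mistake.

```python
import math


def to_unconstrained(value):
    """Forward log transform for a positive variable (assumed default)."""
    return math.log(value)


def to_constrained(value_log):
    """Backward transform: exponentiate the unconstrained value."""
    return math.exp(value_log)


# Prior mean of `std` in the original model, on both scales.
std_prior_mean = 25_000.0
std_log = to_unconstrained(std_prior_mean)

# Constrained values covered by an unconstrained window of (-1, 1).
window = (to_constrained(-1.0), to_constrained(1.0))

# Passing the constrained value where an unconstrained one is expected
# would ask for std = exp(25_000), which overflows a float.
try:
    to_constrained(std_prior_mean)
    overflowed = False
except OverflowError:
    overflowed = True

print("std_log__ at the prior mean:", std_log)
print("std range covered by (-1, 1):", window)
print("exp(25_000) overflows:", overflowed)
```

So a mean built from the unconstrained entries of `initial_point()` should land near 10 for `std_log__`, well outside the default (-1, 1) window.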

Can confirm that this fix does not work, in general. Also tried with random jitter around initial point. Perhaps this is not the issue?

Here's a rather minimal example to reproduce the issue:

import io

import numpy as np
import pandas as pd
import pymc as pm
import pytensor.tensor as pt


def build_model(df_data, *, hsgp: bool):
    with pm.Model(
        coords={
            "records": np.arange(len(df_data)),
        }
    ) as pmodel:
        # Store data
        X = pm.ConstantData("X", df_data.x.to_numpy(), dims="records")
        Y = pm.ConstantData("Y", df_data.y.to_numpy(), dims="records")
        Y_std = pm.ConstantData("Y_std", pt.std(Y).eval())
        Y_mean = pm.ConstantData("Y_mean", pt.mean(Y).eval())

        # Model the (normalized) latent trend
        ls = pm.LogNormal("ls", mu=np.log(0.5), sigma=0.2)
        noise = pm.HalfNormal("noise", sigma=0.05)
        cov = noise**2 * pm.gp.cov.ExpQuad(1, ls=ls)
        mean = pm.gp.mean.Constant(Y_mean)
        
        if hsgp:
            gp = pm.gp.HSGP(m=[30], c=4.0, cov_func=cov, mean_func=mean)
        else:
            gp = pm.gp.Latent(cov_func=cov, mean_func=mean)
        ylatent = gp.prior("ylatent", X[:, None], dims="records")
        
        # Connect to observations
        pm.Normal("L", mu=ylatent, sigma=Y_std / 3, observed=Y, dims="records")
        
    # Keep a handle on the GP
    pmodel.gp = gp
    return pmodel


def analyze(df_data, *, build_kwargs, sample_kwargs):
    pmodel = build_model(df_data, **build_kwargs)
    with pmodel:
        idata = pm.sample(
            chains=4, tune=2000,
            target_accept=0.9, random_seed=1234,
            **sample_kwargs,
        )
    return idata


df_data = pd.read_csv(io.StringIO("""
,x,y
0,6.5,0.03847670954287112
1,7.0,0.040795546149772384
2,7.5,0.04005530626829538
3,6.5,0.03800804967481005
4,7.0,0.042606645754122346
5,7.5,0.03962986979767001
6,6.5,0.03975987684954445
7,7.0,0.042854077804484525
8,7.5,0.0427959406500711
9,6.5,0.0376618863654496
10,7.0,0.043800640042141875
11,7.5,0.04280278855102723
"""), index_col=0)

analyze(
    df_data,
    build_kwargs=dict(hsgp=True),
    sample_kwargs=dict(nuts_sampler="nutpie"),
)

@michaelosthege This does run on my machine, maybe this was fixed with pymc-devs/pytensor#343?
I do see some divergences, but that happens with both nutpie and the default sampler.

Confirmed that a new env with PyMC 5.5.0 and PyTensor 2.12.3 fixed this MRE on my machine too.

I'll re-run my benchmarking notebooks next week to see if it fixed all instances I had run into.

Shall we close this and re-open if needed?

I guess I'll close this issue then, but feel free to reopen (or open a new one) if this comes up again...