bambinos / educational-resources

Educational resources


TODO: figure out how to run posterior predictive simulations in Bambi

tommylees112 opened this issue

I don't know if this is helpful, but I noticed that in a few of the notebooks you have notes to yourself about using Bambi to run a posterior predictive check.

Example in Congress Notebook:

TODO: Verify this is true
Bambi does not have a way to make posterior predictive given a data frame of independent variables

My attempt

I am no PyMC guru, but I started playing around with the possibilities.

On the PyMC forums I found this answer, "How to predict new values on hold-out data":

I extracted information about the Bambi object manually. This could either be automated, or there may be a function that gives readier access to the underlying PyMC3 model object, much as you can use get_stancode with brms.

import pymc3 as pm
from bambi import Model

# grab the underlying PyMC3 model that Bambi builds
pm_model = model.backend.model

def get_priors(model: Model):
    # map each model term to the parameters of its prior
    return {x.name: x.prior.args for x in model.terms.values()}

print('vote ~ past_vote + inc')
print(get_priors(model))

# visualise the model graph
pm.model_to_graphviz(pm_model)

Here I rewrite the model manually, using the specification I extracted from the Bambi model via model.backend.model.

import pandas as pd
import pymc3 as pm
import arviz as az

# hold-out test data (the 1990 election); data88, the training frame,
# is built analogously from the earlier columns
data90 = pd.DataFrame(dict(vote=congress["v90_adj"], past_vote=congress["v88_adj"], inc=congress["inc90"]))

# model factory: rebuild the same model around whichever data frame is
# passed in, so we can fit on one data set and predict on another
def model_factory(data):
    with pm.Model() as model:
        # priors copied from the get_priors() output above
        a = pm.Normal("Intercept", mu=0.543, sigma=0.367)
        b_pv = pm.Normal("past_vote", mu=0, sigma=0.558)
        b_inc = pm.Normal("inc", mu=0, sigma=0.118)
        vote_sd = pm.Uniform("vote_sd", lower=0, upper=data["vote"].std())

        mu = a + (b_pv * data["past_vote"]) + (b_inc * data["inc"])
        vote = pm.Normal("vote", mu=mu, sigma=vote_sd, observed=data["vote"])

    return model


# fit the model on the 1988 training data
with model_factory(data88):
    trace = pm.sample(draws=4000,
                      tune=700,
                      init=None,
                      start=None,
                      cores=2,
                      chains=2,
                      random_seed=1,
                      discard_tuned_samples=True,
                      compute_convergence_checks=True)

# rebuild the same model around the 1990 hold-out data and push the
# fitted trace through it to get posterior predictive draws
with model_factory(data90):
    ppc = pm.sample_posterior_predictive(trace)
    ppc = az.from_pymc3(posterior_predictive=ppc)

The ppc object is now an arviz.InferenceData object, so we can use the built-in az.plot_ppc function to visualise the posterior predictive distribution.

import matplotlib.pyplot as plt

f, ax = plt.subplots()
az.plot_ppc(ppc, num_pp_samples=500, ax=ax);

We can also extract the table shown in Figure 10.7 (p. 143):

df = (
    ppc.posterior_predictive
    .isel(chain=0)
    .drop("chain")
    .to_dataframe()
    .reset_index()
    .rename({"vote_dim_0": "district"}, axis=1)
)
pred90 = df.pivot(index="draw", columns="district", values="vote")

Outstanding question

One thing I haven't worked out is that in the book (and with the posterior_predict function from rstanarm), the returned table (Figure 10.7, p. 143) includes columns with the estimates of $\sigma$ and the $\beta$ coefficients. pm.sample_posterior_predictive does not return estimates of the model parameters, only the posterior predictions for the test data points.
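
One workaround I have been considering (an untested sketch on my part, not a confirmed pattern): the parameter draws already live in the trace returned by pm.sample, so they can be converted to InferenceData and joined to the predictive draws by draw index:

# sketch, assuming the variable names defined in model_factory above;
# the parameter draws are already in `trace`, so convert them and
# join them to the predictive draws by draw index
with model_factory(data88):
    idata = az.from_pymc3(trace=trace)

params = (
    idata.posterior[["Intercept", "past_vote", "inc", "vote_sd"]]
    .isel(chain=0)
    .to_dataframe()
    .drop(columns="chain")
)
table90 = params.join(pred90)  # one row per draw: coefficients, sigma, predictions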

I am sure you have already come up with a more elegant solution! Just posting in case this is helpful information.

As a Python user I am very grateful for this walkthrough of Regression and Other Stories! So thank you.

Tommy

If it helps with this: Bambi is now able to run prior/posterior predictive sampling.
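
For anyone landing here later, a minimal sketch of what that can look like, assuming a Bambi version in which Model.predict accepts a new data frame (the kind argument name has varied across releases, so check your version):

import bambi as bmb

# fit on the 1988 training data
model = bmb.Model("vote ~ past_vote + inc", data88)
idata = model.fit()

# posterior predictive draws for the 1990 hold-out frame; this adds a
# posterior_predictive group to idata in place
model.predict(idata, data=data90, kind="pps")

az.plot_ppc(idata, num_pp_samples=500)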