hsbadr / bayesian

Bindings for Bayesian TidyModels

Home Page:https://hsbadr.github.io/bayesian/

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Support multivariate non-linear models

DoktorMike opened this issue · comments

I am building a model using the non-linear interface in brms and it works like a charm with your package. Now I need to extend this to be a multivariate model, i.e., multiple response variables and I cannot seem to get it to work.

Brief description of the problem

library(tidyverse)
library(brms)
library(bayesian)

mydf <- tibble(
        x = sin(seq(1, 10, 0.25)),
        z = 2 * x + rnorm(length(x), 0, 0.5),
        y = 1 * x + rnorm(length(x), 0, 0.5)
)

myrec <- mydf %>%
        recipe() %>%
        update_role(z, new_role = "outcome") %>%
        update_role(y, new_role = "outcome") %>%
        update_role(x, new_role = "predictor") 
        #add_role(has_role("outcome"), new_role = "predictor")

myform <- bf(z ~ b1 * x, b1 ~ 1, nl = TRUE) +
        bf(y ~ b2 * x, b2 ~ 1, nl = TRUE)
myprior <- prior(normal(0, 1), nlpar = "b1", resp = "z") +
	prior(normal(0, 1), nlpar = "b2", resp = "y")
mymodel <- bayesian(family = normal(), cores = 4, chains = 4) %>%
        set_engine("brms") %>%
        set_mode("regression") %>%
        update(formula = z ~ z + y + x) %>%
        update(formula.override = bayesian_formula(myform)) %>%
        update(prior = myprior) %>%
        update(family = list("gaussian", "gaussian"))

myworkflow <- workflow() %>%
        add_recipe(myrec) %>%
        add_model(spec = mymodel)

myworkflow_fit <- myworkflow %>%
        fit(data = mydf)

I get the error message Error in formula.default(object, env = baseenv()) : invalid formula but when just using one response variable it works fine. So is this an unsupported feature or am I doing something wrong?

If you instead set the mymodel to be univariate like this:

# Single response variable works
mymodel <- bayesian(family = normal(), cores = 4, chains = 4) %>%
        set_engine("brms") %>%
        set_mode("regression") %>%
        update(formula.override = bayesian_formula(bf(z ~ b * x, b ~ 1, nl = T))) %>%
        update(prior = prior(normal(0, 1), nlpar = "b")) %>%
        update(family = "gaussian")

Everything works.

Update:

I can get the training to work by setting the class of the formula manually instead of relying on bayesian_formula. Thus setting mymodel to this works.

# Multiple response variable FAILS
myform <- bf(z ~ b1 * x, b1 ~ 1, nl = TRUE) +
        bf(y ~ b2 * x, b2 ~ 1, nl = TRUE)
class(myform) <- union(class(myform), "formula") # Need to actually fit the mvmodel
myprior <- prior(normal(0, 1), nlpar = "b1", resp = "z") +
        prior(normal(0, 1), nlpar = "b2", resp = "y")
mymodel <- bayesian(family = normal(), cores = 4, chains = 4) %>%
        set_engine("brms") %>%
        set_mode("regression") %>%
        update(formula = z ~ z + y + x) %>%
        #update(formula.override = bayesian_formula(myform)) %>%
        update(formula.override = myform) %>%
        update(prior = myprior) %>%
        update(family = list("gaussian", "gaussian"))

The predictions do not work though. Most likely because parsnip doesn't know how to handle multivariate responses from brms. The error we get is: Error in results[, 1] : incorrect number of dimensions

The traceback is:

13: eval_tidy(xs[[j]], mask)
12: tibble_quos(xs, .rows, .name_repair)
11: tibble::tibble(.pred = results[, 1])
10: object$spec$method$pred$numeric$post(res, object)
9: predict_numeric.model_fit(object = object, new_data = new_data,
       ...)
8: predict_numeric(object = object, new_data = new_data, ...)
7: predict.model_fit(fit, new_data, type = type, opts = opts, ...)
6: predict(fit, new_data, type = type, opts = opts, ...)
5: predict.workflow(., new_data = mydf, type = NULL, level = 0.4)
4: predict(., new_data = mydf, type = NULL, level = 0.4)
3: list2(...)
2: bind_cols(., mydf)
1: myworkflow_fit %>% predict(new_data = mydf, type = NULL, level = 0.4) %>%
       bind_cols(mydf)

Tracked it down to happen in this function:

myworkflow_fit$fit$fit$spec$method$pred$numeric$post
function (results, object)
{
    tibble::tibble(.pred = results[, 1])
}

But where this is defined I haven't found.

Thanks @DoktorMike! It'll likely need a minor workaround to avoid the expected linear formula or single family currently required by tidymodels. The main difference between bayesian_formula() and brms::brmsformula() is adding the formula class; so, it should work fine. But, the predictions need to detect and support lists or multiple response variables (or n-dimensional outcome, in general). I'll look into it, ASAP.

Sounds great @hsbadr ! Let me know if I can do anything to help.

@DoktorMike Can you test this now? It should work fine. As for the predictions, you can use type = "raw" to get the predictions, or predict directly from the brmsfit object:

mybrmsfit <- myworkflow_fit %>%
  extract_fit_engine()
  
predict(mybrmsfit, head(mydf))

I'll add support for multivariate non-linear models in the predictions.

You can now use the following:

myworkflow_fit %>% 
  predict(
    new_data = head(mydf),
    type = "raw"
  )

You may pass other arguments to predict.brmsfit as follows:

myworkflow_fit %>% 
  predict(
    new_data = head(mydf),
    type = "raw",
    opts = list(
      summary = TRUE,
      robust = FALSE,
      probs = c(0.025, 0.975)
    )
  )

Awesome @hsbadr ! I can confirm that this indeed works as expected.