Numeric response error for classification models
tonyelhabr opened this issue · comments
Hi,
I'm looking for clarification about why only the last format in the reprex works. I expect the first call to fit() to fail, since {parsnip} requires the outcome variable to be a factor for classification (even though {brms} expects a numeric outcome for classification). However, I don't see why the second and third calls to fit() fail; only the fourth format succeeds.
library(bayesian)
library(brms)
library(parsnip)
df1 <- mtcars
df2 <- df1
df2$vs <- factor(df1$vs)
## I expect that this wouldn't work.
bayesian(mode = 'classification') |>
  set_engine('brms') |>
  fit(
    vs ~ mpg + wt + cyl,
    data = df1
  )
#> Error in `check_outcome()`:
#> ! For a classification model, the outcome should be a factor.
## I expect that this would work.
bayesian(mode = 'classification') |>
  set_engine('brms') |>
  fit(
    vs ~ mpg + wt + cyl,
    data = df2
  )
#> Error: Family 'gaussian' requires numeric responses.
## This also seems like it should work?
bayesian() |>
  set_engine('brms', family = bernoulli()) |>
  set_mode('classification') |>
  fit(
    vs ~ mpg + wt + cyl,
    data = df2
  )
#> Warning: The following arguments cannot be manually modified and were removed:
#> family.
#> Error: Family 'gaussian' requires numeric responses.
## This is the only one that works. Functionally, I don't see why this is different from above.
bayesian(
  mode = 'classification',
  engine = 'brms',
  family = bernoulli()
) |>
  fit(
    vs ~ mpg + wt + cyl,
    data = df2
  )
#> Compiling Stan program...
#> ...
You're right about the 1st one. The 2nd one fails a brms check at:
https://github.com/paul-buerkner/brms/blob/004edb522477d88dabe3815aae099b1211561076/R/data-response.R#L109-L111
In general, parsnip requires a factor outcome for classification models, while the engine's fit function may support, or even require, a numeric outcome for classification with certain families (e.g., ordinal categorical).
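To make the factor-versus-numeric mismatch concrete, here is a minimal base R sketch using the same `mtcars` data as the reprex. It fits no model and is not meant to describe what {parsnip} or {bayesian} do internally; the names `vs_num` and `vs_codes` are illustrative only.

```r
# Base R only: no model is fitted and no modelling packages are needed.
df1 <- mtcars              # vs is stored as numeric 0/1
df2 <- df1
df2$vs <- factor(df1$vs)   # vs becomes a factor with levels "0" and "1"

is.numeric(df1$vs)         # the form a numeric-outcome engine would accept
is.factor(df2$vs)          # the form parsnip asks for in classification mode
levels(df2$vs)

# Recovering the original 0/1 coding from the factor: convert through the
# level labels. Going straight to integer returns the internal codes instead.
vs_num   <- as.numeric(as.character(df2$vs))  # values 0 and 1
vs_codes <- as.integer(df2$vs)                # internal codes 1 and 2
```

The two conversions at the end differ on purpose: `as.integer()` on a factor returns the internal level codes rather than the original values, which is an easy source of off-by-one outcomes when moving between the two representations.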