Numeric response error for classification models

Question

tonyelhabr opened this issue 2 years ago

Hi,

I'm looking for clarification about why only the last format in the reprex works. I expect that the first call to fit() would not work, since {parsnip} requires the outcome variable to be a factor for classification (even though brm() expects a numeric outcome for classification). However, I don't see why the second and third calls to fit() don't work; only the fourth format is successful.

library(bayesian)
library(brms)
library(parsnip)

df1 <- mtcars
df2 <- df1
df2$vs <- factor(df1$vs)

## I expect that this wouldn't work.
bayesian(mode = 'classification') |>
  set_engine('brms') |>
  fit(
    vs ~ mpg + wt + cyl,
    data = df1
  )
#> Error in `check_outcome()`:
#> ! For a classification model, the outcome should be a factor.

## I expect that this would work.
bayesian(mode = 'classification') |>
  set_engine('brms') |>
  fit(
    vs ~ mpg + wt + cyl,
    data = df2
  )
#> Error: Family 'gaussian' requires numeric responses.

## This also seems like it should work?
bayesian() |>
  set_engine('brms', family = bernoulli()) |>
  set_mode('classification') |> 
  fit(
    vs ~ mpg + wt + cyl,
    data = df2
  )
#> Warning: The following arguments cannot be manually modified and were removed:
#> family.
#> Error: Family 'gaussian' requires numeric responses.

## This is the only one that works. Functionally, I don't see why this is different from above.
bayesian(
  mode = 'classification', 
  engine = 'brms', 
  family = bernoulli()
) |>
  fit(
    vs ~ mpg + wt + cyl,
    data = df2
  )
#> Compiling Stan program...
#> ...

Hamada S. Badr · Answer 1 · Mon Jul 18 2022 20:42:39 GMT+0800 (China Standard Time)

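You're right about the first one. The second one gets past the parsnip check but then fails a brms check: no family is passed, so the model falls back to the gaussian family (as the error message shows), which rejects a factor response. The check is at: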
https://github.com/paul-buerkner/brms/blob/004edb522477d88dabe3815aae099b1211561076/R/data-response.R#L109-L111

In general, parsnip requires a factor outcome for classification models, while the engine's fit function may support, or even require, a numeric outcome for classification with certain families (e.g., ordinal categorical).
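
For reference, here is a minimal base-R sketch of moving between the two outcome representations mentioned above. It only uses mtcars and base functions; the variable names (vs_num, vs_fct, vs_back) are made up for illustration, and whether a given family actually needs the numeric form is something to check against the engine's documentation.

```r
# Binary outcome as stored in mtcars: numeric 0/1
vs_num <- mtcars$vs

# What parsnip expects for classification: a factor
vs_fct <- factor(vs_num)

# Converting back for an engine or family that needs a numeric response.
# as.integer(vs_fct) would return the internal level codes (1/2), not 0/1,
# so convert through the level labels instead.
vs_back <- as.numeric(as.character(vs_fct))

# The round trip recovers the original 0/1 values
stopifnot(identical(vs_back, vs_num))
```

Going through as.character() matters here because a factor stores its levels as integer codes starting at 1, which is not the same as the original 0/1 coding.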