r-lib / vctrs

Generic programming with typed R vectors

Home Page:https://vctrs.r-lib.org

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Error with factors that have additional classes

SebKrantz opened this issue · comments

I'm getting a serious error using ggplot together with collapse, in particular, collapse optimizes computations on factors by giving them an additional class "na.included" if they are known to not contain any NA's. This class avoids a missing value check when using factors for grouping, joins etc. see https://sebkrantz.github.io/collapse/reference/qF.html. In the example below, I melt a data frame using collapse::pivot which adds the "na.included" class to the generated 'variable' factor i.e. the class is c("factor", "na.included"). This produces an error in ggplot2 caused by vctrs. Thanks for considering this issue.

library(collapse)
#> collapse 2.0.3, see ?`collapse-package` or ?`collapse-documentation`
#> 
#> Attaching package: 'collapse'
#> The following object is masked from 'package:stats':
#> 
#>     D
library(ggplot2)

mtcars |>
  pivot() |>
  ggplot(aes(y = value)) +
      geom_histogram() +
      facet_wrap( ~ variable)
#> Error in `stop_native_implementation()`:
#> ! `vec_ptype2.factor.factor()` is implemented at C level.
#>   This R function is purely indicative and should never be called.
#> ℹ This is an internal error that was detected in the vctrs package.
#>   Please report it at <https://github.com/r-lib/vctrs/issues> with a reprex
#>   (<https://tidyverse.org/help/>) and the full backtrace.
#> Backtrace:
#>      ▆
#>   1. ├─base::tryCatch(...)
#>   2. │ └─base (local) tryCatchList(expr, classes, parentenv, handlers)
#>   3. │   ├─base (local) tryCatchOne(...)
#>   4. │   │ └─base (local) doTryCatch(return(expr), name, parentenv, handler)
#>   5. │   └─base (local) tryCatchList(expr, names[-nh], parentenv, handlers[-nh])
#>   6. │     └─base (local) tryCatchOne(expr, names, parentenv, handlers[[1L]])
#>   7. │       └─base (local) doTryCatch(return(expr), name, parentenv, handler)
#>   8. ├─base::withCallingHandlers(...)
#>   9. ├─base::saveRDS(...)
#>  10. ├─base::do.call(...)
#>  11. ├─base (local) `<fn>`(...)
#>  12. └─global `<fn>`(input = base::quote("neat-zebra_reprex.R"))
#>  13.   └─rmarkdown::render(input, quiet = TRUE, envir = globalenv(), encoding = "UTF-8")
#>  14.     └─knitr::knit(knit_input, knit_output, envir = envir, quiet = quiet)
#>  15.       └─knitr:::process_file(text, output)
#>  16.         ├─base::withCallingHandlers(...)
#>  17.         ├─base::withCallingHandlers(...)
#>  18.         ├─knitr:::process_group(group)
#>  19.         └─knitr:::process_group.block(group)
#>  20.           └─knitr:::call_block(x)
#>  21.             └─knitr:::block_exec(params)
#>  22.               └─knitr:::eng_r(options)
#>  23.                 ├─knitr:::in_input_dir(...)
#>  24.                 │ └─knitr:::in_dir(input_dir(), expr)
#>  25.                 └─knitr (local) evaluate(...)
#>  26.                   └─evaluate::evaluate(...)
#>  27.                     └─evaluate:::evaluate_call(...)
#>  28.                       ├─evaluate (local) handle(...)
#>  29.                       │ └─base::try(f, silent = TRUE)
#>  30.                       │   └─base::tryCatch(...)
#>  31.                       │     └─base (local) tryCatchList(expr, classes, parentenv, handlers)
#>  32.                       │       └─base (local) tryCatchOne(expr, names, parentenv, handlers[[1L]])
#>  33.                       │         └─base (local) doTryCatch(return(expr), name, parentenv, handler)
#>  34.                       ├─base::withCallingHandlers(...)
#>  35.                       ├─base::withVisible(value_fun(ev$value, ev$visible))
#>  36.                       └─knitr (local) value_fun(ev$value, ev$visible)
#>  37.                         └─knitr (local) fun(x, options = options)
#>  38.                           ├─base::withVisible(knit_print(x, ...))
#>  39.                           ├─knitr::knit_print(x, ...)
#>  40.                           └─knitr:::knit_print.default(x, ...)
#>  41.                             └─evaluate (local) normal_print(x)
#>  42.                               ├─base::print(x)
#>  43.                               └─ggplot2:::print.ggplot(x)
#>  44.                                 ├─ggplot2::ggplot_build(x)
#>  45.                                 └─ggplot2:::ggplot_build.ggplot(x)
#>  46.                                   └─layout$setup(data, plot$data, plot$plot_env)
#>  47.                                     └─ggplot2 (local) setup(..., self = self)
#>  48.                                       └─self$facet$compute_layout(data, self$facet_params)
#>  49.                                         └─ggplot2 (local) compute_layout(..., self = self)
#>  50.                                           └─ggplot2::combine_vars(data, params$plot_env, vars, drop = params$drop)
#>  51.                                             ├─ggplot2:::unique0(vec_rbind0(!!!values[has_all]))
#>  52.                                             └─ggplot2:::vec_rbind0(!!!values[has_all])
#>  53.                                               ├─ggplot2:::with_ordered_restart(...)
#>  54.                                               │ └─base::withCallingHandlers(...)
#>  55.                                               └─vctrs::vec_rbind(..., .error_call = .error_call)
#>  56.                                                 └─vctrs (local) `<fn>`()
#>  57.                                                   └─vctrs:::vec_ptype2.factor.factor(...)
#>  58.                                                     └─vctrs:::stop_native_implementation("vec_ptype2.factor.factor")
#>  59.                                                       └─cli::cli_abort(...)
#>  60.                                                         └─rlang::abort(...)

# The problem is the additional "na.included" class
mtcars |> pivot() |> vclasses()
#>             variable                value 
#> "factor na.included"            "numeric"

Created on 2023-10-26 with reprex v2.0.2

We don't currently support superclassing classes like that.

Would you consider using c("my_package_my_class", "factor") instead?

In this example I'm including a package prefix "my_package_", this is good practice because S3 classes are shared globally across the ecosystem.

Ok, the primary class is meant to be a factor, so I don't see the point of adding a class in front which is more likely to cause problems, especially if the only reason for doing so has to do with some restrictive C-level routine in vctrs. I will also not rename the class as that would be considered an API breaking change to collapse.

In general I don't think the order of classes should matter except for method dispatch. If vctrs has an internal method for factors it should dispatch on the factor class regardless of which other classes the object inherits.

We have seen similar issues in base R where R Core added the "matrix" class to arrays and it caused havoc for a lot of developers that did illicit checks on the length of the class vector.

Ok, the primary class is meant to be a factor, so I don't see the point of adding a class in front which is more likely to cause problems

On the contrary, since you're subclassing factor your class should come before your superclasses in the class vector. This is standard practice.

In general I don't think the order of classes should matter except for method dispatch. If vctrs has an internal method for factors it should dispatch on the factor class regardless of which other classes the object inherits.

Sorry we don't currently have plans to change this.

Ok, I will reshuffle the class, in the hope that it doesn't break anything. I still think vctrs is being too restrictive here though. There is, to my knowledge, nothing in R that warrants programs to fail because the order of classes is non-standard. Any user code that manipulates the class attribute could induce such artifacts.

Thanks, please let me know if this causes trouble elsewhere.

The problem in vctrs is that we don't do normal S3 dispatch, we do a limited form of double dispatch where inheritance is explicitly disabled. So superclassing is incompatible with this approach as we only dispatch to the leftmost classes.

You will probably have to implement some coercion methods for compatibility with vctrs/tidyverse, to allow your class to be coercible to factor without loss of information, see https://vctrs.r-lib.org/reference/howto-faq-coercion.html. We made the choice of being restrictive in case of class mismatches, though we'd like to be a bit more lenient in the future.

Ahh ok, I see this will cause errors probably everywhere:

library(vctrs)
#> Warning: package 'vctrs' was built under R version 4.3.1
f <- iris$Species
class(f) <- c("na.included", "factor")
vec_ptype2(f, iris$Sepal.Length)
#> Error:
#> ! Can't combine `f` <na.included> and `iris$Sepal.Length` <double>.
#> Backtrace:
#>      ▆
#>   1. ├─vctrs::vec_ptype2(f, iris$Sepal.Length)
#>   2. └─vctrs (local) `<fn>`()
#>   3.   └─vctrs::vec_default_ptype2(...)
#>   4.     ├─base::withRestarts(...)
#>   5.     │ └─base (local) withOneRestart(expr, restarts[[1L]])
#>   6.     │   └─base (local) doWithOneRestart(return(expr), restart)
#>   7.     └─vctrs::stop_incompatible_type(...)
#>   8.       └─vctrs:::stop_incompatible(...)
#>   9.         └─vctrs:::stop_vctrs(...)
#>  10.           └─rlang::abort(message, class = c(class, "vctrs_error"), ..., call = call)

Created on 2023-10-26 with reprex v2.0.2

I definitely want to avoid opening a pandora's box that will force collapse to move along with tidyverse developments. So in this case I think sticking to my superclassing approach is preferable. Can the issue also be solved in a different way?

Isn't the error above expected? You'd only want your class to be coercible to factor and character IIUC?

Same with vec_ptype2(f, as.character(iris$Sepal.Length)). I guess the primary reason for me to add "na.included" after the "factor" class was precisely because I wanted it to be ignored by basically all other software, which would be the case here, but not if I make the adjustement to put the class before "factor". I now believe changing the class order could cause more issues with the tidyverse than anticipated. So my question is there a method such as vec_cast.na.included that I could define to make vctrs compatible with my superclassing approach. That to me seems a cleaner solution.

I see if I export vec_ptype2.factor.factor <- function(x, y, ...) x, this solves the problem with the present version of collapse. Could this cause any other issues? or is this function expected to do anything else?

I see if I export vec_ptype2.factor.factor <- function(x, y, ...) x, this solves the problem

😱 I would not do that. That will not solve your issue anyway since the dispatch mechanism will not look into the search (except the global env).

I now believe changing the class order could cause more issues with the tidyverse than anticipated

I'd be surprised and I don't think there has been evidence of that in this thread. The error you're seeing is expected behaviour in our stack.

Same with vec_ptype2(f, as.character(iris$Sepal.Length)).

Yes we require explicit coercion behaviour, which must be done by implementing the ptype2 and cast coercion methods for all relevant classes (see linked FAQ). I'm sorry that causes you more work. The main issue with this approach is that it doesn't scale for classes like I() that are essentially mixins for a variable number of classes. Hopefully that's not your case?

That's a lot of coding expected from my side and I don't see that it solves the issue once and for all, because you could introduce new classes that require coercion to/from factor, and then I would be required to implement the casting methods for that. I'll think about it further, but I think this is not a good option.

Thanks for your consideration.