S3 method "registered" in the global environment are not respected inside a `box`-ed module
ArcadeAntics opened this issue · comments
Error description
I have an S3 class "testClass" and I have a object / / method for said class:
obj <- structure(0:25, class = "testClass")
names.testClass <- function(x) letters
this function is declared in the global environment. I also have a script "test.R" that looks like this:
writeLines("testFun <- function(x) names(x)", "./test.R")
When I load this module with box::use(./test)
and use test$testFun(obj)
, it fails to recognize this "registered" method (I say registered in quotes because it is not actually registered with registerS3method()
, just that evalq(names(obj), .GlobalEnv)
works as expected.
This comes from the fact that 'box' sets the parent environment of "namespace:test" to baseenv()
instead of .BaseNamespaceEnv
(if the parent environment of "namespace:test" was the base namespace, the parent of which is the global environment, then the method would be found from the context of the module.)
If you'd like a more reproducible example, try this:
obj <- structure(0:25, class = "testClass")
names.testClass <- function(x) letters
exprs <- expression(testFun <- function(x) names(x))
envir1 <- new.env(parent = .BaseNamespaceEnv)
source(exprs = exprs, local = envir1)
envir1$testFun(obj)
envir2 <- new.env(parent = baseenv())
source(exprs = exprs, local = envir2)
envir2$testFun(obj)
I find the following output:
> envir1$testFun(obj)
[1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s" "t" "u" "v" "w" "x"
[25] "y" "z"
> envir2$testFun(obj)
NULL
However, I'm sure changing the parent environment to the base namespace instead of the base environment has some drastic consequences, so I'm not sure how realistic this is.
R version
platform x86_64-w64-mingw32
arch x86_64
os mingw32
crt ucrt
system x86_64, mingw32
status
major 4
minor 2.2
year 2022
month 10
day 31
svn rev 83211
language R
version.string R version 4.2.2 (2022-10-31 ucrt)
nickname Innocent and Trusting
also on my Linux computer:
platform x86_64-pc-linux-gnu
arch x86_64
os linux-gnu
system x86_64, linux-gnu
status
major 4
minor 2.1
year 2022
month 06
day 23
svn rev 82513
language R
version.string R version 4.2.1 (2022-06-23)
nickname Funny-Looking Kid
This also fails on several older versions of R.
‘box’ version
1.1.2
This is by design: you must register S3 methods for them to be found. And this is generally the case, not just with ‘box’. For instance, as of version 3.6, R no longer searches all attached environments, only the current environment and the top environment. Try the following:
e <- new.env()
local(envir = e, {
names.testClass <- function(x) letters
})
attach(e)
obj <- structure(0:25, class = "testClass")
names(obj)
This prints NULL
rather than letters, even though names.testClass
is on the search path and could be called explicitly.
The S3 documentation of R is unfortunately incredibly shoddy but the expected usage in script code is to register S3 methods explicitly using .S3method
, e.g.:
…
local(envir = e, {
names.testClass <- function(x) letters
.S3method('names', 'testClass')
})
…
This makes the above code work, and it also makes the ‘box’ module work.
I'm sure changing the parent environment to the base namespace instead of the base environment has some drastic consequences
Correct. First off, as noted above this wouldn’t actually fix the problem, except for the very limited case of method definitions inside .GlobalEnv
, but nowhere else. Secondly, the reason this fix works is precisely the reason why box does not use the base namespace as the parent environment: namely, parent.env(.BaseNamespaceEnv)
is .GlobalEnv
, and that leads to error-prone code, because it means that module code might be accidentally accessing names that are not defined inside the module. In fact, not just .GlobalEnv
but all attached packages would leak into the module’s namespace.
Consider the following module:
fllter = function (x, cond) x[cond] # oops, typo in name!
#' @export
even_odd = function (x) filter(x, rep(c(FALSE, TRUE), length.out = nrow(x)))
In the hypothetical ‘box’ implementation (and the actual implementation of R packages!), the author tests the code, it works, so the author ships it. But then users complain that the code doesn’t work for them. Worse, the code actually ran and produced a result, and they noticed too late that the result was wrong, losing their company millions dollars.
Turns out, the author had the ‘dplyr’ package attached while testing their code, so the filter
call inside the module invoked dplyr::filter
. The users didn’t have ‘dplyr’ attached, so the module instead called stats::filter
instead.
‘box’ prevents this situation by being intentionally strict about how names are searched.
This is by design: you must register S3 methods for them to be found. And this is generally the case, not just with ‘box’. For instance, as of version 3.6, R no longer searches all attached environments, only the current environment and the top environment. Try the following:
e <- new.env() local(envir = e, { names.testClass <- function(x) letters }) attach(e) obj <- structure(0:25, class = "testClass") names(obj)This prints
NULL
rather than letters, even thoughnames.testClass
is on the search path and could be called explicitly.The S3 documentation of R is unfortunately incredibly shoddy but the expected usage in script code is to register S3 methods explicitly using
.S3method
, e.g.:… local(envir = e, { names.testClass <- function(x) letters .S3method('names', 'testClass') }) …This makes the above code work, and it also makes the ‘box’ module work.
I'm sure changing the parent environment to the base namespace instead of the base environment has some drastic consequences
Correct. First off, as noted above this wouldn’t actually fix the problem, except for the very limited case of method definitions inside
.GlobalEnv
, but nowhere else. Secondly, the reason this fix works is precisely the reason why box does not use the base namespace as the parent environment: namely,parent.env(.BaseNamespaceEnv)
is.GlobalEnv
, and that leads to error-prone code, because it means that module code might be accidentally accessing names that are not defined inside the module. In fact, not just.GlobalEnv
but all attached packages would leak into the module’s namespace.Consider the following module:
fllter = function (x, cond) x[cond] # oops, typo in name! #' @export even_odd = function (x) filter(x, rep(c(FALSE, TRUE), length.out = nrow(x)))In the hypothetical ‘box’ implementation (and the actual implementation of R packages!), the author tests the code, it works, so the author ships it. But then users complain that the code doesn’t work for them. Worse, the code actually ran and produced a result, and they noticed too late that the result was wrong, losing their company millions dollars.
Turns out, the author had the ‘dplyr’ package attached while testing their code, so the
filter
call inside the module invokeddplyr::filter
. The users didn’t have ‘dplyr’ attached, so the module instead calledstats::filter
instead.‘box’ prevents this situation by being intentionally strict about how names are searched.
Isn't this just a better reason to be making packages instead of boxes? R CMD check would immediately pick up that no function 'filter' was imported.
Isn't this just a better reason to be making packages instead of boxes?
How so? ‘box’ modules are stricter (and thus more robust) than R packages. The behaviour I outlined in my comment above fixes what I perceive as a serious flaw in the R package system.
R CMD check would immediately pick up that no function 'filter' was imported.
Yes, if you run it. But R CMD check
also has limitations since it performs static analysis, and R is an inherently dynamic language. So static analysis won’t find every possible usage of a name. R CMD CHECK
has both false negatives and false positives: code that — for whatever reason — uses e.g. get("filter")
instead of filter
would pass R CMD check
and fail silently at runtime. Conversely, code that uses NSE (correctly) would create a false-positive flag in the check.
But fundamentally it’s true that having some static checks would be beneficial. Adding this functionality to ‘box’ is on the to-do list.