klmr / box

Write reusable, composable and modular R code

Home Page:https://klmr.me/box/

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Consider making ‘box’ namespace environments actual namespaces

klmr opened this issue · comments

‘box’ module “namespace” environments are currently not actually namespaces as far as R is concerned (i.e. they don’t have a .__NAMESPACE__. environment with a nested spec name of type string).

We should explore what consequences changing this would have. See:

box/R/env.r

Lines 21 to 22 in 2cef22e

# FIXME: Why not use `.__NAMESPACE__.` here?
ns_env$.__module__. = ns_attr

On the positive side, it would lead to a nicer printing of the closure environment of a function defined inside a module. On the flip side, there might be rather a lot of unintended, subtle side-effects that current tests don’t capture, and which could break user code.

This may break a PR we have for box support in the covr package. r-lib/covr#526

Ah, thanks for letting me know. I think it should still work, but at any rate now I know about it and will pay attention to not breaking it.

Also, great work on the ‘covr’ PR!

Thanks! If you do change to namespace environments, I'll also take a look in the covr side.

The covr PR, however, still has one CI fail with oldrel-4. box fails to install. I think it's related to the R-3.6 backports fixed in 9c47553.

As someone who's tried this before in a personal R package, this is a bad idea.

Namespaces are expected to have a consistent structure that package:box does not follow. A namespace has a parent environment with a name imports:pkgname whose parent environment is the base namespace, which box namespaces do not follow. If you were to make the box namespaces into "real" namespaces, this means any closures written in a box namespace cannot be compiled, and any loops cannot be compiled either (see ?compiler::enableJIT).

You would also run into the issue that a namespace is expected to originate from a directory with sub-directories help, Meta, R, possibly others such as html and libs, as well as files DESCRIPTION, INDEX, MD5, and NAMESPACE. box modules do not offer such a file structure, so they should not pretend to be package namespaces.

Then comes the issue of namespace registration. Namespaces need to be registered, and you can register them, technically; here is some code at the C level:

SEXP do_regBoxNS(SEXP call, SEXP op, SEXP args, SEXP rho)
{
    args = CDR(args);
    SEXP name = CAR(args); args = CDR(args);
    switch (TYPEOF(name)) {
    case SYMSXP:
        break;
    case STRSXP:
        if (LENGTH(name)) {
            name = installTrChar(STRING_ELT(name, 0));
            break;
        }
    default:
        errorcall(call, "bad package:box namespace name");
    }
    SEXP boxNS = CAR(args); args = CDR(args);
    if (findVarInFrame(R_NamespaceRegistry, name) != R_UnboundValue)
        errorcall(call, "package:box namespace already registered");
    defineVar(name, boxNS, R_NamespaceRegistry);
    return R_NilValue;
}


static const R_ExternalMethodDef externalRoutines[] = {
{".C_regBoxNS", (DL_FUNC) &do_regBoxNS, 2},
{NULL, NULL, 0}
}


attribute_visible
void R_init_box(DllInfo *dll)
{
    R_registerRoutines(dll, NULL, NULL, NULL, externalRoutines);
    R_useDynamicSymbols(dll, FALSE);
    R_forceSymbols(dll, TRUE);
}

and at the R level:

spec <- box:::parse_spec(quote(./example), "")
source_path <- normalizePath("./example.R")
info <- box:::mod_info(spec, source_path)
mod_ns <- box:::make_namespace(info)
mod_ns$.__NAMESPACE__. <- new.env(hash = TRUE, parent = baseenv())
mod_ns_name <- info$name
## or possibly this:
# mod_ns_name <- info$source_path
mod_ns$.__NAMESPACE__.$spec <- c(name = mod_ns_name, version = "0.1.0")
.External2(.C_regBoxNS, mod_ns_name, mod_ns)

but this is a horrible idea. do_regNS is attribute_hidden and in Rdll.hide meaning it is not meant to be used and cannot be used by R users. The only reason R_NamespaceRegistry is not also attribute_hidden or in Rdll.hide is to make accessing actual namespaces and variables from C easier. It should be regarded as read-only.

Even if you can justify registering your box namespaces in R_NamespaceRegistry, there comes the issue of the namespace name getNamespaceName(ns). Namespaces are expected to have a valid package name grepl(paste0("^", .standard_regexps()$valid_package_name, "$"), getNamespaceName(ns)), but filenames have no such requirement, meaning you might be registering namespaces with invalid names. Additionally, if you choose to use just the basename of the file as the name to be registered, you will run into issues of duplicates; box namespaces with the same namespace name that print identically, serialize identically, and cannot both be simultaneously registered. Each namespace is required to have a unique name. So you could use the normalized path, now duplicates are not an issue, but now the names most definitely are invalid.

On the topic of serialization, namespaces are serialized in a manner unlike normal environments. See these links:

This means that no matter which name you decided to give to the box namespaces, they will always be serialized and unserialized as real namespaces. When being restored, the box namespaces will become real namespaces that link to the wrong environment, or will become R_GlobalEnv because they cannot be restored, both of which are erroneous.

There is another issue in that box namespaces can import variables from a package and use them. However, given that these are not actual namespaces, they are not registered as using those namespaces, meaning a real namespace can be unloaded even though it is in use by a box namespace. This might not break any R code, but all pointers to C or Fortran functions in that package will be set to NULL so cannot be used, and any packages which expect the package to be loaded will see it's unloaded and try to reload it, giving you two R package namespaces at the same time.

As someone who has wasted tons of time trying to do the same thing myself, please don't waste your own time trying this.

Thanks for this detailed explanation! Namespace registration and serialisation are definitely good points which all but forbid doing this. — The naming ambiguity could be handled, but that would defeat the purpose of having a “nice” printable namespace name. In other words, there doesn’t seem to be a point to making ‘box’ module namespaces into package namespaces.