briandconnelly / ami

Checks for Various Computing Environments

Home Page: https://briandconnelly.github.io/ami/

Update on_cran function?

mpadge opened this issue · comments

Hi @briandconnelly. I need a generic function to test whether code is on_cran(), but unfortunately not the one you've coded here which only applies in devtools/testthat/r-lib environments. I was wondering whether you'd be open to updating/improving your on_cran() function here to instead check envvars typically set on CRAN machines? That kind of function already exists in the fda::CRAN() function, but your ami package seems like a more appropriate home for a general-purpose function.

The idea would be to compare envvars in an environment with those set during R CMD check, or possibly in other environments like Kurt Hornik's checking environment. Direct comparison of full variable names would be impractical, as they may change regularly and so require too much maintenance. But something along the lines of fda::CRAN() might be feasible, where the generic _R_CHECK_ pattern is matched, and CRAN is set to TRUE if there are >= some minimal number of matches.
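A minimal sketch of that counting idea, in R. The function names count_check_envvars and looks_like_cran, and the default threshold, are hypothetical and chosen only for illustration; this is not the code in fda::CRAN() or in ami.

```r
# Hypothetical helpers, for illustration only.

# Count how many environment variable names match a generic pattern.
count_check_envvars <- function(envvar_names = names(Sys.getenv()),
                                pattern = "^_R_CHECK_") {
  sum(grepl(pattern, envvar_names))
}

# Report TRUE when at least `min_matches` names match the pattern.
# The default of 5 is an arbitrary illustrative value.
looks_like_cran <- function(envvar_names = names(Sys.getenv()),
                            pattern = "^_R_CHECK_",
                            min_matches = 5L) {
  count_check_envvars(envvar_names, pattern) >= min_matches
}

# Example with an explicit vector of names instead of the live environment.
fake_env <- c("_R_CHECK_TIMINGS_", "_R_CHECK_ORPHANED_", "PATH")
count_check_envvars(fake_env)
looks_like_cran(fake_env)
```

Matching on a pattern rather than on full names means the helper does not need updating each time a check variable is added or renamed.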

There is also the issue that R CMD check is not the only environment in which CRAN runs code (for example, reverse-dependency checks), and so those check variables wouldn't cover everything. But they'd be a start towards a more general function than your current implementation here. What do you think?

@mpadge While I like consistency with the devtools/testthat/r-lib environments, I think what you're proposing here is the better way to go.

It seems like you have a lot more familiarity with this than I do, so I'd absolutely welcome a PR if you've got the time.

Sure, luckily I need to do it for work-related stuff anyway, so should be able to find time next week (from 13th May) to give it a go. Thanks for your quick response!

A bit more background research on PR#15, some of which should be summarized in documentation of proposed function changes.

Environment Variables in R source code

The following lists exclude:

  • Environmental variable manipulation in test suites of R itself.
  • Calls to Sys.setenv() within on.exit().

Beyond those, the current R source code sets the following environment variables:

  • Generic usage and modification of R_LIBS_SITE, R_LIBS_USER, DISPLAY, PATH, LANGUAGE (for example here.)
  • _R_RD_MACROS_PACKAGE_DIR_, set during package build process, and also during check here
  • R_BUILD_TEMPLIB during build (along with other generic R_LIBS, R_PACKAGE_NAME/DIR, R_LIBRARY_DIR instances).
  • During installation, several instances of R_OS_TYPE, R_HOME, R_ARCH, R_PACKAGE_NAME, R_INSTALL_PKG, R_LIBRARY_DIR, R_LIBS, CLINK_CPPFLAGS, R_ENABLE_JIT, and the following sub-scripted variables: _R_INSTALL_NO_DONE_, _R_INSTALL_SUPPRESS_NO_STAGED_MESSAGE.
  • During texi2dvi production, TEXINPUTS, BSTINPUTS, and TEXINDY.
  • _R_NS_LOAD_ (and here), in namespace attachment, both of which are then on.exit(Sys.unsetenv()), so present only during namespace loading.
  • TZDIR on macOS only in datetime.R.
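The set-then-unset pattern noted for _R_NS_LOAD_ explains why such variables are of no use for detection: they exist only while the enclosing function runs. A small sketch of the general pattern follows; with_temp_envvar and the variable name _DEMO_FLAG_ are hypothetical, and this is not the code from the R sources.

```r
# Hypothetical helper: set an environment variable only for the duration
# of evaluating `code`, then remove it again via on.exit().
with_temp_envvar <- function(name, value, code) {
  # Sys.setenv() takes named arguments, so build the call dynamically.
  do.call(Sys.setenv, stats::setNames(list(value), name))
  on.exit(Sys.unsetenv(name), add = TRUE)
  # `code` is a lazily evaluated argument; force it while the variable is set.
  force(code)
}

# The variable is visible inside the call ...
inside <- with_temp_envvar("_DEMO_FLAG_", "true", Sys.getenv("_DEMO_FLAG_"))
# ... and gone again afterwards.
after <- Sys.getenv("_DEMO_FLAG_")
```

Note that this sketch unsets the variable on exit rather than restoring any previous value.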

Other than those, the main set of variables is set during R CMD check, which currently includes the following:

  • _R_CHECK_BASHISMS_
  • _R_CHECK_BROWSER_NONINTERACTIVE_
  • _R_CHECK_CODE_USAGE_VIA_NAMESPACES_
  • _R_CHECK_CODE_USAGE_WITH_ONLY_BASE_ATTACHED_
  • _R_CHECK_CODOC_VARIABLES_IN_USAGES_
  • _R_CHECK_COMPILATION_FLAGS_
  • _R_CHECK_CONNECTIONS_LEFT_OPEN_
  • _R_CHECK_DATALIST_
  • _R_CHECK_DEPENDS_ONLY_DATA_
  • __R_CHECK_DOC_FILES_NOTE_IF_ALL_SPECIAL__
  • _R_CHECK_DOT_FIRSTLIB_
  • _R_CHECK_EXCESSIVE_IMPORTS_
  • _R_CHECK_FF_AS_CRAN_
  • _R_CHECK_FUTURE_FILE_TIMESTAMPS_
  • _R_CHECK_INSTALL_DEPENDS_
  • _R_CHECK_LIMIT_CORES_
  • _R_CHECK_MATRIX_DATA_
  • _R_CHECK_MBCS_CONVERSION_FAILURE_
  • _R_CHECK_NATIVE_ROUTINE_REGISTRATION_
  • _R_CHECK_NEWS_IN_PLAIN_TEXT_
  • _R_CHECK_NO_RECOMMENDED_
  • _R_CHECK_NO_STOP_ON_TEST_ERROR_
  • _R_CHECK_ORPHANED_
  • _R_CHECK_PACKAGE_DATASETS_SUPPRESS_NOTES_
  • _R_CHECK_PACKAGES_USED_CRAN_INCOMING_NOTES_
  • _R_CHECK_PACKAGES_USED_IGNORE_UNUSED_IMPORTS_
  • _R_CHECK_PACKAGES_USED_IN_TESTS_USE_SUBDIRS_
  • _R_CHECK_PRAGMAS_
  • _R_CHECK_RD_CONTENTS_KEYWORDS_
  • _R_CHECK_R_DEPENDS_
  • _R_CHECK_RD_NOTE_LOST_BRACES_
  • _R_CHECK_R_ON_PATH_
  • _R_CHECK_SCREEN_DEVICE_
  • _R_CHECK_SHLIB_OPENMP_FLAGS_
  • _R_CHECK_TIMINGS_
  • _R_CHECK_VALIDATE_UTF8_
  • _R_CHECK_XREFS_MIND_SUSPECT_ANCHORS_
  • _R_CHECK_XREFS_PKGS_ARE_DECLARED_
  • _R_CXX_USE_NO_REMAP_
  • R_DEFAULT_PACKAGES
  • R_ENABLE_JIT
  • _R_NO_S_TYPEDEFS_
  • R_RD4PDF
  • TMPDIR

This includes 40 _R_ variables, 38 of which are _R_CHECK_ (including one __R_CHECK_), three R_ variables, plus one other. Almost all of the _R_-type variables are set within if (as_cran) {...}, suggesting that a function matching these variables only reliably identifies code run within R CMD check --as-cran, and nothing else. This notably excludes reverse dependency checks, which use only R CMD check --timings and not --as-cran. The --timings flag currently doesn't set any envvars, and reads (via Sys.getenv()) only two: _R_CHECK_EXAMPLE_TIMING_THRESHOLD_ and _R_CHECK_EXAMPLE_TIMING_CPU_TO_ELAPSED_THRESHOLD_. Neither of these is explicitly set anywhere in the R source itself, so both must be presumed to be set elsewhere, in scripts used on the CRAN machines themselves. Which brings us to ...
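The prefix groups counted above can be separated with a few regular expressions. The sketch below uses only a subset of the listed names, and classify_envvar is a hypothetical helper. One detail it illustrates: a pattern anchored as "^_R_CHECK_" would miss the double-underscore variable, so the leading underscores are matched with "_+".

```r
# A subset of the variable names listed above.
listed <- c(
  "_R_CHECK_BASHISMS_",
  "__R_CHECK_DOC_FILES_NOTE_IF_ALL_SPECIAL__",
  "_R_CXX_USE_NO_REMAP_",
  "_R_NO_S_TYPEDEFS_",
  "R_DEFAULT_PACKAGES",
  "R_ENABLE_JIT",
  "R_RD4PDF"
)

# Hypothetical helper: assign each name to a prefix group.
# "_+" allows one or more leading underscores.
classify_envvar <- function(x) {
  ifelse(grepl("^_+R_CHECK_", x), "_R_CHECK_",
    ifelse(grepl("^_+R_", x), "other _R_",
      ifelse(grepl("^R_", x), "R_", "other")))
}

# Tabulate the groups for the subset.
table(classify_envvar(listed))
```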


Variables set on CRAN machines

The definitive public source for environmental variables set by CRAN seems to be the suite of tests in https://github.com/r-devel/r-dev-web/tree/main/CRAN/QA (I guess "Quality Assurance"?). This is a suite of tests by the 5 main CRAN members, of which 4 contain code for actual test and check suites, as detailed in the following sub-sections.

BDR

This contains sub-directories for many operating systems, with the following summarising Linux-auk only, presuming other systems to be similar:

Kurt

Simon

  • Only 2 _R_CHECK_ variables set in the "nightly build" checks

Uwe


Conclusions

Using environment variables prefixed with _R_ seems to be a reliable way to identify code being executed on CRAN machines, or under R CMD check --as-cran. The minimal number of variables seems to be 14 in some of the BDR tests (although that number may be higher in reality through additional variables being set elsewhere). The only environments within the CRAN/QA sub-directory which contain fewer than that number seem to be the nightly builds, which use only two, and the binary compilation, which uses only one.

It would thus be safe to set n_CRAN_envvars to a default of 10, although this is unlikely to make any difference from the currently proposed value of 5. The CRAN_pattern could also be made more explicit by extending it from the current _R_ to _R_CHECK_, but again this would not make any difference in practice, as the two are effectively identical.
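A quick sketch of why neither choice matters for the environments described above. The argument names n_CRAN_envvars and CRAN_pattern follow the discussion, but on_cran_sketch itself and the placeholder variable names are hypothetical, and this is not the code from the PR.

```r
# Sketch only: count names matching CRAN_pattern and compare the count
# with n_CRAN_envvars.
on_cran_sketch <- function(envvar_names,
                           n_CRAN_envvars = 5L,
                           CRAN_pattern = "^_R_") {
  sum(grepl(CRAN_pattern, envvar_names)) >= n_CRAN_envvars
}

# Placeholder names standing in for real check variables:
# 14 as in the smallest BDR case, 2 as in the nightly builds.
bdr_like <- sprintf("_R_CHECK_PLACEHOLDER_%02d_", 1:14)
nightly_like <- sprintf("_R_CHECK_PLACEHOLDER_%02d_", 1:2)

# Try both thresholds and both patterns on each set of names.
for (n in c(5L, 10L)) {
  for (p in c("^_R_", "^_R_CHECK_")) {
    cat("n =", n, "pattern =", p,
        "| 14 vars:", on_cran_sketch(bdr_like, n, p),
        "| 2 vars:", on_cran_sketch(nightly_like, n, p), "\n")
  }
}
```

With 14 matching variables every combination reports TRUE, and with 2 every combination reports FALSE, so a threshold of 5 or 10 and either pattern classify these two cases the same way.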

closed by #15