eddelbuettel / r2u

CRAN as Ubuntu Binaries

Home Page: https://eddelbuettel.github.io/r2u

guidelines on using newer r2u simultaneously with older CRAN packages?

MatthieuStigler opened this issue · comments

Dirk thanks for this great package, works very well!

Could you please share your advice on using simultaneously r2u and standard install.packages() commands without bspm? Currently I have both, and as "~/R/x86_64-pc-linux-gnu-library/4.2" comes first in .libPaths(), R might use the older versions in "~/R/x86_64-pc-linux-gnu-library/4.2" rather than the newer ones installed by r2u.

So my questions are:

  • If one sticks with using a mix of install.packages() without bspm and r2u, do you recommend changing .libPaths()?
  • If one decides instead to use bspm, I guess one will need to remove all packages in "~/R/x86_64-pc-linux-gnu-library/4.2" and even in "/usr/local/lib/R/site-library", as older versions there would take precedence over r2u packages? Is my understanding correct that bspm will have install.packages() install into /usr/lib/R/site-library, but remove.packages("xxx") under bspm is left unaffected and will remove from the first library? Does this mean I should look for all packages in either "~/R/x86_64-pc-linux-gnu-library/4.2" and "/usr/local/lib/R/site-library", check those installed by install.packages (i.e. exclude the github ones), remove them, then install again?

Thanks!!

It's a good and fair question. For R, .libPaths() order wins. And for example the Debian and Ubuntu default, even without a path below $HOME, is to have /usr/local before the path that r2u installs to:

> .libPaths()
[1] "/usr/local/lib/R/site-library"
[2] "/usr/lib/R/site-library"               # r2u installation
[3] "/usr/lib/R/library"           
> 

I have this situation on my laptop where a number of packages in /usr/local/ shadow the (potentially newer) ones from r2u, and I have been meaning to at least write an extended version of available.packages() to flag and warn about shadowed packages.

Ultimately, it is a local sysadmin question. You (for your machine) and I (for my laptop) decided to use a large number of (now system!!) packages with r2u, so maybe on that machine we need to override the order in .libPaths() in Rprofile.site and/or our user ~/.Rprofile. It is a pretty new problem to have thanks to r2u.
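
For illustration, overriding the order could look roughly like the sketch below. It assumes r2u installs into /usr/lib/R/site-library (as in the listing above); promote_lib() is a made-up helper name, not something r2u or bspm provides.

```r
# Hypothetical helper (not part of r2u or bspm): move `lib` to the front of a
# vector of library paths while keeping the relative order of the others.
promote_lib <- function(paths, lib) {
    c(lib[lib %in% paths], setdiff(paths, lib))
}

# Assumption: r2u installs into this directory, as in the listing above.
r2u_lib <- "/usr/lib/R/site-library"
new_paths <- promote_lib(.libPaths(), r2u_lib)
print(new_paths)

# To make it stick, a line like this would go in ~/.Rprofile or Rprofile.site:
# .libPaths(new_paths)
```

One trade-off to keep in mind: install.packages() installs into the first element of .libPaths() by default, so after such a reordering a plain install.packages() call (without bspm) would target the system library unless lib= is given explicitly.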

ok, this is actually more complicated than I thought! I just installed on our server, and then users started having error messages like:

library(tidyverse)
Error: package or namespace load failed for 'tidyverse' in loadNamespace(i, c(lib.loc, .libPaths()), versionCheck = vI[[i]]):
namespace 'rlang' 1.0.2 is already loaded, but >= 1.0.6 is required

which comes because of the shadowing issue:

subset(installed.packages()|> as.data.frame(), Package=="rlang")[,1:5]
        Package                       LibPath Version Priority      Depends
rlang     rlang /usr/local/lib/R/site-library   1.0.2     <NA> R (>= 3.4.0)
rlang.1   rlang       /usr/lib/R/site-library   1.0.6     <NA> R (>= 3.4.0)

I tried doing echo "bspm::disable()" | sudo tee -a /etc/R/Rprofile.site but users still had conflicts.

So at that point, I can think of two approaches:

  • change .libPaths() order, and remove the first two. This means though users cannot have their own github installed packages?
  • seek to identify the packages that shadow others and remove them?

Do you have a sense on which one makes more sense?

Also, two side questions if I may:

  • on Ubuntu server 20.04, which component will install in /usr/local/lib/R/site-library?
  • Is bspm also affecting remove.packages()? Strangely enough, one user was not able to remove a pkg from there, doing:
> remove.packages("rlang", lib = "/usr/local/lib/R/site-library")
> subset(installed.packages()|> as.data.frame(), Package=="rlang")[,1:5]
        Package                       LibPath Version Priority      Depends
rlang     rlang /usr/local/lib/R/site-library   1.0.2     <NA> R (>= 3.4.0)
rlang.1   rlang       /usr/lib/R/site-library   1.0.6     <NA> R (>= 3.4.0)

That is a somewhat different local issue, or really the same issue: we now have an easier way to end up with current packages later in the path.

But hey, if you have many users with a potpourri of R versions and installations and you cannot or do not want to deal with re-sorting .libPaths(), then maybe r2u is not for you. Or not until "we all" figure out how to sort, or to maybe add 'pinning' to library, or ... It's a new topic. It's good to brainstorm.

As for removal, consider relying just on apt for system packages. I need to check with @Enchufa2 what the best policy was, he had a good point on that too but I don't want to quote him without checking.

By a (decades-long!!) convention on Linux systems with package managers, /usr/local/ is outside of apt or dpkg (and, I believe, other distros do the same). So you only get to /usr/local/lib/R/site-library by calling R directly. Which is what I have done for 25+ years with R on Linux. I personally also do NOT use a library below ~ so for me that is the default. But it can now be behind where r2u installs so we will work on some tooling. Maybe diagnostics first.

But recall that everything actually works as documented and expected. It is "merely" creating a new situation for us.

This means though users cannot have their own github installed packages?

False. Users still do whatever they want however they want. But by not paying attention they can also keep a local (possibly outdated!) package ahead of a system-installed newer one.

But that is no different than ~/R/*/* shadowing /usr/local. We always had this problem as soon as we had several directories in .libPaths(). In practice it was less of an issue because we had fewer reasons to install (many!!) packages later in the path. r2u changes that. That is still a good thing but we will need to work out best practices.

Could you please share your advice on using simultaneously r2u and standard install.packages() commands without bspm?

TL;DR, do not do that. :) Could you please share what is your goal here? Depending on this, I bet there are better ways to achieve it.

bspm was designed to be left always enabled. You want a GitHub package not available on CRAN? Call remotes, and you'll get it in your user folder, no issue. You may have your user path (~/R/x86_64-pc-linux-gnu-library/4.2 in this case) full of non-CRAN packages, and everything else from r2u via bspm, and everything just works. I've been using this (well, the Fedora version of this) on my own computer and several servers I manage for... 3 years now with no issues.

Issues arise when you start mixing old and new versions. But this is going to happen with or without r2u/bspm, because this is how R/CRAN works. And r2u/bspm is just a new better source of package installations, not a new way to manage packages. If you really need specific versions of packages for certain things, then maybe install.packages is not for you, and you should resort to locking versions with something like renv (which, BTW, changes .libPaths() to operate properly).

Now, what if you don't really need old versions of packages? What if you are just trying to introduce r2u in a system that already has users with their user dirs full of packages? I'm of course guessing here. But if this is the case, then you need to 1) collect all the packages that your users have, 2) install them from r2u, 3) remove them from their user dirs, and 4) enjoy!

Does this mean I should look for all packages in either "~/R/x86_64-pc-linux-gnu-library/4.2" and "/usr/local/lib/R/site-library", check those installed by install.packages (i.e. exclude the github ones), remove them, then install again?

Exactly, this is what I was saying. You can just collect all the packages, call bspm::install_sys(pkgs), and whatever this function returns, they're non-CRAN packages (so do not remove them, remove the others).
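
The step described above can be sketched as follows. This is only an illustration under stated assumptions: split_by_source() and fake_install() are made-up names, "myGithubPkg" is a made-up package, and the sketch relies on bspm::install_sys() returning the packages it could not install, as described in the comment above. The runnable part uses a stand-in so that nothing is actually installed.

```r
# Hypothetical helper: split `pkgs` into those the system package manager could
# provide and those it could not. `install_fun` defaults to bspm::install_sys,
# which (per the comment above) returns the packages it could not install.
split_by_source <- function(pkgs, install_fun = bspm::install_sys) {
    not_available <- as.character(install_fun(pkgs))
    list(remove = setdiff(pkgs, not_available),    # now provided by the system
         keep   = intersect(pkgs, not_available))  # non-CRAN, leave in place
}

# Dry run with a stand-in for install_sys, so nothing is installed here.
# Assumption: "myGithubPkg" is a made-up non-CRAN package name.
fake_install <- function(pkgs) intersect(pkgs, "myGithubPkg")
plan <- split_by_source(c("rlang", "dplyr", "myGithubPkg"), fake_install)
print(plan)

# On a real system with a working bspm setup, roughly:
# user_lib <- path.expand(Sys.getenv("R_LIBS_USER"))
# plan <- split_by_source(rownames(installed.packages(lib.loc = user_lib)))
# remove.packages(plan$remove, lib = user_lib)
```

Passing the installer in as an argument is just a convenience for trying the logic without touching the system; it is not how bspm itself is organised.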

Is bspm also affecting remove.packages()?

No, see cran4linux/bspm#43 (comment)

thanks @Enchufa2 !

Ok I see what you mean, if I can scan all previous packages installed by all users on the server, then I can install them myself into /usr/lib/R/site-library, then remove the packages in the user directory. I'll have to think about this, not sure yet how to remove on users' folders, might need to ask them all to do it or do it myself as sudo. 🤔

@MatthieuStigler I would start by writing a script that does this for the current user. Then you could ask your users to execute it, or you could sudo su - <user> for every user and run it yourself.

I would be interested in providing a function and a deployment script in bspm to facilitate this task, so I've opened the issue above in the bspm repo. It would help a lot if you could share your experience/code/issues with this task over there.

Here is a (very, five-minute) first pass at a sketch to find 'shadowed' packages. It is more general than r2u or bspm -- it really applies everywhere where length(.libPaths()) > 1 is true.

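shadowedPackages <- function() {
    # data.table is optional: bail out quietly if it is not installed
    if (!requireNamespace("data.table", quietly=TRUE)) {
        message("Please install data.table")
        return(invisible())
    }
    require(data.table)
    ip <- installed.packages()
    d <- data.table(ip[,1:3])                    # Package, LibPath, Version
    d[, Version:=as.package_version(Version)]    # compare as versions, not strings
    d[,n:=.N,keyby=Package]                      # n = number of libraries holding the package
    # keep packages installed more than once; 'good' marks the highest version
    d[n>1, good:=Version==max(Version), by=Package][n>1,]
}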

On my laptop, which lives off r2u, I find a few packages shadowing the binaries, mostly ones I have worked on myself.

> shadowedPackages()

Key: <Package>
       Package                       LibPath           Version     n   good
        <char>                        <char> <package_version> <int> <lgcl>
 1:       Rcpp /usr/local/lib/R/site-library           1.0.9.1     2  FALSE
 2:       Rcpp       /usr/lib/R/site-library            1.0.10     2   TRUE
 3:    RcppAPT /usr/local/lib/R/site-library             0.0.9     2   TRUE
 4:    RcppAPT       /usr/lib/R/site-library             0.0.9     2   TRUE
 5:       bspm /usr/local/lib/R/site-library           0.4.0.1     2  FALSE
 6:       bspm       /usr/lib/R/site-library             0.4.2     2   TRUE
 7:       dang /usr/local/lib/R/site-library            0.0.15     2   TRUE
 8:       dang       /usr/lib/R/site-library            0.0.15     2   TRUE
 9: data.table /usr/local/lib/R/site-library            1.14.7     2   TRUE
10: data.table       /usr/lib/R/site-library            1.14.6     2  FALSE
11:    littler /usr/local/lib/R/site-library          0.3.15.2     2  FALSE
12:    littler       /usr/lib/R/site-library            0.3.17     2   TRUE
13:     tiledb /usr/local/lib/R/site-library          0.16.0.2     2  FALSE
14:     tiledb       /usr/lib/R/site-library            0.18.0     2   TRUE
> 
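
For anyone without data.table, the same idea can be sketched in base R. This is an illustration only, not the function that ended up in dang; shadowed_base() is a made-up name and the input matrix below is made-up data modelled on two rows of the output above.

```r
# Base R sketch (hypothetical name, not the function in dang): list packages
# installed in more than one library and flag the highest version of each.
shadowed_base <- function(ip = installed.packages()) {
    m <- ip[, c("Package", "LibPath", "Version"), drop = FALSE]
    rownames(m) <- NULL
    d <- as.data.frame(m, stringsAsFactors = FALSE)
    dup <- d$Package[duplicated(d$Package)]
    d <- d[d$Package %in% dup, , drop = FALSE]
    if (nrow(d) == 0L) {             # nothing shadowed: return an empty frame
        d$good <- logical(0)
        return(d)
    }
    v <- package_version(d$Version)  # so that 1.0.10 sorts above 1.0.9.1
    d$good <- vapply(seq_len(nrow(d)), function(i) {
        v[i] == max(v[d$Package == d$Package[i]])
    }, logical(1))
    d <- d[order(d$Package), , drop = FALSE]
    rownames(d) <- NULL
    d
}

# Small made-up input mimicking the shape of installed.packages()
ip <- cbind(Package = c("Rcpp", "Rcpp", "zoo"),
            LibPath = c("/usr/local/lib/R/site-library",
                        "/usr/lib/R/site-library",
                        "/usr/lib/R/site-library"),
            Version = c("1.0.9.1", "1.0.10", "1.8-12"))
res <- shadowed_base(ip)
print(res)
```

Converting to package_version before comparing matters here: as plain strings, "1.0.9.1" would sort after "1.0.10".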

It's prettier as a screenshot as I am such a fan of both colorout and the theme I use :)

[screenshot of the shadowedPackages() output]

The shadowPackages() function is now in the GitHub repo of CRAN package dang. As the underlying issue was always more of a generic R problem of how to align multiple directories within .libPaths(), and therefore not all that specific to this repo, I am going to close it.

Big thank you for raising the issue though -- as it is now addressed in both bspm (for real) and dang (very lightly as shown above) with helper code.

Thanks a lot Dirk, this is very much appreciated!

Two quick points:

  • Function is shadowedPackages() right? You wrote above shadowPackages ;-)
  • Function is not necessarily fail-safe? If d is empty, returns Error in do.call(rbind, d) : second argument must be a list instead of 0-row 4-col data frame ?

Also, for some reason, doing it the dplyr way, I get one more shadowed package, not sure why?

library(dplyr, warn.conflicts = FALSE)


shd <- dang::shadowedPackages() %>% as_tibble()
head(shd)
#> # A tibble: 6 × 4
#>   Package LibPath                                          Version    Latest
#>   <chr>   <chr>                                            <pckg_vrs> <lgl> 
#> 1 bspm    /home/mstigler/R/x86_64-pc-linux-gnu-library/4.2 0.4.2.1    TRUE  
#> 2 bspm    /usr/local/lib/R/site-library                    0.4.2      FALSE 
#> 3 dang    /home/mstigler/R/x86_64-pc-linux-gnu-library/4.2 0.0.15     TRUE  
#> 4 dang    /usr/lib/R/site-library                          0.0.15     TRUE  
#> 5 effects /home/mstigler/R/x86_64-pc-linux-gnu-library/4.2 4.2.3      TRUE  
#> 6 effects /usr/lib/R/site-library                          4.2.2      FALSE

ins <- installed.packages()|> as.data.frame() %>% as_tibble()
ins %>% 
  add_count(Package) %>% 
  filter(n>1) %>% 
  arrange(Package) %>% 
  select(Package, LibPath, Version)
#> # A tibble: 8 × 3
#>   Package LibPath                                          Version
#>   <chr>   <chr>                                            <chr>  
#> 1 bspm    /home/mstigler/R/x86_64-pc-linux-gnu-library/4.2 0.4.2.1
#> 2 bspm    /usr/local/lib/R/site-library                    0.4.2  
#> 3 dang    /home/mstigler/R/x86_64-pc-linux-gnu-library/4.2 0.0.15 
#> 4 dang    /usr/lib/R/site-library                          0.0.15 
#> 5 effects /home/mstigler/R/x86_64-pc-linux-gnu-library/4.2 4.2-3  
#> 6 effects /usr/lib/R/site-library                          4.2-2  
#> 7 tsDyn   /usr/local/lib/R/site-library                    11.0.2 
#> 8 tsDyn   /usr/lib/R/site-library                          11.0.4

Created on 2023-02-13 with reprex v2.0.2

Also, is there any chance that you could increment the GitHub package version? I was using the function together with bspm::moveto_sys(), and the following lines will fail as bspm::moveto_sys() is going to remove it:

dang::shadowedPackages()
bspm::moveto_sys() # will remove dang as it has the same version as CRAN
dang::shadowedPackages()

Thanks!

Yes I generally roll the minor version (and should) and yes I meant shadowedPackages().

I would need to see your installed.packages() three columns to see about the missing package. Also, see inside the short function and maybe for kicks flip what is commented out with what is still there so try the data.table variant. The base R one was a Sunday afternoon 'Code Golf' exercise with @vincentarelbundock. Lastly, your dplyr variant needs a mutate to add which package is the 'max' version package.

@MatthieuStigler :

I would need to see your installed.packages() three columns to see about the missing package.

I never heard back from you. Anyway, shadowedPackages() is back to data.table, and I rolled the minor version as usual. Feedback still welcome.

@eddelbuettel sorry about that! I actually re-ran the function, and the package then appeared! So wasn't an issue after all, the function seems to be working well!

To summarize the post and for future discoverability, would it be fair to say that there are at least two potential solutions to shadowed packages:

  • change the order of .libPaths()
  • install with bspm and remove all packages in .libPaths() before /usr/lib/R/site-library, using bspm::moveto_sys()

Thanks!

Sigh. You could have told me...

I am not so sure there is a generic or general solution to your "problem" we can or should "prescribe". .libPaths() has several entries, both install.packages() and library() allow you to set directories (that is likely how renv and packrat and groundhog and whatnot work) -- so all of this is an R feature. You as local sys admin should devise a policy.
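
To illustrate the point about install.packages() and library() accepting explicit directories, here is a minimal sketch. It assumes the r2u library path used earlier in this thread; version_in() is a made-up helper name, not an existing function.

```r
# Hypothetical helper: version of `pkg` as installed in one specific library,
# or NA if that library has no copy. Uses only utils::packageVersion().
version_in <- function(pkg, lib) {
    tryCatch(as.character(utils::packageVersion(pkg, lib.loc = lib)),
             error = function(e) NA_character_)
}

r2u_lib <- "/usr/lib/R/site-library"  # assumption: the r2u location from this thread
print(version_in("rlang", r2u_lib))   # NA if rlang is not installed there

# Load from that library regardless of .libPaths() order (only if present):
# library(rlang, lib.loc = r2u_lib)
# Install into a specific library instead of the first .libPaths() entry:
# install.packages("rlang", lib = "/usr/local/lib/R/site-library")
```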

shadowedPackages() allows you to identify packages that are shadowed. bspm added several tools to help with and automate migration for users or system wide. How to deploy them will likely depend on your circumstances.

good, thanks for the summary!

And sorry again for not letting you know about that. To redeem myself, I did some checks, and actually it seems there is an issue with data.table in the latest version?

devtools::install_github("eddelbuettel/dang")
#> Skipping install of 'dang' from a github remote, the SHA1 (d391ca48) has not changed since last install.
#>   Use `force = TRUE` to force installation
packageVersion("dang")
#> [1] '0.0.15.1'
library(dang)
shd <- dang::shadowedPackages() 
#> Loading required package: data.table
#> 
#> Attaching package: 'data.table'
#> The following objects are masked from 'package:dang':
#> 
#>     as.data.table, wday
#> Error in `:=`(Version, as.package_version(Version)): Check that is.data.table(DT) == TRUE. Otherwise, := and `:=`(...) are defined for use in j, once only and in particular ways. See help(":=").
head(shd)
#> Error in head(shd): object 'shd' not found

Created on 2023-02-14 with reprex v2.0.2

Please try now: as I am not forcing data.table in, it needed a .datatable.aware <- TRUE to ensure [ dispatches right.

it's working now!