lockfile contains more than just packages which appear in my project
johnForne opened this issue · comments
Kia ora
I am new to renv:: and think this sounds like it could be really useful for helping make our code more reproducible.
However, I've run into an issue that I don't know whether it is me or the code...
The situation is that I have an existing project that I've run 'renv::init()' within and I then checked the lockfile to see what packages it listed. The issue is that it seems to list every package that I've ever used in R - rather than only the packages within the project.
renv reference material suggests that
(The default) Capture only packages which appear to be used in your project, as determined by renv::dependencies(). This ensures that only the packages actually required by your project will enter the lockfile; the downside if it might be slow if your project contains a large number of files. If speed becomes an issue, you might consider using .renvignore files to limit which files renv uses for dependency discovery, or switching to explicit mode, as described next.
I tested this by calling 'renv::dependencies()' and found that it listed the 43 packages I expected for my project. In contrast, 'renv::lockfile_read()' lists 254 packages!
library(renv)
library(tidyverse)
# initialise renv in the existing project
init()
# packages renv detects as used by the project's files
d <- dependencies()
d %>%
  distinct(Package) %>%
  arrange(Package)
> d %>%
+ distinct(Package) %>%
+ arrange(Package)
Package
1 DT
2 GMS
3 Polychrome
4 RColorBrewer
5 aws.s3
6 base
7 brms
8 dggridR
9 dplyr
10 er.helpers
11 er.templates
12 forcats
13 ggplot2
14 glue
15 haven
16 janitor
17 knitr
18 leaflet
19 magrittr
20 pals
21 plotly
22 purrr
23 rcartocolor
24 readr
25 readxl
26 renv
27 rlang
28 rmarkdown
29 rsconnect
30 scales
31 sf
32 sfdep
33 shiny
34 shinyWidgets
35 shinycssloaders
36 simplevis
37 snakecase
38 stringr
39 tidyr
40 tidyverse
41 viridis
42 wesanderson
43 zip
>
l <- lockfile_read()
l$Packages %>%
  names()
> l$Packages %>%
+ names()
[1] "BH" "Brobdingnag" "DBI" "DT" "GMS"
[6] "KernSmooth" "MASS" "Matrix" "Polychrome" "QuickJSR"
[11] "R6" "RColorBrewer" "Rcpp" "RcppEigen" "RcppParallel"
[16] "StanHeaders" "abind" "anytime" "askpass" "aws.s3"
[21] "aws.signature" "backports" "base64enc" "bayesplot" "bit"
[26] "bit64" "blob" "boot" "brew" "bridgesampling"
[31] "brio" "brms" "broom" "bslib" "cachem"
[36] "callr" "cellranger" "checkmate" "class" "classInt"
[41] "cli" "clipr" "coda" "codetools" "colorspace"
[46] "colourpicker" "commonmark" "conflicted" "cpp11" "crayon"
[51] "credentials" "crosstalk" "curl" "data.table" "dbplyr"
[56] "deldir" "desc" "devtools" "dggridR" "dichromat"
[61] "diffobj" "digest" "distributional" "downlit" "dplyr"
[66] "dtplyr" "dygraphs" "e1071" "ellipsis" "er.helpers"
[71] "er.templates" "evaluate" "extraDistr" "fansi" "farver"
[76] "fastmap" "fontawesome" "forcats" "fs" "future"
[81] "gargle" "generics" "geojsonsf" "geometries" "gert"
[86] "ggplot2" "ggridges" "gh" "git2r" "gitcreds"
[91] "globals" "glue" "googledrive" "googlesheets4" "gridExtra"
[96] "gtable" "gtools" "haven" "highr" "hms"
[101] "htmltools" "htmlwidgets" "httpuv" "httr" "httr2"
[106] "ids" "igraph" "ini" "inline" "isoband"
[111] "janitor" "jquerylib" "jsonify" "jsonlite" "knitr"
[116] "labeling" "later" "lattice" "lazyeval" "leafem"
[121] "leaflet" "leaflet.providers" "leafpop" "lifecycle" "listenv"
[126] "loo" "lubridate" "lwgeom" "magrittr" "mapproj"
[131] "maps" "markdown" "matrixStats" "memoise" "mgcv"
[136] "mime" "miniUI" "modelr" "munsell" "mvtnorm"
[141] "networkD3" "nleqslv" "nlme" "numDeriv" "odbc"
[146] "openssl" "packrat" "pals" "parallelly" "pillar"
[151] "pkgbuild" "pkgconfig" "pkgdown" "pkgload" "plotly"
[156] "plyr" "png" "posterior" "praise" "prettyunits"
[161] "processx" "profvis" "progress" "promises" "proxy"
[166] "ps" "purrr" "ragg" "rapidjsonr" "rappdirs"
[171] "raster" "rcartocolor" "rcmdcheck" "readr" "readxl"
[176] "rematch" "rematch2" "remotes" "renv" "reprex"
[181] "reshape2" "rgeos" "rlang" "rmarkdown" "roxygen2"
[186] "rprojroot" "rsconnect" "rstan" "rstantools" "rstudioapi"
[191] "rversions" "rvest" "s2" "sass" "scales"
[196] "scatterplot3d" "selectr" "sessioninfo" "sf" "sfdep"
[201] "sfheaders" "shiny" "shinyWidgets" "shinycssloaders" "shinyjs"
[206] "shinystan" "shinythemes" "simplevis" "snakecase" "sourcetools"
[211] "sp" "spData" "spdep" "stars" "stringi"
[216] "stringr" "svglite" "sys" "systemfonts" "tensorA"
[221] "terra" "testthat" "textshaping" "threejs" "tibble"
[226] "tidyr" "tidyselect" "tidyverse" "timechange" "tinytex"
[231] "trend" "tzdb" "units" "urlchecker" "usethis"
[236] "utf8" "uuid" "vctrs" "viridis" "viridisLite"
[241] "vroom" "waldo" "wesanderson" "whisker" "withr"
[246] "wk" "xfun" "xml2" "xopen" "xtable"
[251] "xts" "yaml" "zip" "zoo"
Interestingly, I then tested what happened in a brand new project containing a single 'test.R' file, using the code below.
This time 'd' had the two packages (renv and tidyverse) that I expected. However, the lockfile contained 108 packages, well beyond the 31 packages in the tidyverse plus renv itself.
library(renv)
library(tidyverse)
renv::init()
library(renv)
library(tidyverse)
d <- dependencies()
d %>%
distinct(Package) %>%
arrange(Package)
l <- lockfile_read()
l$Packages %>%
names()
tidyverse_packages()
Can you please let me know how to actually "Capture only packages which appear to be used in your project"?
Thanks in advance,
John
The lockfile captures both the top-level package dependencies and those packages' recursive dependencies. Could that explain why? For example, the tidyverse package has a large number of recursive dependencies:
> tools::package_dependencies("tidyverse", recursive = TRUE)[[1]]
[1] "broom" "conflicted" "cli" "dbplyr"
[5] "dplyr" "dtplyr" "forcats" "ggplot2"
[9] "googledrive" "googlesheets4" "haven" "hms"
[13] "httr" "jsonlite" "lubridate" "magrittr"
[17] "modelr" "pillar" "purrr" "ragg"
[21] "readr" "readxl" "reprex" "rlang"
[25] "rstudioapi" "rvest" "stringr" "tibble"
[29] "tidyr" "xml2" "backports" "ellipsis"
[33] "generics" "glue" "lifecycle" "utils"
[37] "memoise" "blob" "DBI" "methods"
[41] "R6" "tidyselect" "vctrs" "withr"
[45] "data.table" "grDevices" "grid" "gtable"
[49] "isoband" "MASS" "mgcv" "scales"
[53] "stats" "gargle" "uuid" "cellranger"
[57] "curl" "ids" "rematch2" "cpp11"
[61] "pkgconfig" "mime" "openssl" "timechange"
[65] "fansi" "utf8" "systemfonts" "textshaping"
[69] "clipr" "crayon" "vroom" "tzdb"
[73] "progress" "callr" "fs" "knitr"
[77] "rmarkdown" "selectr" "stringi" "processx"
[81] "rematch" "rappdirs" "evaluate" "highr"
[85] "tools" "xfun" "yaml" "graphics"
[89] "cachem" "nlme" "Matrix" "splines"
[93] "askpass" "prettyunits" "bslib" "fontawesome"
[97] "htmltools" "jquerylib" "tinytex" "farver"
[101] "labeling" "munsell" "RColorBrewer" "viridisLite"
[105] "bit64" "sys" "bit" "base64enc"
[109] "sass" "fastmap" "digest" "lattice"
[113] "colorspace" "ps"
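To make the direct-versus-recursive distinction concrete, here is a minimal base R sketch that walks a toy dependency graph. The package names and the `recursive_deps()` helper are hypothetical and exist only for illustration; this is not how renv computes the lockfile internally, only the general idea of a transitive closure.

```r
# Toy dependency graph: each entry lists a package's direct dependencies.
# All names here are invented for illustration, not real CRAN metadata.
deps <- list(
  app_pkg   = c("plot_pkg", "data_pkg"),
  plot_pkg  = c("color_pkg", "util_pkg"),
  data_pkg  = c("util_pkg"),
  color_pkg = character(),
  util_pkg  = character()
)

# Collect every package reachable from `roots` by following dependencies.
recursive_deps <- function(roots, graph) {
  seen <- character()
  queue <- roots
  while (length(queue) > 0) {
    pkg <- queue[[1]]
    queue <- queue[-1]
    if (pkg %in% seen) next
    seen <- c(seen, pkg)
    # enqueue dependencies that have not been visited yet
    queue <- c(queue, setdiff(graph[[pkg]], seen))
  }
  sort(seen)
}

direct <- "app_pkg"                     # what the project code refers to
locked <- recursive_deps(direct, deps)  # what a complete lockfile would need
print(locked)
```

One directly used package expands to five entries here, which is the same effect that turns 43 detected packages into 254 lockfile entries.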
Thanks Kevin - much appreciated.
That's good to know. I wonder if it is possible/makes sense to have an optional argument to limit the lockfile to (direct) dependencies only? Like what 'dependencies()' returns?
But that would give you an incomplete lockfile. If you tried to call renv::restore(), we wouldn't know what versions of those packages' dependencies you need, and so you'd risk issues due to a change in the R library state.
I wonder if it is possible/makes sense to have an optional argument to limit the lockfile to (direct) dependencies only? Like what 'dependencies()' returns?
I think this would be useful for my purposes. I find the inclusion of recursive dependencies distracting, and it sometimes leads to dependency conflicts.
But that would give you an incomplete lockfile. If you tried to call renv::restore(), we wouldn't know what versions of those packages' dependencies you need, and so you'd risk issues due to a change in the R library state.
Is there a way to just rely on the underlying package dependency specifications to identify these versions? Coming from Python, you can just add the main packages you need installed to the requirements.txt (or environment.yml if using conda), and pip will automatically install the dependencies of those packages as needed.
For example, if I have a project that requires pandas 2.2.1, pip/conda will also install numpy 1.26.4 as a dependency (based on pandas specifying numpy<2 in its environment.yml) without needing to pin that version of numpy to the dependency specs.
I'd find it easier to just manage these direct dependencies, but having less familiarity with R than Python's ecosystem, I might be missing the mark here.
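For readers who mainly want a less cluttered view of the lockfile, rather than a smaller lockfile, one possible approach is to split the locked package names into direct and transitive groups after the fact. The sketch below uses a hypothetical helper, `split_locked()`, and a small illustrative input rather than a real lockfile. The commented lines at the end are an assumption about how the inputs might be gathered with the two renv functions already used in this thread.

```r
# Split the package names in a lockfile into those the project uses
# directly and those present only as dependencies of other packages.
# `locked` and `direct` are plain character vectors of package names.
split_locked <- function(locked, direct) {
  list(
    direct     = sort(intersect(locked, direct)),
    transitive = sort(setdiff(locked, direct))
  )
}

# Small illustrative input, not taken from a real lockfile
locked <- c("dplyr", "rlang", "vctrs", "sf", "units")
direct <- c("dplyr", "sf")
view <- split_locked(locked, direct)
print(view)

# In a real renv project the inputs could plausibly come from:
#   direct <- unique(renv::dependencies()$Package)
#   locked <- names(renv::lockfile_read()$Packages)
```

This leaves the lockfile itself complete, so renv::restore() still has a version for every package, while the direct list stays short enough to review by hand.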