colData change for GrunPancreasData in BioC 3.19; worth making `legacy = TRUE` the default in BioC 3.19
PeteHaitch opened this issue
This specific case is causing grief in OSCA in BioC 3.19.
Is it worth making `legacy = TRUE` (for all functions) the default for BioC 3.19, so that we get a full release cycle to sort through discrepancies between the legacy and new versions?
Personally, I've got very little time between now and the release for debugging recent issues with OSCA and my quick look suggests some of the current failures are stemming from these changes in scRNAseq.
My impression is that there have been a few cases of differences between the legacy and non-legacy versions of the datasets; I think @hpages or @vjcitn posted about some (perhaps on Slack; see also #45).
BioC 3.18
suppressPackageStartupMessages(library(scRNAseq))
sce <- GrunPancreasData()
#> see ?scRNAseq and browseVignettes('scRNAseq') for documentation
#> loading from cache
#> see ?scRNAseq and browseVignettes('scRNAseq') for documentation
#> loading from cache
colData(sce)
#> DataFrame with 1728 rows and 2 columns
#> donor sample
#> <character> <character>
#> D2ex_1 D2 exocrine fraction, l..
#> D2ex_2 D2 exocrine fraction, l..
#> D2ex_3 D2 exocrine fraction, l..
#> D2ex_4 D2 exocrine fraction, l..
#> D2ex_5 D2 exocrine fraction, l..
#> ... ... ...
#> D17TGFB_92 D17 TGFBR3+ sorted cells
#> D17TGFB_93 D17 TGFBR3+ sorted cells
#> D17TGFB_94 D17 TGFBR3+ sorted cells
#> D17TGFB_95 D17 TGFBR3+ sorted cells
#> D17TGFB_96 D17 TGFBR3+ sorted cells
Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.3.3 (2024-02-29)
#> os Ubuntu 22.04.4 LTS
#> system x86_64, linux-gnu
#> ui X11
#> language en_AU:en
#> collate en_AU.UTF-8
#> ctype en_AU.UTF-8
#> tz Australia/Melbourne
#> date 2024-04-11
#> pandoc 2.9.2.1 @ /usr/bin/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date (UTC) lib source
#> abind 1.4-5 2016-07-21 [3] CRAN (R 4.2.0)
#> AnnotationDbi 1.64.1 2023-11-03 [1] Bioconductor
#> AnnotationFilter 1.26.0 2023-10-24 [1] Bioconductor
#> AnnotationHub 3.10.1 2024-04-05 [1] Bioconductor 3.18 (R 4.3.3)
#> Biobase * 2.62.0 2023-10-24 [1] Bioconductor
#> BiocFileCache 2.10.2 2024-03-27 [1] Bioconductor 3.18 (R 4.3.3)
#> BiocGenerics * 0.48.1 2023-11-01 [1] Bioconductor
#> BiocIO 1.12.0 2023-10-24 [1] Bioconductor
#> BiocManager 1.30.22 2023-08-08 [1] CRAN (R 4.3.1)
#> BiocParallel 1.36.0 2023-10-24 [1] Bioconductor
#> BiocVersion 3.18.1 2023-11-15 [1] Bioconductor
#> biomaRt 2.58.2 2024-01-30 [1] Bioconductor 3.18 (R 4.3.2)
#> Biostrings 2.70.3 2024-03-13 [1] Bioconductor 3.18 (R 4.3.3)
#> bit 4.0.5 2022-11-15 [3] RSPM (R 4.2.0)
#> bit64 4.0.5 2020-08-30 [3] RSPM (R 4.2.0)
#> bitops 1.0-7 2021-04-24 [3] RSPM (R 4.2.0)
#> blob 1.2.4 2023-03-17 [3] RSPM (R 4.2.0)
#> cachem 1.0.8 2023-05-01 [3] RSPM (R 4.2.0)
#> cli 3.6.2 2023-12-11 [3] RSPM (R 4.3.0)
#> codetools 0.2-20 2024-03-31 [3] RSPM (R 4.3.0)
#> crayon 1.5.2 2022-09-29 [3] RSPM (R 4.2.0)
#> curl 5.2.1 2024-03-01 [3] RSPM (R 4.3.0)
#> DBI 1.2.2 2024-02-16 [3] RSPM (R 4.3.0)
#> dbplyr 2.5.0 2024-03-19 [3] RSPM (R 4.3.0)
#> DelayedArray 0.28.0 2023-10-24 [1] Bioconductor
#> digest 0.6.35 2024-03-11 [3] RSPM (R 4.3.0)
#> dplyr 1.1.4 2023-11-17 [3] RSPM (R 4.3.0)
#> ensembldb 2.26.0 2023-10-24 [1] Bioconductor
#> evaluate 0.23 2023-11-01 [3] RSPM (R 4.3.0)
#> ExperimentHub 2.10.0 2023-10-24 [1] Bioconductor
#> fansi 1.0.6 2023-12-08 [3] RSPM (R 4.3.0)
#> fastmap 1.1.1 2023-02-24 [3] RSPM (R 4.2.0)
#> filelock 1.0.3 2023-12-11 [3] RSPM (R 4.3.0)
#> fs 1.6.3 2023-07-20 [3] RSPM (R 4.2.0)
#> generics 0.1.3 2022-07-05 [3] RSPM (R 4.2.0)
#> GenomeInfoDb * 1.38.8 2024-03-15 [1] Bioconductor 3.18 (R 4.3.3)
#> GenomeInfoDbData 1.2.11 2023-10-30 [1] Bioconductor
#> GenomicAlignments 1.38.2 2024-01-16 [1] Bioconductor 3.18 (R 4.3.2)
#> GenomicFeatures 1.54.4 2024-03-13 [1] Bioconductor 3.18 (R 4.3.3)
#> GenomicRanges * 1.54.1 2023-10-29 [1] Bioconductor
#> glue 1.7.0 2024-01-09 [3] RSPM (R 4.3.0)
#> hms 1.1.3 2023-03-21 [3] RSPM (R 4.2.0)
#> htmltools 0.5.8.1 2024-04-04 [1] CRAN (R 4.3.3)
#> httpuv 1.6.15 2024-03-26 [3] RSPM (R 4.3.0)
#> httr 1.4.7 2023-08-15 [3] RSPM (R 4.2.0)
#> interactiveDisplayBase 1.40.0 2023-10-24 [1] Bioconductor
#> IRanges * 2.36.0 2023-10-24 [1] Bioconductor
#> KEGGREST 1.42.0 2023-10-24 [1] Bioconductor
#> knitr 1.46 2024-04-06 [1] CRAN (R 4.3.3)
#> later 1.3.2 2023-12-06 [3] RSPM (R 4.3.0)
#> lattice 0.22-6 2024-03-20 [3] RSPM (R 4.3.0)
#> lazyeval 0.2.2 2019-03-15 [3] RSPM (R 4.2.0)
#> lifecycle 1.0.4 2023-11-07 [3] RSPM (R 4.3.0)
#> magrittr 2.0.3 2022-03-30 [3] RSPM (R 4.2.0)
#> Matrix 1.6-5 2024-01-11 [3] RSPM (R 4.3.0)
#> MatrixGenerics * 1.14.0 2023-10-24 [1] Bioconductor
#> matrixStats * 1.2.0 2023-12-11 [3] RSPM (R 4.3.0)
#> memoise 2.0.1 2021-11-26 [3] RSPM (R 4.2.0)
#> mime 0.12 2021-09-28 [3] RSPM (R 4.2.0)
#> pillar 1.9.0 2023-03-22 [3] RSPM (R 4.2.0)
#> pkgconfig 2.0.3 2019-09-22 [3] CRAN (R 4.0.1)
#> png 0.1-8 2022-11-29 [3] RSPM (R 4.2.0)
#> prettyunits 1.2.0 2023-09-24 [3] RSPM (R 4.3.0)
#> progress 1.2.3 2023-12-06 [1] CRAN (R 4.3.2)
#> promises 1.3.0 2024-04-05 [1] CRAN (R 4.3.3)
#> ProtGenerics 1.34.0 2023-10-24 [1] Bioconductor
#> purrr 1.0.2 2023-08-10 [1] CRAN (R 4.3.1)
#> R.cache 0.16.0 2022-07-21 [3] RSPM (R 4.2.0)
#> R.methodsS3 1.8.2 2022-06-13 [3] RSPM (R 4.2.0)
#> R.oo 1.26.0 2024-01-24 [3] RSPM (R 4.3.0)
#> R.utils 2.12.3 2023-11-18 [3] RSPM (R 4.3.0)
#> R6 2.5.1 2021-08-19 [3] RSPM (R 4.2.0)
#> rappdirs 0.3.3 2021-01-31 [3] RSPM (R 4.2.0)
#> Rcpp 1.0.12 2024-01-09 [3] RSPM (R 4.3.0)
#> RCurl 1.98-1.14 2024-01-09 [3] RSPM (R 4.3.0)
#> reprex 2.1.0 2024-01-11 [3] RSPM (R 4.3.0)
#> restfulr 0.0.15 2022-06-16 [3] RSPM (R 4.2.0)
#> rjson 0.2.21 2022-01-09 [3] RSPM (R 4.2.0)
#> rlang 1.1.3 2024-01-10 [3] RSPM (R 4.3.0)
#> rmarkdown 2.26 2024-03-05 [3] RSPM (R 4.3.0)
#> Rsamtools 2.18.0 2023-10-24 [1] Bioconductor
#> RSQLite 2.3.6 2024-03-31 [1] CRAN (R 4.3.3)
#> rtracklayer 1.62.0 2023-10-24 [1] Bioconductor
#> S4Arrays 1.2.1 2024-03-04 [1] Bioconductor 3.18 (R 4.3.3)
#> S4Vectors * 0.40.2 2023-11-23 [1] Bioconductor 3.18 (R 4.3.2)
#> scRNAseq * 2.16.0 2023-10-26 [1] Bioconductor
#> sessioninfo 1.2.2 2021-12-06 [3] RSPM (R 4.2.0)
#> shiny 1.8.1.1 2024-04-02 [3] RSPM (R 4.3.0)
#> SingleCellExperiment * 1.24.0 2023-10-24 [1] Bioconductor
#> SparseArray 1.2.4 2024-02-11 [1] Bioconductor 3.18 (R 4.3.3)
#> stringi 1.8.3 2023-12-11 [3] RSPM (R 4.3.0)
#> stringr 1.5.1 2023-11-14 [3] RSPM (R 4.3.0)
#> styler 1.10.3 2024-04-07 [1] CRAN (R 4.3.3)
#> SummarizedExperiment * 1.32.0 2023-10-24 [1] Bioconductor
#> tibble 3.2.1 2023-03-20 [3] RSPM (R 4.3.0)
#> tidyselect 1.2.1 2024-03-11 [3] RSPM (R 4.3.0)
#> utf8 1.2.4 2023-10-22 [3] RSPM (R 4.3.0)
#> vctrs 0.6.5 2023-12-01 [3] RSPM (R 4.3.0)
#> withr 3.0.0 2024-01-16 [1] CRAN (R 4.3.2)
#> xfun 0.43 2024-03-25 [1] CRAN (R 4.3.3)
#> XML 3.99-0.16.1 2024-01-22 [3] RSPM (R 4.3.0)
#> xml2 1.3.6 2023-12-04 [3] RSPM (R 4.3.0)
#> xtable 1.8-4 2019-04-21 [1] CRAN (R 4.3.0)
#> XVector 0.42.0 2023-10-24 [1] Bioconductor
#> yaml 2.3.8 2023-12-11 [3] RSPM (R 4.3.0)
#> zlibbioc 1.48.2 2024-03-13 [1] Bioconductor 3.18 (R 4.3.3)
#>
#> [1] /home/peter/R/x86_64-pc-linux-gnu-library/4.3
#> [2] /usr/local/lib/R/site-library
#> [3] /usr/lib/R/site-library
#> [4] /usr/lib/R/library
#>
#> ──────────────────────────────────────────────────────────────────────────────
BioC 3.19
suppressPackageStartupMessages(library(scRNAseq))
sce <- GrunPancreasData()
# 'treatment' rather than 'sample'
colData(sce)
#> DataFrame with 1728 rows and 2 columns
#> donor treatment
#> <character> <character>
#> D2ex_1 D2 exocrine fraction, l..
#> D2ex_2 D2 exocrine fraction, l..
#> D2ex_3 D2 exocrine fraction, l..
#> D2ex_4 D2 exocrine fraction, l..
#> D2ex_5 D2 exocrine fraction, l..
#> ... ... ...
#> D17TGFB_92 D17 TGFBR3+ sorted cells
#> D17TGFB_93 D17 TGFBR3+ sorted cells
#> D17TGFB_94 D17 TGFBR3+ sorted cells
#> D17TGFB_95 D17 TGFBR3+ sorted cells
#> D17TGFB_96 D17 TGFBR3+ sorted cells
sce_legacy <- GrunPancreasData(legacy = TRUE)
#> see ?scRNAseq and browseVignettes('scRNAseq') for documentation
#> loading from cache
colData(sce_legacy)
#> DataFrame with 1728 rows and 2 columns
#> donor sample
#> <character> <character>
#> D2ex_1 D2 exocrine fraction, l..
#> D2ex_2 D2 exocrine fraction, l..
#> D2ex_3 D2 exocrine fraction, l..
#> D2ex_4 D2 exocrine fraction, l..
#> D2ex_5 D2 exocrine fraction, l..
#> ... ... ...
#> D17TGFB_92 D17 TGFBR3+ sorted cells
#> D17TGFB_93 D17 TGFBR3+ sorted cells
#> D17TGFB_94 D17 TGFBR3+ sorted cells
#> D17TGFB_95 D17 TGFBR3+ sorted cells
#> D17TGFB_96 D17 TGFBR3+ sorted cells
Session info
sessionInfo()
#> R version 4.4.0 alpha (2024-04-02 r86304)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 22.04.4 LTS
#>
#> Matrix products: default
#> BLAS: /home/peter/Downloads/R-alpha/lib/libRblas.so
#> LAPACK: /home/peter/Downloads/R-alpha/lib/libRlapack.so; LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_AU.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_AU.UTF-8 LC_COLLATE=en_AU.UTF-8
#> [5] LC_MONETARY=en_AU.UTF-8 LC_MESSAGES=en_AU.UTF-8
#> [7] LC_PAPER=en_AU.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: Australia/Melbourne
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats4 stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] scRNAseq_2.17.7 SingleCellExperiment_1.25.1
#> [3] SummarizedExperiment_1.33.3 Biobase_2.63.1
#> [5] GenomicRanges_1.55.4 GenomeInfoDb_1.39.13
#> [7] IRanges_2.37.1 S4Vectors_0.41.6
#> [9] BiocGenerics_0.49.1 MatrixGenerics_1.15.0
#> [11] matrixStats_1.2.0
#>
#> loaded via a namespace (and not attached):
#> [1] DBI_1.2.2 bitops_1.0-7 httr2_1.0.1
#> [4] rlang_1.1.3 magrittr_2.0.3 gypsum_0.99.15
#> [7] compiler_4.4.0 RSQLite_2.3.6 GenomicFeatures_1.55.4
#> [10] png_0.1-8 vctrs_0.6.5 ProtGenerics_1.35.4
#> [13] pkgconfig_2.0.3 crayon_1.5.2 fastmap_1.1.1
#> [16] dbplyr_2.5.0 XVector_0.43.1 utf8_1.2.4
#> [19] Rsamtools_2.19.4 rmarkdown_2.26 UCSC.utils_0.99.5
#> [22] purrr_1.0.2 bit_4.0.5 xfun_0.43
#> [25] reprex_2.1.0 aws.s3_0.3.21 zlibbioc_1.49.3
#> [28] cachem_1.0.8 jsonlite_1.8.8 blob_1.2.4
#> [31] rhdf5filters_1.15.4 DelayedArray_0.29.9 Rhdf5lib_1.25.3
#> [34] BiocParallel_1.37.1 parallel_4.4.0 R6_2.5.1
#> [37] rtracklayer_1.63.2 Rcpp_1.0.12 knitr_1.46
#> [40] base64enc_0.1-3 Matrix_1.7-0 tidyselect_1.2.1
#> [43] abind_1.4-5 yaml_2.3.8 codetools_0.2-20
#> [46] curl_5.2.1 lattice_0.22-6 alabaster.sce_1.3.3
#> [49] tibble_3.2.1 withr_3.0.0 KEGGREST_1.43.0
#> [52] evaluate_0.23 BiocFileCache_2.11.2 alabaster.schemas_1.3.1
#> [55] xml2_1.3.6 ExperimentHub_2.11.1 Biostrings_2.71.5
#> [58] pillar_1.9.0 BiocManager_1.30.22 filelock_1.0.3
#> [61] generics_0.1.3 RCurl_1.98-1.14 BiocVersion_3.19.1
#> [64] ensembldb_2.27.1 alabaster.base_1.3.23 alabaster.ranges_1.3.3
#> [67] glue_1.7.0 alabaster.matrix_1.3.13 lazyeval_0.2.2
#> [70] tools_4.4.0 AnnotationHub_3.11.4 BiocIO_1.13.0
#> [73] GenomicAlignments_1.39.5 fs_1.6.3 XML_3.99-0.16.1
#> [76] rhdf5_2.47.6 grid_4.4.0 AnnotationDbi_1.65.2
#> [79] GenomeInfoDbData_1.2.12 HDF5Array_1.31.6 restfulr_0.0.15
#> [82] cli_3.6.2 rappdirs_0.3.3 fansi_1.0.6
#> [85] S4Arrays_1.3.7 dplyr_1.1.4 AnnotationFilter_1.27.0
#> [88] alabaster.se_1.3.4 digest_0.6.35 SparseArray_1.3.5
#> [91] rjson_0.2.21 memoise_2.0.1 htmltools_0.5.8.1
#> [94] lifecycle_1.0.4 httr_1.4.7 mime_0.12
#> [97] aws.signature_0.6.0 bit64_4.0.5
Oops. The `GrunPancreasData` issue was an error on my part; this should be fixed in the next scRNAseq version.
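Until that fix lands, downstream code that expects the `sample` column could normalize the name itself. A minimal sketch in base R, using a plain data.frame as a stand-in for `colData(sce)`; the helper name `normalize_sample_column` is hypothetical and not part of scRNAseq:

```r
# Hypothetical helper (not part of scRNAseq): rename 'treatment' back to
# 'sample' when only the new column name is present.
normalize_sample_column <- function(df) {
    if (!"sample" %in% colnames(df) && "treatment" %in% colnames(df)) {
        colnames(df)[colnames(df) == "treatment"] <- "sample"
    }
    df
}

# Toy stand-in for colData(sce) as returned in BioC 3.19.
new_style <- data.frame(
    donor = c("D17", "D17"),
    treatment = c("TGFBR3+ sorted cells", "TGFBR3+ sorted cells"),
    row.names = c("D17TGFB_92", "D17TGFB_93")
)

fixed <- normalize_sample_column(new_style)
colnames(fixed)
```

The helper leaves legacy-style input untouched, so it can be applied regardless of which version of the dataset was loaded. It has only been checked against a data.frame here; applying it to the actual `colData` is an assumption.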
For the current BBS failure: this is less my fault. Where possible, the new datasets were generated fresh from their primary sources (i.e., GEO, ArrayExpress). In this case, ArrayExpress decided to rename the individuals (in particular, `AZ` is now called `H1`), causing this line to not filter out the offending individual. Change the filter condition from `AZ` to `H1` and everything works again. (Okay, not completely back-compatible, but I'm just following ArrayExpress here.)
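For code that has to run against both the old and the new labels, the filter could match either name. A sketch with a toy data frame; the column name `Donor` and the labels other than `AZ` and `H1` are made up for illustration, and it assumes neither label refers to a different individual in the other version of the data:

```r
# Toy stand-in for the cell metadata; 'Donor' and the labels other than
# AZ/H1 are made up for illustration.
cells <- data.frame(
    Donor = c("H1", "X2", "X3", "H1"),
    row.names = paste0("cell", 1:4)
)

# Labels used for the offending individual before and after the rename.
offending <- c("AZ", "H1")

# Keep every cell whose donor is not the offending individual.
keep <- !cells$Donor %in% offending
filtered <- cells[keep, , drop = FALSE]
rownames(filtered)
```

Matching on a set of labels rather than a single string means the same filter works whether the data came from the legacy or the freshly generated version.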
I just ran through all the workflows in OSCA.workflows that use scRNAseq, and the only other failure is in `nestorowa-hsc.Rmd`, where I moved the FACS data from the `colData` to its own `altExp`. Sure, it's a breaking change for anyone who was using the FACS data, but it's a more sensible long-term place for it to live. The corresponding patch here should be:
Y <- assay(altExp(sce.nest, "FACS"))
keep <- colSums(is.na(Y))==0 # Removing NA intensities.
se.averaged <- sumCountsAcrossCells(Y[,keep],
colLabels(sce.nest)[keep], average=TRUE)
Seems like we're almost there, at least for OSCA.workflows. I suppose we could just slap `legacy=TRUE` on the scRNAseq calls for any workflows that still don't work. A bit unsatisfying but oh well.
Closing as now addressed in OSCA.workflows.