LTLA / scRNAseq

Clone of the Bioconductor repository for the scRNAseq package.

Home Page: http://bioconductor.org/packages/devel/data/experiment/html/scRNAseq.html

colData change for GrunPancreasData in BioC 3.19; worth making `legacy = TRUE` the default in BioC 3.19

PeteHaitch opened this issue

This specific case is causing grief in OSCA in BioC 3.19.

Is it worth making `legacy = TRUE` the default (for all functions) in BioC 3.19, so that we get a full release cycle to sort through discrepancies between the legacy and new versions?
Personally, I have very little time between now and the release to debug recent issues with OSCA, and a quick look suggests that some of the current failures stem from these changes in scRNAseq.
My impression is that there have been a few cases of differences between the legacy and non-legacy versions of the datasets. I think @hpages or @vjcitn posted about some, perhaps on Slack (also #45).
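
One way to triage such discrepancies is to compare the column metadata of the two versions directly. The sketch below uses a hypothetical helper, `diff_columns()`, which is not part of scRNAseq. It only relies on `colnames()`, so it should work on `colData()` output as well as on the plain data frames used as stand-ins here.

```r
# Hypothetical helper, not part of scRNAseq: report column names that
# appear in only one of two tables (e.g. legacy vs. new colData).
diff_columns <- function(legacy, current) {
    list(
        only_legacy = setdiff(colnames(legacy), colnames(current)),
        only_current = setdiff(colnames(current), colnames(legacy))
    )
}

# Toy stand-ins for colData(GrunPancreasData(legacy = TRUE)) and
# colData(GrunPancreasData()); the real objects are DataFrames.
legacy_cd <- data.frame(donor = "D2", sample = "exocrine")
current_cd <- data.frame(donor = "D2", treatment = "exocrine")

res <- diff_columns(legacy_cd, current_cd)
res
```

Running the same comparison over every dataset function would give a quick list of renamed or dropped columns to review before the release.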

BioC 3.18

suppressPackageStartupMessages(library(scRNAseq))
sce <- GrunPancreasData()
#> see ?scRNAseq and browseVignettes('scRNAseq') for documentation
#> loading from cache
#> see ?scRNAseq and browseVignettes('scRNAseq') for documentation
#> loading from cache
colData(sce)
#> DataFrame with 1728 rows and 2 columns
#>                  donor                 sample
#>            <character>            <character>
#> D2ex_1              D2 exocrine fraction, l..
#> D2ex_2              D2 exocrine fraction, l..
#> D2ex_3              D2 exocrine fraction, l..
#> D2ex_4              D2 exocrine fraction, l..
#> D2ex_5              D2 exocrine fraction, l..
#> ...                ...                    ...
#> D17TGFB_92         D17   TGFBR3+ sorted cells
#> D17TGFB_93         D17   TGFBR3+ sorted cells
#> D17TGFB_94         D17   TGFBR3+ sorted cells
#> D17TGFB_95         D17   TGFBR3+ sorted cells
#> D17TGFB_96         D17   TGFBR3+ sorted cells
Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.3.3 (2024-02-29)
#>  os       Ubuntu 22.04.4 LTS
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language en_AU:en
#>  collate  en_AU.UTF-8
#>  ctype    en_AU.UTF-8
#>  tz       Australia/Melbourne
#>  date     2024-04-11
#>  pandoc   2.9.2.1 @ /usr/bin/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package                * version     date (UTC) lib source
#>  abind                    1.4-5       2016-07-21 [3] CRAN (R 4.2.0)
#>  AnnotationDbi            1.64.1      2023-11-03 [1] Bioconductor
#>  AnnotationFilter         1.26.0      2023-10-24 [1] Bioconductor
#>  AnnotationHub            3.10.1      2024-04-05 [1] Bioconductor 3.18 (R 4.3.3)
#>  Biobase                * 2.62.0      2023-10-24 [1] Bioconductor
#>  BiocFileCache            2.10.2      2024-03-27 [1] Bioconductor 3.18 (R 4.3.3)
#>  BiocGenerics           * 0.48.1      2023-11-01 [1] Bioconductor
#>  BiocIO                   1.12.0      2023-10-24 [1] Bioconductor
#>  BiocManager              1.30.22     2023-08-08 [1] CRAN (R 4.3.1)
#>  BiocParallel             1.36.0      2023-10-24 [1] Bioconductor
#>  BiocVersion              3.18.1      2023-11-15 [1] Bioconductor
#>  biomaRt                  2.58.2      2024-01-30 [1] Bioconductor 3.18 (R 4.3.2)
#>  Biostrings               2.70.3      2024-03-13 [1] Bioconductor 3.18 (R 4.3.3)
#>  bit                      4.0.5       2022-11-15 [3] RSPM (R 4.2.0)
#>  bit64                    4.0.5       2020-08-30 [3] RSPM (R 4.2.0)
#>  bitops                   1.0-7       2021-04-24 [3] RSPM (R 4.2.0)
#>  blob                     1.2.4       2023-03-17 [3] RSPM (R 4.2.0)
#>  cachem                   1.0.8       2023-05-01 [3] RSPM (R 4.2.0)
#>  cli                      3.6.2       2023-12-11 [3] RSPM (R 4.3.0)
#>  codetools                0.2-20      2024-03-31 [3] RSPM (R 4.3.0)
#>  crayon                   1.5.2       2022-09-29 [3] RSPM (R 4.2.0)
#>  curl                     5.2.1       2024-03-01 [3] RSPM (R 4.3.0)
#>  DBI                      1.2.2       2024-02-16 [3] RSPM (R 4.3.0)
#>  dbplyr                   2.5.0       2024-03-19 [3] RSPM (R 4.3.0)
#>  DelayedArray             0.28.0      2023-10-24 [1] Bioconductor
#>  digest                   0.6.35      2024-03-11 [3] RSPM (R 4.3.0)
#>  dplyr                    1.1.4       2023-11-17 [3] RSPM (R 4.3.0)
#>  ensembldb                2.26.0      2023-10-24 [1] Bioconductor
#>  evaluate                 0.23        2023-11-01 [3] RSPM (R 4.3.0)
#>  ExperimentHub            2.10.0      2023-10-24 [1] Bioconductor
#>  fansi                    1.0.6       2023-12-08 [3] RSPM (R 4.3.0)
#>  fastmap                  1.1.1       2023-02-24 [3] RSPM (R 4.2.0)
#>  filelock                 1.0.3       2023-12-11 [3] RSPM (R 4.3.0)
#>  fs                       1.6.3       2023-07-20 [3] RSPM (R 4.2.0)
#>  generics                 0.1.3       2022-07-05 [3] RSPM (R 4.2.0)
#>  GenomeInfoDb           * 1.38.8      2024-03-15 [1] Bioconductor 3.18 (R 4.3.3)
#>  GenomeInfoDbData         1.2.11      2023-10-30 [1] Bioconductor
#>  GenomicAlignments        1.38.2      2024-01-16 [1] Bioconductor 3.18 (R 4.3.2)
#>  GenomicFeatures          1.54.4      2024-03-13 [1] Bioconductor 3.18 (R 4.3.3)
#>  GenomicRanges          * 1.54.1      2023-10-29 [1] Bioconductor
#>  glue                     1.7.0       2024-01-09 [3] RSPM (R 4.3.0)
#>  hms                      1.1.3       2023-03-21 [3] RSPM (R 4.2.0)
#>  htmltools                0.5.8.1     2024-04-04 [1] CRAN (R 4.3.3)
#>  httpuv                   1.6.15      2024-03-26 [3] RSPM (R 4.3.0)
#>  httr                     1.4.7       2023-08-15 [3] RSPM (R 4.2.0)
#>  interactiveDisplayBase   1.40.0      2023-10-24 [1] Bioconductor
#>  IRanges                * 2.36.0      2023-10-24 [1] Bioconductor
#>  KEGGREST                 1.42.0      2023-10-24 [1] Bioconductor
#>  knitr                    1.46        2024-04-06 [1] CRAN (R 4.3.3)
#>  later                    1.3.2       2023-12-06 [3] RSPM (R 4.3.0)
#>  lattice                  0.22-6      2024-03-20 [3] RSPM (R 4.3.0)
#>  lazyeval                 0.2.2       2019-03-15 [3] RSPM (R 4.2.0)
#>  lifecycle                1.0.4       2023-11-07 [3] RSPM (R 4.3.0)
#>  magrittr                 2.0.3       2022-03-30 [3] RSPM (R 4.2.0)
#>  Matrix                   1.6-5       2024-01-11 [3] RSPM (R 4.3.0)
#>  MatrixGenerics         * 1.14.0      2023-10-24 [1] Bioconductor
#>  matrixStats            * 1.2.0       2023-12-11 [3] RSPM (R 4.3.0)
#>  memoise                  2.0.1       2021-11-26 [3] RSPM (R 4.2.0)
#>  mime                     0.12        2021-09-28 [3] RSPM (R 4.2.0)
#>  pillar                   1.9.0       2023-03-22 [3] RSPM (R 4.2.0)
#>  pkgconfig                2.0.3       2019-09-22 [3] CRAN (R 4.0.1)
#>  png                      0.1-8       2022-11-29 [3] RSPM (R 4.2.0)
#>  prettyunits              1.2.0       2023-09-24 [3] RSPM (R 4.3.0)
#>  progress                 1.2.3       2023-12-06 [1] CRAN (R 4.3.2)
#>  promises                 1.3.0       2024-04-05 [1] CRAN (R 4.3.3)
#>  ProtGenerics             1.34.0      2023-10-24 [1] Bioconductor
#>  purrr                    1.0.2       2023-08-10 [1] CRAN (R 4.3.1)
#>  R.cache                  0.16.0      2022-07-21 [3] RSPM (R 4.2.0)
#>  R.methodsS3              1.8.2       2022-06-13 [3] RSPM (R 4.2.0)
#>  R.oo                     1.26.0      2024-01-24 [3] RSPM (R 4.3.0)
#>  R.utils                  2.12.3      2023-11-18 [3] RSPM (R 4.3.0)
#>  R6                       2.5.1       2021-08-19 [3] RSPM (R 4.2.0)
#>  rappdirs                 0.3.3       2021-01-31 [3] RSPM (R 4.2.0)
#>  Rcpp                     1.0.12      2024-01-09 [3] RSPM (R 4.3.0)
#>  RCurl                    1.98-1.14   2024-01-09 [3] RSPM (R 4.3.0)
#>  reprex                   2.1.0       2024-01-11 [3] RSPM (R 4.3.0)
#>  restfulr                 0.0.15      2022-06-16 [3] RSPM (R 4.2.0)
#>  rjson                    0.2.21      2022-01-09 [3] RSPM (R 4.2.0)
#>  rlang                    1.1.3       2024-01-10 [3] RSPM (R 4.3.0)
#>  rmarkdown                2.26        2024-03-05 [3] RSPM (R 4.3.0)
#>  Rsamtools                2.18.0      2023-10-24 [1] Bioconductor
#>  RSQLite                  2.3.6       2024-03-31 [1] CRAN (R 4.3.3)
#>  rtracklayer              1.62.0      2023-10-24 [1] Bioconductor
#>  S4Arrays                 1.2.1       2024-03-04 [1] Bioconductor 3.18 (R 4.3.3)
#>  S4Vectors              * 0.40.2      2023-11-23 [1] Bioconductor 3.18 (R 4.3.2)
#>  scRNAseq               * 2.16.0      2023-10-26 [1] Bioconductor
#>  sessioninfo              1.2.2       2021-12-06 [3] RSPM (R 4.2.0)
#>  shiny                    1.8.1.1     2024-04-02 [3] RSPM (R 4.3.0)
#>  SingleCellExperiment   * 1.24.0      2023-10-24 [1] Bioconductor
#>  SparseArray              1.2.4       2024-02-11 [1] Bioconductor 3.18 (R 4.3.3)
#>  stringi                  1.8.3       2023-12-11 [3] RSPM (R 4.3.0)
#>  stringr                  1.5.1       2023-11-14 [3] RSPM (R 4.3.0)
#>  styler                   1.10.3      2024-04-07 [1] CRAN (R 4.3.3)
#>  SummarizedExperiment   * 1.32.0      2023-10-24 [1] Bioconductor
#>  tibble                   3.2.1       2023-03-20 [3] RSPM (R 4.3.0)
#>  tidyselect               1.2.1       2024-03-11 [3] RSPM (R 4.3.0)
#>  utf8                     1.2.4       2023-10-22 [3] RSPM (R 4.3.0)
#>  vctrs                    0.6.5       2023-12-01 [3] RSPM (R 4.3.0)
#>  withr                    3.0.0       2024-01-16 [1] CRAN (R 4.3.2)
#>  xfun                     0.43        2024-03-25 [1] CRAN (R 4.3.3)
#>  XML                      3.99-0.16.1 2024-01-22 [3] RSPM (R 4.3.0)
#>  xml2                     1.3.6       2023-12-04 [3] RSPM (R 4.3.0)
#>  xtable                   1.8-4       2019-04-21 [1] CRAN (R 4.3.0)
#>  XVector                  0.42.0      2023-10-24 [1] Bioconductor
#>  yaml                     2.3.8       2023-12-11 [3] RSPM (R 4.3.0)
#>  zlibbioc                 1.48.2      2024-03-13 [1] Bioconductor 3.18 (R 4.3.3)
#> 
#>  [1] /home/peter/R/x86_64-pc-linux-gnu-library/4.3
#>  [2] /usr/local/lib/R/site-library
#>  [3] /usr/lib/R/site-library
#>  [4] /usr/lib/R/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────

BioC 3.19

suppressPackageStartupMessages(library(scRNAseq))
sce <- GrunPancreasData()
# 'treatment' rather than 'sample'
colData(sce)
#> DataFrame with 1728 rows and 2 columns
#>                  donor              treatment
#>            <character>            <character>
#> D2ex_1              D2 exocrine fraction, l..
#> D2ex_2              D2 exocrine fraction, l..
#> D2ex_3              D2 exocrine fraction, l..
#> D2ex_4              D2 exocrine fraction, l..
#> D2ex_5              D2 exocrine fraction, l..
#> ...                ...                    ...
#> D17TGFB_92         D17   TGFBR3+ sorted cells
#> D17TGFB_93         D17   TGFBR3+ sorted cells
#> D17TGFB_94         D17   TGFBR3+ sorted cells
#> D17TGFB_95         D17   TGFBR3+ sorted cells
#> D17TGFB_96         D17   TGFBR3+ sorted cells

sce_legacy <- GrunPancreasData(legacy = TRUE)
#> see ?scRNAseq and browseVignettes('scRNAseq') for documentation
#> loading from cache
colData(sce_legacy)
#> DataFrame with 1728 rows and 2 columns
#>                  donor                 sample
#>            <character>            <character>
#> D2ex_1              D2 exocrine fraction, l..
#> D2ex_2              D2 exocrine fraction, l..
#> D2ex_3              D2 exocrine fraction, l..
#> D2ex_4              D2 exocrine fraction, l..
#> D2ex_5              D2 exocrine fraction, l..
#> ...                ...                    ...
#> D17TGFB_92         D17   TGFBR3+ sorted cells
#> D17TGFB_93         D17   TGFBR3+ sorted cells
#> D17TGFB_94         D17   TGFBR3+ sorted cells
#> D17TGFB_95         D17   TGFBR3+ sorted cells
#> D17TGFB_96         D17   TGFBR3+ sorted cells
Session info
sessionInfo()
#> R version 4.4.0 alpha (2024-04-02 r86304)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 22.04.4 LTS
#> 
#> Matrix products: default
#> BLAS:   /home/peter/Downloads/R-alpha/lib/libRblas.so 
#> LAPACK: /home/peter/Downloads/R-alpha/lib/libRlapack.so;  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_AU.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_AU.UTF-8        LC_COLLATE=en_AU.UTF-8    
#>  [5] LC_MONETARY=en_AU.UTF-8    LC_MESSAGES=en_AU.UTF-8   
#>  [7] LC_PAPER=en_AU.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: Australia/Melbourne
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats4    stats     graphics  grDevices utils     datasets  methods  
#> [8] base     
#> 
#> other attached packages:
#>  [1] scRNAseq_2.17.7             SingleCellExperiment_1.25.1
#>  [3] SummarizedExperiment_1.33.3 Biobase_2.63.1             
#>  [5] GenomicRanges_1.55.4        GenomeInfoDb_1.39.13       
#>  [7] IRanges_2.37.1              S4Vectors_0.41.6           
#>  [9] BiocGenerics_0.49.1         MatrixGenerics_1.15.0      
#> [11] matrixStats_1.2.0          
#> 
#> loaded via a namespace (and not attached):
#>  [1] DBI_1.2.2                bitops_1.0-7             httr2_1.0.1             
#>  [4] rlang_1.1.3              magrittr_2.0.3           gypsum_0.99.15          
#>  [7] compiler_4.4.0           RSQLite_2.3.6            GenomicFeatures_1.55.4  
#> [10] png_0.1-8                vctrs_0.6.5              ProtGenerics_1.35.4     
#> [13] pkgconfig_2.0.3          crayon_1.5.2             fastmap_1.1.1           
#> [16] dbplyr_2.5.0             XVector_0.43.1           utf8_1.2.4              
#> [19] Rsamtools_2.19.4         rmarkdown_2.26           UCSC.utils_0.99.5       
#> [22] purrr_1.0.2              bit_4.0.5                xfun_0.43               
#> [25] reprex_2.1.0             aws.s3_0.3.21            zlibbioc_1.49.3         
#> [28] cachem_1.0.8             jsonlite_1.8.8           blob_1.2.4              
#> [31] rhdf5filters_1.15.4      DelayedArray_0.29.9      Rhdf5lib_1.25.3         
#> [34] BiocParallel_1.37.1      parallel_4.4.0           R6_2.5.1                
#> [37] rtracklayer_1.63.2       Rcpp_1.0.12              knitr_1.46              
#> [40] base64enc_0.1-3          Matrix_1.7-0             tidyselect_1.2.1        
#> [43] abind_1.4-5              yaml_2.3.8               codetools_0.2-20        
#> [46] curl_5.2.1               lattice_0.22-6           alabaster.sce_1.3.3     
#> [49] tibble_3.2.1             withr_3.0.0              KEGGREST_1.43.0         
#> [52] evaluate_0.23            BiocFileCache_2.11.2     alabaster.schemas_1.3.1 
#> [55] xml2_1.3.6               ExperimentHub_2.11.1     Biostrings_2.71.5       
#> [58] pillar_1.9.0             BiocManager_1.30.22      filelock_1.0.3          
#> [61] generics_0.1.3           RCurl_1.98-1.14          BiocVersion_3.19.1      
#> [64] ensembldb_2.27.1         alabaster.base_1.3.23    alabaster.ranges_1.3.3  
#> [67] glue_1.7.0               alabaster.matrix_1.3.13  lazyeval_0.2.2          
#> [70] tools_4.4.0              AnnotationHub_3.11.4     BiocIO_1.13.0           
#> [73] GenomicAlignments_1.39.5 fs_1.6.3                 XML_3.99-0.16.1         
#> [76] rhdf5_2.47.6             grid_4.4.0               AnnotationDbi_1.65.2    
#> [79] GenomeInfoDbData_1.2.12  HDF5Array_1.31.6         restfulr_0.0.15         
#> [82] cli_3.6.2                rappdirs_0.3.3           fansi_1.0.6             
#> [85] S4Arrays_1.3.7           dplyr_1.1.4              AnnotationFilter_1.27.0 
#> [88] alabaster.se_1.3.4       digest_0.6.35            SparseArray_1.3.5       
#> [91] rjson_0.2.21             memoise_2.0.1            htmltools_0.5.8.1       
#> [94] lifecycle_1.0.4          httr_1.4.7               mime_0.12               
#> [97] aws.signature_0.6.0      bit64_4.0.5

Oops. The GrunPancreasData issue was an error on my part; this should be fixed in the next scRNAseq version.

For the current BBS failure: this one is less my fault. Where possible, the new datasets were generated fresh from their primary sources (i.e., GEO, ArrayExpress). In this case, ArrayExpress renamed the individuals (in particular, AZ is now called H1), so this line no longer filters out the offending individual. Changing the filter condition from AZ to H1 makes everything work again. (Okay, not completely back-compatible, but I'm just following ArrayExpress here.)
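
A filter that tolerates both spellings could look like the sketch below. It assumes that neither AZ nor H1 is used for a different individual in the other version of the dataset, which is worth checking first. The helper name `drop_donor()` and the other donor labels are made up for illustration.

```r
# Hypothetical helper: TRUE for cells to keep, i.e. cells whose donor label
# is not one of the known names of the offending individual.
drop_donor <- function(donor, aliases = c("AZ", "H1")) {
    !(donor %in% aliases)
}

# Toy donor labels. "AZ" is the old name and "H1" the new ArrayExpress name
# for the same individual; "X" and "Y" are placeholders.
old_labels <- c("AZ", "X", "Y")
new_labels <- c("H1", "X", "Y")

keep_old <- drop_donor(old_labels)
keep_new <- drop_donor(new_labels)
```

With this form, the same workflow code should filter the same cells whether the legacy or the regenerated dataset is loaded.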

I just ran through all the workflows in OSCA.workflows that use scRNAseq, and the only other failure is in nestorowa-hsc.Rmd, where I moved the FACS data from the colData to its own altExp. Sure, that is a breaking change for anyone who was using the FACS data, but it is a more sensible long-term place for it to live. The corresponding patch here should be:

Y <- assay(altExp(sce.nest, "FACS"))
keep <- colSums(is.na(Y))==0 # Removing NA intensities.

se.averaged <- sumCountsAcrossCells(Y[,keep], 
    colLabels(sce.nest)[keep], average=TRUE)
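
To illustrate what the `keep` filter in that patch does, here is a small base R sketch on a made-up matrix. The marker and cell names are placeholders; the only assumption carried over from the patch is that rows are features and columns are cells.

```r
# Toy intensity matrix: rows are markers, columns are cells.
# Cell "c2" has a missing intensity for marker "m1".
Y <- matrix(c(1, 2, NA, 4, 5, 6), nrow = 2,
    dimnames = list(c("m1", "m2"), c("c1", "c2", "c3")))

# A cell is kept only if none of its intensities are NA.
keep <- colSums(is.na(Y)) == 0

# drop = FALSE keeps the result a matrix even if one cell remains.
Y_clean <- Y[, keep, drop = FALSE]
```

Here `keep` is `TRUE` for `c1` and `c3` only, so `Y_clean` has two columns.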

Seems like we're almost there, at least for OSCA.workflows. I suppose we could just slap legacy=TRUE on the scRNAseq calls for any workflows that still don't work. A bit unsatisfying but oh well.
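
If several workflows need that fallback, a thin wrapper keeps the change in one place. This is only a sketch: `with_legacy()` is a hypothetical helper, and `fake_loader()` stands in for a dataset function such as `GrunPancreasData()` so that the example runs without downloading anything. It assumes the wrapped function accepts a `legacy` argument.

```r
# Hypothetical helper: call any dataset loader with legacy = TRUE,
# passing other arguments through unchanged.
with_legacy <- function(loader, ...) {
    loader(..., legacy = TRUE)
}

# Stand-in for an scRNAseq dataset function; it just echoes its arguments.
fake_loader <- function(which = "default", legacy = FALSE) {
    list(which = which, legacy = legacy)
}

out <- with_legacy(fake_loader, which = "x")
```

In a workflow this would be used as `sce <- with_legacy(GrunPancreasData)`, which is equivalent to `GrunPancreasData(legacy = TRUE)`.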

Closing as now addressed in OSCA.workflows.