plger / scDblFinder

Methods for detecting doublets in single-cell sequencing data

Home Page:https://plger.github.io/scDblFinder/

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

SCTransform example in vignette not working

JesseRop opened this issue · comments

Describe the bug
I'm trying to replicate the example in the vignette to use SCT internally within scdblFinder but I get the error below

Error in if (length(sel_features) > nfeatures) sel_features <- selFeatures(sce[sel_features, :
the condition has length > 1

MRE -- Minimal example to reproduce the bug

nfeatures <- 1000
sce <- mockDoubletSCE(dbl.rate=0.1, ngenes=2000 )
# sctransform on real cells:
vst1 <- sctransform::vst(counts(sce), n_cells=min(ncol(sce),5000), verbosity=0)
sce <- sce[row.names(vst1$y),]
logcounts(sce) <- vst1$y
hvg <- row.names(sce)[head(order(vst1$gene_attr$residual_variance, decreasing=TRUE), nfeatures)]

# define a processing function that scDblFinder will use on the real+artificial doublets;
# the input should be a count matrix and the number of dimensions, and the output a PCA matrix
  
myfun <- function(e, dims){
  # we use the thetas calculated from the first vst on real cells
  e <- e[intersect(row.names(e), row.names(vst1$model_pars_fit[which(!is.na(vst1$model_pars_fit[,"theta"])),])),]
  vst2 <- sctransform::vst(e, n_cells=min(ncol(e),5000), method="nb_theta_given", 
                           theta_given=vst1$model_pars_fit[row.names(e),"theta"],
                           min_cells=1L, verbosity=0)
  scater::calculatePCA(vst2$y, ncomponents=dims)
}
  
sce <- scDblFinder(sce, processing=myfun, nfeatures=hvg)

Traceback
1: scDblFinder(sce, processing = myfun, nfeatures = hvg)

Session info

R version 4.2.2 (2022-10-31)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS 14.4

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] grid stats4 stats graphics grDevices utils datasets methods base

other attached packages:
[1] reticulate_1.31 caret_6.0-94 ggsankey_0.0.99999 scmap_1.20.2 ggh4x_0.2.4 MetBrewer_0.2.0
[7] SoupX_1.6.2 conflicted_1.2.0 ggvenn_0.1.9 Platypus_3.4.1 RColorBrewer_1.1-3 rmarkdown_2.24
[13] Hmisc_4.7-2 Formula_1.2-4 survival_3.5-0 lattice_0.20-45 plotly_4.10.2 SeuratDisk_0.0.0.9020
[19] SeuratData_0.2.2 scDblFinder_1.12.0 SingleCellExperiment_1.20.0 SummarizedExperiment_1.28.0 Biobase_2.58.0 GenomicRanges_1.50.2
[25] GenomeInfoDb_1.34.9 IRanges_2.32.0 S4Vectors_0.36.1 BiocGenerics_0.44.0 MatrixGenerics_1.10.0 matrixStats_1.0.0
[31] cowplot_1.1.1 tidyr_1.3.0 tibble_3.2.1 readr_2.1.4 purrr_1.0.2 stringr_1.5.0
[37] dplyr_1.1.2 ggpubr_0.5.0 ggrepel_0.9.3 ggplot2_3.4.3 DoubletFinder_2.0.3 patchwork_1.1.3
[43] rgl_1.0.1 sctransform_0.3.5 SeuratObject_4.1.3 Seurat_4.3.0.1

loaded via a namespace (and not attached):
[1] rsvd_1.0.5 ica_1.0-3 class_7.3-21 Rsamtools_2.14.0 foreach_1.5.2 lmtest_0.9-40
[7] crayon_1.5.2 MASS_7.3-58.2 nlme_3.1-161 backports_1.4.1 rlang_1.1.1 XVector_0.38.0
[13] ROCR_1.0-11 readxl_1.4.2 irlba_2.3.5.1 limma_3.54.2 scater_1.26.1 xgboost_1.7.3.1
[19] BiocParallel_1.32.5 rjson_0.2.21 bit64_4.0.5 glue_1.6.2 pheatmap_1.0.12 parallel_4.2.2
[25] vipor_0.4.5 spatstat.sparse_3.0-2 dotCall64_1.0-2 spatstat.geom_3.2-4 tidyselect_1.2.0 fitdistrplus_1.1-11
[31] XML_3.99-0.13 zoo_1.8-12 GenomicAlignments_1.34.0 xtable_1.8-4 magrittr_2.0.3 evaluate_0.21
[37] scuttle_1.8.4 cli_3.6.1 zlibbioc_1.44.0 rstudioapi_0.14 miniUI_0.1.1.1 sp_2.0-0
[43] rpart_4.1.19 treeio_1.22.0 maps_3.4.1 fields_14.1 shiny_1.7.5 BiocSingular_1.14.0
[49] xfun_0.40 clue_0.3-63 cluster_2.1.4 ape_5.6-2 listenv_0.9.0 Biostrings_2.66.0
[55] png_0.1-8 reshape_0.8.9 future_1.33.0 ipred_0.9-14 withr_2.5.0 bitops_1.0-7
[61] ggforce_0.4.1 plyr_1.8.8 cellranger_1.1.0 hardhat_1.3.1 pROC_1.18.5 e1071_1.7-12
[67] dqrng_0.3.0 pillar_1.9.0 GlobalOptions_0.1.2 cachem_1.0.8 hdf5r_1.3.8 GetoptLong_1.0.5
[73] googleVis_0.7.1 DelayedMatrixStats_1.20.0 vctrs_0.6.3 ellipsis_0.3.2 generics_0.1.3 lava_1.8.0
[79] tools_4.2.2 foreign_0.8-84 beeswarm_0.4.0 munsell_0.5.0 tweenr_2.0.2 proxy_0.4-27
[85] DelayedArray_0.24.0 fastmap_1.1.1 compiler_4.2.2 abind_1.4-5 httpuv_1.6.11 rtracklayer_1.58.0
[91] prodlim_2023.08.28 GenomeInfoDbData_1.2.9 gridExtra_2.3 edgeR_3.40.2 deldir_1.0-9 utf8_1.2.3
[97] later_1.3.1 recipes_1.0.10 jsonlite_1.8.7 GGally_2.1.2 scales_1.2.1 ScaledMatrix_1.6.0
[103] tidytree_0.4.2 pbapply_1.7-2 carData_3.0-5 sparseMatrixStats_1.10.0 lazyeval_0.2.2 promises_1.2.1
[109] car_3.1-1 doParallel_1.0.17 R.utils_2.12.2 latticeExtra_0.6-30 goftest_1.2-3 spatstat.utils_3.0-3
[115] checkmate_2.1.0 statmod_1.5.0 Rtsne_0.16 uwot_0.1.16 igraph_1.5.1 yaml_2.3.7
[121] htmltools_0.5.6 memoise_2.0.1 BiocIO_1.8.0 locfit_1.5-9.7 viridisLite_0.4.2 digest_0.6.33
[127] mime_0.12 rappdirs_0.3.3 spam_2.9-1 yulab.utils_0.0.6 future.apply_1.11.0 data.table_1.14.8
[133] R.oo_1.25.0 labeling_0.4.2 splines_4.2.2 RCurl_1.98-1.12 broom_1.0.3 hms_1.1.2
[139] colorspace_2.1-0 base64enc_0.1-3 BiocManager_1.30.22 ggbeeswarm_0.7.1 shape_1.4.6 aplot_0.1.9
[145] nnet_7.3-18 Rcpp_1.0.11 RANN_2.6.1 circlize_0.4.15 fansi_1.0.4 tzdb_0.4.0
[151] ModelMetrics_1.2.2.2 parallelly_1.36.0 R6_2.5.1 ggridges_0.5.4 lifecycle_1.0.3 bluster_1.8.0
[157] ggsignif_0.6.4 leiden_0.4.3 Matrix_1.5-3 RcppAnnoy_0.0.21 iterators_1.0.14 spatstat.explore_3.2-1
[163] gower_1.0.1 htmlwidgets_1.6.2 beachmat_2.14.0 polyclip_1.10-4 timechange_0.2.0 gridGraphics_0.5-1
[169] ComplexHeatmap_2.14.0 globals_0.16.2 htmlTable_2.4.1 spatstat.random_3.1-5 progressr_0.14.0 lubridate_1.9.2
[175] codetools_0.2-18 metapod_1.6.0 randomForest_4.7-1.1 R.methodsS3_1.8.2 gtable_0.3.3 DBI_1.1.3
[181] ggfun_0.0.9 tensor_1.5 httr_1.4.7 KernSmooth_2.23-20 stringi_1.7.12 reshape2_1.4.4
[187] farver_2.1.1 viridis_0.6.2 timeDate_4032.109 ggtree_3.6.2 BiocNeighbors_1.16.0 restfulr_0.0.15
[193] interp_1.1-3 ggplotify_0.1.0 scattermore_1.2 scran_1.26.2 bit_4.0.5 jpeg_0.1-10

Hi,

Thanks for the nicely reported issue.

You're using scDblFinder 1.12.0, the current release one is 1.16.0, and the ability to specify which genes to use was added in 1.13.7, which is why you're getting an error. It should work if you update; alternatively you can use the following workaround:

# same as your example up until running scDblFinder, then
# simply subset to the hvg instead of manually specifying them:
sce2 <- scDblFinder(sce[hvg,], processing=myfun, nfeatures=nfeatures)
# then put the results back in the original object:
sce$scDblFinder.score <- sce2$scDblFinder.score
sce$scDblFinder.class <- sce2$scDblFinder.class

Pierre-Luc