Published in Gruffi: an algorithm for computational removal of stressed cells from brain organoid transcriptomic datasets
The Gruffi R package helps you (1) to identify stressed cells in single-cell RNA-seq datasets using granular funcitonal filtering, or (2) you can use it to analyze any gene set's pathway activity (GO-term or custom defined), and define a cell population based on their combined activity.
Gruffi
integrates into any Seurat
analysis pipelione & it comes with a graphical user interface.
v1.5.*
is released, that fixes critical problems which arose with changes in dependencies and made the installation / pipeline break. Additionally it contains updates necessary to work with Seurat v5 objects.- Now it is more explicit that GO-annotation can either be obtained via
BioMart
orAnnotationDbi
, which is helpful, as BioMart connection is not always working. For the same reason, it now allows the usage of alternative mirrors for BioMart (thanks to@zuzkamat
). - Major consistency update in function names, see below.
- Consistency update in variable names (in @misc and @meta.data), for the latter, this update is not backward compatible, meaning earlier results needs manual adjustments in those names to run. Recommended is to rerun the new version.
- Major cleanup and of the codebase. Granule average scores are now stored as numeric rather than factor.
Added functionality
FindThresholdsAuto()
that is a shiny free versionFindThresholdsShiny()
, which allows an automated workflow without the need for graphical output (thanks to@heyfl
). We still strongly recommend the shiny based workflow that enforces users to look at the results and intermediate steps instead of blidnly taking a TRUE / FALSE column.- New visualization functions, incuding:
GrScoreUMAP(obj = combined.obj, colname = 'RNA_snn_res.6_cl.av_GO.0042063', miscname = 'thresh.stress.ident1') GrScoreHistogram(obj = combined.obj, colname = 'RNA_snn_res.6_cl.av_GO.0042063', miscname = 'thresh.stress.ident1')
- New helper functions that allow a simpler workflow and shorter codebase.
You can install dependencies from CRAN, Bioconductor and GitHub via devtools:
# Install CRAN & Bioconductor dependencies
install.packages('BiocManager')
BiocManager::install("DOSE")
BiocManager::install("org.Hs.eg.db")
BiocManager::install("sparseMatrixStats")
BiocManager::install("biomaRt")
BiocManager::install("raster")
BiocManager::install("rgl")
# Install custom dependencies
install.packages('devtools')
devtools::install_github(repo = "vertesy/Stringendo", upgrade = F)
devtools::install_github(repo = "vertesy/CodeAndRoll2", upgrade = F)
devtools::install_github(repo = "vertesy/ReadWriter", upgrade = F)
devtools::install_github(repo = "vertesy/MarkdownHelpers", upgrade = F)
devtools::install_github(repo = "vertesy/Markdownreports", upgrade = F)
devtools::install_github(repo = "vertesy/ggExpress", upgrade = F)
devtools::install_github(repo = "vertesy/Seurat.utils", upgrade = F)
# Install gruffi
devtools::install_github(repo = "jn-goe/gruffi", upgrade = F)
NOTE: If you type 'all' when R asks to update dependencies, you may get into installation errors / infinite loops. If updating fails, type 'no' when prompted.
Load your Seurat single-cell object of interest (in the following combined.obj
), perform basic analysis steps (Integration, PCA, Clustering) and calculate 2D and 3D umaps.
library(gruffi)
combined.obj <- Seurat.utils::SetupReductionsNtoKdimensions(obj = combined.obj, nPCs = 50, dimensions=3:2, reduction="umap")
# Note that this will recalculate 2D and 3D umaps, and back them up in combined.obj@misc$reductions.backup.
# See FAQ.md, if you don't want to overwrite the 2D UMAP.
If you want to store multiple UMAP's, you have to keep them in backup slot (in combined.obj@misc). Use combined.obj <- RecallReduction(obj = combined.obj, dim = 2, reduction="umap")
to access them.
Prepare GO-terms and gene-sets
go1 <- "GO:0006096" # Glycolysis
go2 <- "GO:0034976" # ER-stress
go3 <- "GO:0042063" # Gliogenesis, negative filtering
Gruffi works best if you partition cells into groups of 100-200 cells. Find the corresponding clustering resolution by:
combined.obj <- AutoFindGranuleResolution(obj = combined.obj)
The optimal resolution found is stored in:
(granule.res.4.gruffi <- GetGruffiClusteringName(combined.obj)) # Recalled from @misc$gruffi$'optimal.granule.res.
Some granules have too few cells, therfore their scoring is not robust statistically. Use ReassignSmallClusters
to assign these cells to the nearest large-enough granule:
combined.obj <- ReassignSmallClusters(combined.obj, ident = granule.res.4.gruffi) # Will be stored in meta data column as "seurat_clusters.reassigned".
Above, granules with <30 cells are cell-by-cell re-assigned to a neighboring granule (by default based on Euclidean distance between the mean of cell groups in 3dim UMAP space). The reassigned granules are suffixed as :
(granule.res.4.gruffi <- GetGruffiClusteringName(combined.obj))
After finding the right granule resolution, first GO scores per cells, then average GO-scores per granule (normalized for respective cell numbers) are computed. Within the GO score computation an ensembl database will be loaded (that you created above as ensembl
).
# Glycolytic process GO:0006096
combined.obj <- AssignGranuleAverageScoresFromGOterm(obj = combined.obj, GO_term = go1, save.UMAP = TRUE, new_GO_term_computation = T, clustering = granule.res.4.gruffi, plot.each.gene = F)
# ER stress GO:0034976
combined.obj <- AssignGranuleAverageScoresFromGOterm(obj = combined.obj, GO_term = go2, save.UMAP = TRUE, new_GO_term_computation = T, clustering = granule.res.4.gruffi, plot.each.gene = F)
# Gliogenesis GO:0042063
combined.obj <- AssignGranuleAverageScoresFromGOterm(obj = combined.obj, GO_term = go3, save.UMAP = TRUE, new_GO_term_computation = T, clustering = granule.res.4.gruffi, plot.each.gene = F)
These functions store the resulting scores in combined.obj@meta.data
.
We will now call a Shiny Interface to auto-estimate and/or manually adjust the stress assignment threeholds via FindThresholdsShiny()
.
- The first 2 scores are for features to filter out (e.g.: cells with high ER stress and glycolysis),
- and the last 2 scores are for features we want to keep (e.g.: cells with high gliogenesis).
Example code for filtering cells high in glycolytic process and ER stress but low in gliogenesis:
# Create score names:
(i1 <- ParseGruffiGranuleScoreName(goID = go1))
(i2 <- ParseGruffiGranuleScoreName(goID = go2))
(i3 <- ParseGruffiGranuleScoreName(goID = go3))
# Call Shiny app
combined.obj <- FindThresholdsShiny(obj = combined.obj,
stress.ident1 = i1,
stress.ident2 = i2,
notstress.ident3 = i3)
"Dont forget to click the button in the app: Save New Thresholds"
The interface allows automatic estimation and manual adjustment of thresholds for stress scores:
After pushing the Save new thresholds button in the Shiny graphical user interface, thresholds are saved in combined.obj@misc$gruffi
and the stress assignment is stored as a new @meta.data
column, is.Stressed
. Check results with:
StressUMAP(combined.obj)
StressBarplotPerCluster(combined.obj, group.by = GetClusteringRuns()[1])
GrScoreHistogram(combined.obj, colname = i1, miscname = "thresh.stress.ident1")
# and more
cellIDs.keep <- which_names(!combined.obj$'is.Stressed')
subset.obj <- subset(x = combined.obj, cells = cellIDs.keep)
StressUMAP(subset.obj, title = "No Stressed Cells Remain After Gruffi", suffix = "after.cleanup")
For Errors, Problems & Questions: please see the FAQ or raise an issue in the github repository. We try to answer.