FlowSets
Visualizing (differential) expression patterns with fuzzy concepts as FlowSets
Contact
FlowSets will be presented at ISMB/ECCB 2023 in the BioVis-Track!
Check-out the expression example here or the double-differential analysis here.
Examples
Extracting data from Seurat objects
We provide a collection of useful R-functions for extracting expression and differential data from Seurat objects. To access these methods you need to
source("https://raw.githubusercontent.com/mjoppich/FlowSets/main/seurat_util_functions.R")
Gene expression data
We first need to calculate gene expression data, and group them by TimePoint
df.all = getExtendedExpressionData(obj, assay="RNA", group.by="TimePoint")
It is then possible to write the data frame to disk
write.table(df.all, "expression_all.tsv", quote=F, sep="\t", row.names=F)
The data frame can then be read in python and used for analysis. The example analysis is available here.
Differential data
Similar to the expression data case, we first need to prepare the differential expression data for each TimePoint
celltype = "Monocytes-Immune-system"
print(celltype)
cells.celltype = cellIDForClusters(obj, "cellnamesread", c(celltype))
ignoreAnalysis = FALSE
tpDeList = list()
for (timep in c("1", "2", "3"))
{
print(timep)
cells.timepoint = cellIDForClusters(obj, "TimePoint", c(timep))
cells.comp.sympt = intersect(cells.sympt, intersect(cells.timepoint, cells.celltype))
cells.comp.asympt = intersect(cells.asympt, intersect(cells.timepoint, cells.celltype))
print(paste(length(cells.comp.sympt), length(cells.comp.asympt)))
if ((length(cells.comp.sympt) < 3) || (length(cells.comp.asympt) < 3))
{
ignoreAnalysis = TRUE
next()
}
deResult = compareClusters(scdata=obj,
cellsID1=cells.comp.sympt,
cellsID2=cells.comp.asympt,
prefix= paste("cluster", celltype, timep, sep="_"),
suffix1="cells_sympt",
suffix2="cells_asympt",
test="t", fcCutoff=0.25, assay="RNA", outfolder=paste("./de_comparison_", celltype, sep=""))
tpDeList[[timep]] = deResult
}
if (ignoreAnalysis)
{
print(paste("Skipping", celltype))
next()
}
makeCombinedDF(tpDeList, paste("./de_comparison_", celltype, sep=""))
The combined dataframe is then ready for usage in the FlowSets framework. The example analysis is available here.
Brief Method description
(Differential) Expression data are read in for each gene and each cluster (or: state). The values are fuzzified either by user-defined membership classes, or equally distributed over the measurement range (min-max), or according to predefined quantiles.
Relevant flows can be defined using a simple grammar with the flow_finder function, where the desired difference between two levels can be specified.
For each flow, or a group of flows, gene set enrichment analysis can be performed. Here, the gene sets are binned according to their size. E.g. all gene sets with at least 2 and at most 5 genes are put together into one bin. For each bin, all flow memberships are calculated. For each membership a z-score is calculated (how different is a geneset from all other gene sets of that bin), which is transformed into a p-value for all positive-z-score (=more than expected) gene sets.
A more detailed description is available in the working copy of our manuscript article.