FlowSets

Visualizing (differential) expression patterns with fuzzy concepts as FlowSets

Contact

FlowSets will be presented at ISMB/ECCB 2023 in the BioVis-Track!

Check out the expression example here or the double-differential analysis here.

Examples

Extracting data from Seurat objects

We provide a collection of R functions for extracting expression and differential expression data from Seurat objects. To access these functions, source the helper script:

source("https://raw.githubusercontent.com/mjoppich/FlowSets/main/seurat_util_functions.R")

Gene expression data

We first calculate the gene expression data for the Seurat object obj, grouped by TimePoint:

df.all = getExtendedExpressionData(obj, assay="RNA", group.by="TimePoint")

The resulting data frame can then be written to disk as a tab-separated file:

write.table(df.all, "expression_all.tsv", quote=FALSE, sep="\t", row.names=FALSE)

The data frame can then be read in Python and used for analysis. The example analysis is available here.
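As a minimal sketch of the Python side, the exported table can be loaded with the standard library alone. The helper name read_expression_table is hypothetical, and no column names are assumed: they are taken from the header row written by write.table. The linked example analysis may well use a different reader.

```python
import csv


def read_expression_table(path):
    """Read a tab-separated table with a header row into a list of dicts.

    All values are returned as strings; convert numeric columns as needed.
    """
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return list(reader)


# Usage (assuming the file written by the R snippet above exists):
#   rows = read_expression_table("expression_all.tsv")
#   print(len(rows), "rows")
```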

Differential data

As with the expression data, we first prepare the differential expression data for each TimePoint. In this example, symptomatic and asymptomatic cells of one cell type are compared at each time point:

celltype = "Monocytes-Immune-system"
print(celltype)

cells.celltype = cellIDForClusters(obj, "cellnamesread", c(celltype))
ignoreAnalysis = FALSE
tpDeList = list()
for (timep in c("1", "2", "3"))
{
    
    print(timep)
    cells.timepoint = cellIDForClusters(obj, "TimePoint", c(timep))
    
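    # cells.sympt and cells.asympt are assumed to be defined beforehand:
    # the cell IDs of the two conditions that are compared.
    cells.comp.sympt = intersect(cells.sympt, intersect(cells.timepoint, cells.celltype))
    cells.comp.asympt = intersect(cells.asympt, intersect(cells.timepoint, cells.celltype))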
    
    print(paste(length(cells.comp.sympt), length(cells.comp.asympt)))
    
    # Require at least 3 cells per group; otherwise flag the cell type and skip this time point
    if ((length(cells.comp.sympt) < 3) || (length(cells.comp.asympt) < 3))
    {
        ignoreAnalysis = TRUE
        next
    }
    
    deResult = compareClusters(scdata=obj,
                               cellsID1=cells.comp.sympt,
                               cellsID2=cells.comp.asympt,
                               prefix=paste("cluster", celltype, timep, sep="_"),
                               suffix1="cells_sympt",
                               suffix2="cells_asympt",
                               test="t", fcCutoff=0.25, assay="RNA",
                               outfolder=paste("./de_comparison_", celltype, sep=""))
    
    
    tpDeList[[timep]] = deResult
}

if (ignoreAnalysis)
{
    # At least one time point had too few cells: skip this cell type
    print(paste("Skipping", celltype))
} else {
    makeCombinedDF(tpDeList, paste("./de_comparison_", celltype, sep=""))
}

The combined data frame is then ready for use in the FlowSets framework. The example analysis is available here.

Brief Method description

(Differential) expression data are read in for each gene and each cluster (or: state). The values are fuzzified using membership classes that are either user-defined, equally distributed over the measurement range (min-max), or placed according to predefined quantiles.
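To illustrate the min-max variant, here is a minimal sketch that fuzzifies one value into equally spaced membership classes. The triangular shape of the classes and the function name triangular_memberships are assumptions made for illustration; the membership functions actually used by FlowSets may differ.

```python
def triangular_memberships(value, lo, hi, n_levels):
    """Membership of `value` in `n_levels` triangular fuzzy classes.

    The class peaks are equally spaced over [lo, hi]; neighbouring classes
    overlap so that the memberships of any value in the range sum to 1.
    """
    if n_levels < 2:
        raise ValueError("need at least two levels")
    if hi <= lo:
        raise ValueError("hi must be greater than lo")

    # Clamp values outside the measurement range to its borders.
    value = min(max(value, lo), hi)

    step = (hi - lo) / (n_levels - 1)
    centers = [lo + i * step for i in range(n_levels)]

    # Membership falls linearly from 1 at the peak to 0 at the next peak.
    return [max(0.0, 1.0 - abs(value - c) / step) for c in centers]
```

With a range of 0 to 4 and five classes, a value of 2.5 belongs half to the third and half to the fourth class, so a gene is not forced into a single discrete level.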

Relevant flows can be defined using a simple grammar with the flow_finder function, where the desired difference between two levels can be specified.
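The idea of selecting flows by the difference between levels can be sketched as follows. This is not the flow_finder API: the function select_level_paths, its comparison symbols, and the min_diff parameter are hypothetical and only illustrate how a sequence of comparisons picks level paths across states.

```python
import itertools
import operator

# Hypothetical comparison symbols between two consecutive states.
_OPS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "=": operator.eq,
}


def select_level_paths(n_levels, comparisons, min_diff=0):
    """Enumerate level paths over len(comparisons) + 1 states.

    A path is kept if every consecutive pair of levels satisfies its
    comparison and, for strict changes ("<" or ">"), differs by at least
    `min_diff` levels.
    """
    n_states = len(comparisons) + 1
    kept = []
    for path in itertools.product(range(n_levels), repeat=n_states):
        pairs = zip(path, path[1:])
        ok = all(
            _OPS[op](a, b) and (op not in ("<", ">") or abs(a - b) >= min_diff)
            for (a, b), op in zip(pairs, comparisons)
        )
        if ok:
            kept.append(path)
    return kept
```

For three levels and the comparisons "<", "<", only the strictly increasing path (0, 1, 2) remains; with a single "<" and min_diff=2, only the jump (0, 2) is kept.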

For each flow, or a group of flows, gene set enrichment analysis can be performed. Here, the gene sets are binned according to their size. For example, all gene sets with at least 2 and at most 5 genes are put together into one bin. For each bin, all flow memberships are calculated. For each membership, a z-score is calculated (how different is a gene set from all other gene sets of that bin). For all gene sets with a positive z-score (= more membership than expected), the z-score is transformed into a p-value.
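The binning and scoring steps can be sketched as below. This is a simplified illustration, not the FlowSets implementation: the function name enrichment_by_bin and the input layout are hypothetical, the z-score is taken against the mean and standard deviation of all gene sets in the bin (including the set itself), and the p-value is assumed to be the upper tail of a standard normal distribution.

```python
import math
import statistics
from collections import defaultdict


def enrichment_by_bin(memberships, sizes, bins):
    """Score gene sets against other gene sets of similar size.

    memberships: dict gene set name -> flow membership of that gene set
    sizes:       dict gene set name -> number of genes in the set
    bins:        list of inclusive (min_size, max_size) tuples
    Returns a dict gene set name -> (z, p); p is None unless z > 0.
    """
    # Group gene sets by the first size bin they fall into.
    grouped = defaultdict(list)
    for name, size in sizes.items():
        for lo, hi in bins:
            if lo <= size <= hi:
                grouped[(lo, hi)].append(name)
                break

    results = {}
    for names in grouped.values():
        values = [memberships[n] for n in names]
        if len(values) < 2:
            continue  # a z-score needs something to compare against
        mean = statistics.fmean(values)
        sd = statistics.pstdev(values)
        if sd == 0:
            continue  # all gene sets in the bin are identical
        for n in names:
            z = (memberships[n] - mean) / sd
            # Upper-tail normal p-value, only for positive z-scores.
            p = 0.5 * math.erfc(z / math.sqrt(2)) if z > 0 else None
            results[n] = (z, p)
    return results
```

Binning by size keeps small gene sets from being compared against large ones, whose summed memberships would otherwise dominate.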

A more detailed description is available in the working copy of our manuscript.
