On the discovery of population-specific state transitions from multi-sample multi-condition scRNA-seq data
This repository contains all the necessary code to perform the evaluations and analyses from our preprint available on bioRxiv.
Analyses discussed in the results section "Differential state analysis of mouse cortex exposed to LPS treatment" are provided as a browsable workflowr [1] website HERE.
In brief, our snakemake workflow for method comparison is organized into

- a `config.yaml` file specifying key parameters and directories
- a `scripts` folder housing all utilized scripts (see below)
- a `data` folder containing raw (reference) and simulated data
- a `meta` folder for simulation, runmode, and method parameters
- a `results` folder where all results are generated (as `.rds` files)
- a `figures` folder where all output plots are generated (as `.pdf` or `.png` files, or `.rds` files for `ggplot` objects)
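To illustrate how the `meta` folder might be populated with one `.json` file per simulation ID, here is a minimal Python sketch. The repository's own scripts are written in R, so this is not the authors' code: the function name `write_sim_pars`, the parameter keys (`p_ds`, `n_reps`), and the example simulation IDs are all hypothetical and chosen only for illustration.

```python
import json
import tempfile
from pathlib import Path


def write_sim_pars(meta_dir, sim_pars):
    """Write one JSON file per simulation ID into <meta_dir>/sim_pars.

    `sim_pars` maps a simulation ID to a dict of parameters.
    Returns the list of written file paths, sorted by simulation ID.
    """
    out_dir = Path(meta_dir) / "sim_pars"
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for sim_id, pars in sorted(sim_pars.items()):
        path = out_dir / f"{sim_id}.json"
        path.write_text(json.dumps(pars, indent=2))
        paths.append(path)
    return paths


# Hypothetical simulation IDs and parameter names (not taken from the repository).
example_pars = {
    "null": {"p_ds": 0.0, "n_reps": 3},
    "de10": {"p_ds": 0.1, "n_reps": 5},
}

# Write into a temporary directory and read the files back.
with tempfile.TemporaryDirectory() as tmp:
    written = write_sim_pars(Path(tmp) / "meta", example_pars)
    loaded = {p.stem: json.loads(p.read_text()) for p in written}
```

Keeping each parameter set in its own file lets a workflow manager treat every simulation ID as a separate input, so changing one parameter set only invalidates the results that depend on it.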
The table below summarizes the different R scripts in `scripts`:
| script | description |
|---|---|
| `prep_X` | generates a reference SCE for simulation by i) keeping samples from one condition only; and ii) unifying relevant cell metadata names to `cluster/sample/group_id` |
| `prep_sim` | prepares a reference SCE for simulation by i) retaining subpopulation-sample combinations with at least 100 cells; and ii) estimating cell / gene parameters (offsets / coefficients and dispersions) |
| `sim_pars` | for each simulation ID, generates a `.json` file in `meta/sim_pars` that specifies simulation parameters (e.g., probability of DS, number of simulation replicates) |
| `run_pars` | for each reference and simulation ID, generates a `.json` file in `meta/run_pars` that specifies runmode parameters (e.g., number of cells/genes to sample, number of run replicates) |
| `meth_pars` | for each method ID, generates a `.json` file in `meta/meth_pars` that specifies method parameters |
| `sim_data` | provided with a reference dataset and simulation parameters, simulates data and writes a SCE to `data/sim_data` |
| `apply_X` | wrapper to run a DS method of type X (`pb`, `mm`, `ad`, `mast`, `scdd`) |
| `run_meth` | reads in simulated data and method parameters, and performs DS analysis by running the corresponding `apply_X` script |
| `plot_null` | for each reference ID, plots nominal p-value distributions for all null simulations |
| `plot_tprfdr` | plots TPR-FDR curves for a single result |
| `plot_perf_cat` | plots TPR-FDR points across DD categories for each p-value adjustment type (`p_adj.loc/glb`) |
| `plot_perf_by_nx` | plots TPR-FDR points across the number of x (cells = `c`, samples = `s`) |
| `plot_perf_by_ss` | plots TPR-FDR points across increasingly unbalanced sample sizes |
| `plot_perf_by_expr` | plots TPR-FDR points across expression-level groups |
| `plot_upset` | plots an upset plot for the top gene-subpopulation combinations across methods and simulation replicates |
| `plot_lfc` | scatter plots of simulated vs. estimated logFC, stratified by method and DD category |
| `plot_pb_mean_disp` | provided with a reference dataset, simulates a null dataset (no DS, no type-genes) and plots pseudobulk-level mean-dispersion estimates for simulated vs. reference data |
| `plot_runtimes` | barplots of runtimes vs. number of genes/cells |
| `utils` | various helpers for data handling, formatting, and plotting |
| `session_info` | generates a `.txt` file capturing the output of `session_info()` |
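
Several of the plotting scripts above report TPR-FDR points. As a reminder of what these quantities mean, the following Python sketch computes the true positive rate and false discovery rate at a given adjusted p-value cutoff. It is an illustration under stated assumptions rather than the repository's implementation (which is in R): the function `tpr_fdr`, the cutoffs, and the toy inputs are hypothetical.

```python
def tpr_fdr(p_adj, is_ds, cutoff=0.05):
    """Return (TPR, FDR) when calling every test with p_adj < cutoff.

    p_adj : adjusted p-values, one per gene-subpopulation combination
    is_ds : booleans, True where a differential state was truly simulated
    """
    called = [p < cutoff for p in p_adj]
    tp = sum(c and t for c, t in zip(called, is_ds))      # true positives
    fp = sum(c and not t for c, t in zip(called, is_ds))  # false positives
    n_true = sum(is_ds)
    n_called = tp + fp
    # Define both rates as 0 when their denominator is empty.
    tpr = tp / n_true if n_true else 0.0
    fdr = fp / n_called if n_called else 0.0
    return tpr, fdr


# Toy inputs (made up for illustration).
p_adj = [0.001, 0.02, 0.04, 0.30, 0.80]
is_ds = [True, True, False, True, False]

# One (cutoff, TPR, FDR) point per cutoff; the cutoffs are an assumption.
points = [(c, *tpr_fdr(p_adj, is_ds, c)) for c in (0.01, 0.05, 0.1)]
```

Evaluating the pair at several cutoffs yields the points that a TPR-FDR plot connects into a curve; a well-calibrated method keeps the observed FDR at or below each nominal cutoff.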
[1]: John Blischak, Peter Carbonetto and Matthew Stephens (2019). workflowr: A Framework for Reproducible and Collaborative Data Science. R package version 1.4.0. https://CRAN.R-project.org/package=workflowr