aljabadi / muscat-comparison

On the discovery of population-specific state transitions
from multi-sample multi-condition scRNA-seq data

This repository contains all the necessary code to perform the evaluations and analyses from our preprint available on bioRxiv.

LPS dataset analysis

The analyses discussed in the results section "Differential state analysis of mouse cortex exposed to LPS treatment"
are provided as a browsable workflowr [1] website HERE.

Comparison of DS analysis methods

In brief, our snakemake workflow for method comparison is organized into

  • a config.yaml file specifying key parameters and directories
  • a scripts folder housing all utilized scripts (see below)
  • a data folder containing raw (reference) and simulated data
  • a meta folder for simulation, runmode, and method parameters
  • a results folder where all results are generated (as .rds files)
  • a figures folder where all output plots are generated
    (as .pdf or .png files, or .rds files for ggplot objects)
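The meta folder holds one parameter file per ID; the script table describes these as .json files under meta/sim_pars, meta/run_pars, and meta/meth_pars. As a rough illustration of that layout, the sketch below writes one JSON file per simulation ID. It is written in Python for brevity, whereas the repository's own scripts are in R. The simulation IDs, the key names (`p_ds`, `n_reps`), and their values are hypothetical and not taken from the repository.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical simulation IDs and parameters; the real keys and values
# are defined by the repository's sim_pars script and may differ.
SIM_PARS = {
    "null_sim": {"p_ds": 0.0, "n_reps": 3},
    "ds10_sim": {"p_ds": 0.1, "n_reps": 5},
}

def write_sim_pars(meta_dir, sim_pars):
    """Write one <sim_id>.json file per simulation ID into meta_dir/sim_pars."""
    out_dir = Path(meta_dir) / "sim_pars"
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for sim_id, pars in sorted(sim_pars.items()):
        path = out_dir / f"{sim_id}.json"
        path.write_text(json.dumps(pars, indent=2))
        paths.append(path)
    return paths

if __name__ == "__main__":
    # Write into a temporary directory so the example leaves no files behind.
    with tempfile.TemporaryDirectory() as tmp:
        for p in write_sim_pars(Path(tmp) / "meta", SIM_PARS):
            print(p.name, json.loads(p.read_text()))
```

Keeping one small file per ID means a workflow manager can treat each parameter file as a separate input and rerun only the steps whose parameters changed.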

The table below summarizes the different R scripts in scripts:

| script | description |
| --- | --- |
| prep_X | generates a reference SCE for simulation by i) keeping samples from one condition only; and ii) unifying relevant cell metadata names to "cluster/sample/group_id" |
| prep_sim | prepares a reference SCE for simulation by i) retaining subpopulation-sample combinations with at least 100 cells; and ii) estimating cell / gene parameters (offsets / coefficients and dispersions) |
| sim_pars | for each simulation ID, generates a .json file in meta/sim_pars that specifies simulation parameters (e.g., probability of DS, number of simulation replicates) |
| run_pars | for each reference and simulation ID, generates a .json file in meta/run_pars that specifies runmode parameters (e.g., number of cells/genes to sample, number of run replicates) |
| meth_pars | for each method ID, generates a .json file in meta/meth_pars that specifies method parameters |
| sim_data | provided with a reference dataset and simulation parameters, simulates data and writes an SCE to data/sim_data |
| apply_X | wrapper to run a DS method of type X (pb, mm, ad, mast, scdd) |
| run_meth | reads in simulated data and method parameters, and performs DS analysis by running the corresponding apply_X script |
| plot_null | for each reference ID, plots nominal p-value distributions for all null simulations |
| plot_tprfdr | plots TPR-FDR curves for a single result |
| plot_perf_cat | plots TPR-FDR points across DD categories for each p-value adjustment type (p_adj.loc/glb) |
| plot_perf_by_nx | plots TPR-FDR points across the number of x (cells = c, samples = s) |
| plot_perf_by_ss | plots TPR-FDR points across increasingly unbalanced sample sizes |
| plot_perf_by_expr | plots TPR-FDR points across expression-level groups |
| plot_upset | plots an upset plot for the top gene-subpopulation combinations across methods and simulation replicates |
| plot_lfc | scatter plots of simulated vs. estimated logFC, stratified by method and DD category |
| plot_pb_mean_disp | provided with a reference dataset, simulates a null dataset (no DS, no type-genes) and plots pseudobulk-level mean-dispersion estimates for simulated vs. reference data |
| plot_runtimes | barplots of runtimes vs. number of genes/cells |
| utils | various helpers for data handling, formatting, and plotting |
| session_info | generates a .txt file capturing the output of session_info() |
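
Several of the plotting scripts report TPR-FDR points. For readers unfamiliar with these quantities, the sketch below computes them at a set of adjusted p-value cutoffs: TPR is the fraction of truly DS combinations that are called, and FDR is the fraction of called combinations that are not truly DS. It is a Python illustration under assumed inputs (a list of adjusted p-values and a matching list of ground-truth labels), not the repository's R implementation; the function name, the cutoffs, and the example values are hypothetical.

```python
def tpr_fdr(p_adj, is_ds, cutoffs=(0.01, 0.05, 0.1)):
    """Return (cutoff, TPR, FDR) tuples, one per adjusted p-value cutoff.

    p_adj: adjusted p-values, one per gene-subpopulation combination.
    is_ds: matching booleans, True where a DS effect was simulated.
    """
    if len(p_adj) != len(is_ds):
        raise ValueError("p_adj and is_ds must have the same length")
    n_true = sum(is_ds)
    points = []
    for cutoff in cutoffs:
        called = [p <= cutoff for p in p_adj]
        tp = sum(c and t for c, t in zip(called, is_ds))
        fp = sum(c and not t for c, t in zip(called, is_ds))
        # Guard against division by zero when nothing is true or called.
        tpr = tp / n_true if n_true else 0.0
        fdr = fp / (tp + fp) if (tp + fp) else 0.0
        points.append((cutoff, tpr, fdr))
    return points

if __name__ == "__main__":
    # Made-up values for illustration only.
    p_adj = [0.001, 0.02, 0.04, 0.2, 0.03, 0.6]
    is_ds = [True, True, False, True, False, False]
    for cutoff, tpr, fdr in tpr_fdr(p_adj, is_ds):
        print(f"cutoff={cutoff}: TPR={tpr:.2f}, FDR={fdr:.2f}")
```

A well-calibrated method keeps the observed FDR at or below each nominal cutoff while achieving a high TPR.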

References

[1]: John Blischak, Peter Carbonetto and Matthew Stephens (2019).
workflowr: A Framework for Reproducible and Collaborative Data Science.
R package version 1.4.0. https://CRAN.R-project.org/package=workflowr
