Snakemake workflow: scrna-seq

A Snakemake workflow for single-cell RNA-seq analysis

🤪 Authors

Songqi Duan, https://songqi.online

📦 Analysis module

Difference analysis: This module covers differential gene expression analysis, GO and KEGG pathway enrichment, and GSEA. The Seurat package is used for differentially expressed gene analysis; the clusterProfiler package is used for functional enrichment and GSEA.
Score: This module uses the GSVA algorithm to score functional terms from the H, C2, C5, C6 and C7 collections of the MSigDB database. The GSVA package is used for scoring; the limma package is used for differential pathway analysis.
Transcription factor prediction: This module uses pySCENIC for transcription factor prediction and the limma package for differential analysis of transcription factors between groups.

🕹️ Usage

1. Input data requirements

The input is an rds file containing a Seurat object. The meta.data of the Seurat object must include a cell type column named cell_type and a grouping column (for example, group, which contains exactly two groups such as Normal and Tumor).

2. Obtain a copy of this workflow

cd ~/
git clone https://github.com/zerostwo/scrna-seq.git

3. Configure workflow

Configure the workflow to suit your needs by editing the file config.yaml.

#### Required content ----
# Absolute path to Seurat object
INPUT: /home/duansq/pipeline/scrna-seq/resources/test_data.rds
# Grouping information, ensure that the grouping field exists in the meta.data of the Seurat object, and only contains two groups
GROUP: METTL3_group
# Set the positive group you want to use for comparison
TREATMENT: positive
# Differentially expressed gene analysis method (options: MAST, bimod, wilcox, LR, t)
TEST_METHOD: wilcox
# Set up the assay for analysis, usually RNA
ASSAY: RNA
# Set your species (options: Homo sapiens or Mus musculus)
SPECIES: Homo sapiens
# Score method (options: GSVA, AddModuleScore, AUCell)
SCORE_METHOD: AddModuleScore
#### Software settings ----
# pyscenic path
PYSCENIC_PATH: /opt/pySCENIC/0.11.2/bin/pyscenic
# The path of python where pyscenic is located
PYTHON_PATH: /opt/pySCENIC/0.11.2/venv/bin/python
# pyscenic annotation file. Download from https://pyscenic.readthedocs.io/en/latest/installation.html#auxiliary-datasets
ANNOTATIONS_FILE_PATH: /DATA/public/cisTarget_databases/human/motifs-v9-nr.hgnc-m0.001-o0.0.tbl 
# pyscenic database. Download from https://resources.aertslab.org/cistarget/
DATABASE_FILE_PATH: 
  /DATA/public/cisTarget_databases/human/hg38__refseq-r80__10kb_up_and_down_tss.mc9nr.feather
  /DATA/public/cisTarget_databases/human/hg38__refseq-r80__500bp_up_and_100bp_down_tss.mc9nr.feather
# A list of transcription factors. Download from https://github.com/aertslab/pySCENIC/tree/master/resources
TF_LIST: /DATA/public/cisTarget_databases/resources/hs_hgnc_tfs.txt
#### Optional ---- 
# Selected p-value and log2(fold change) threshold when performing differentially expressed gene analysis
P_VALUE: 0.05
LOG2FC: 0.25
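
Before a dry run, it can help to sanity-check the values in config.yaml. The following is a minimal sketch, not part of the workflow itself: the function name validate_config and the specific checks are assumptions made for illustration, based only on the comments in the configuration above (allowed option lists, an absolute INPUT path, and numeric thresholds).

```python
from pathlib import PurePosixPath
from typing import Any, List, Mapping

# Keys listed under "Required content" in config.yaml.
REQUIRED_KEYS = (
    "INPUT", "GROUP", "TREATMENT", "TEST_METHOD",
    "ASSAY", "SPECIES", "SCORE_METHOD",
)

# Allowed values, copied from the comments in config.yaml.
ALLOWED_VALUES = {
    "TEST_METHOD": {"MAST", "bimod", "wilcox", "LR", "t"},
    "SPECIES": {"Homo sapiens", "Mus musculus"},
    "SCORE_METHOD": {"GSVA", "AddModuleScore", "AUCell"},
}


def validate_config(config: Mapping[str, Any]) -> List[str]:
    """Return a list of human-readable problems (hypothetical helper)."""
    problems = []

    # Every required key must be present and non-empty.
    for key in REQUIRED_KEYS:
        if config.get(key) in (None, ""):
            problems.append(f"missing required key: {key}")

    # Keys with a fixed option list must use one of the listed values.
    for key, allowed in ALLOWED_VALUES.items():
        value = config.get(key)
        if value in (None, ""):
            continue  # already reported above
        if not isinstance(value, str) or value not in allowed:
            problems.append(f"{key} must be one of {sorted(allowed)}, got {value!r}")

    # The README asks for an absolute path to the Seurat object.
    input_path = config.get("INPUT")
    if isinstance(input_path, str) and input_path:
        if not PurePosixPath(input_path).is_absolute():
            problems.append("INPUT must be an absolute path")

    # Optional thresholds are only checked when present.
    p_value = config.get("P_VALUE")
    if p_value is not None and not (0 < p_value <= 1):
        problems.append("P_VALUE must be in the interval (0, 1]")
    log2fc = config.get("LOG2FC")
    if log2fc is not None and log2fc < 0:
        problems.append("LOG2FC must be non-negative")

    return problems


# Example values taken from the configuration shown above.
example_config = {
    "INPUT": "/home/duansq/pipeline/scrna-seq/resources/test_data.rds",
    "GROUP": "METTL3_group",
    "TREATMENT": "positive",
    "TEST_METHOD": "wilcox",
    "ASSAY": "RNA",
    "SPECIES": "Homo sapiens",
    "SCORE_METHOD": "AddModuleScore",
    "P_VALUE": 0.05,
    "LOG2FC": 0.25,
}
print(validate_config(example_config))
```

In practice the dictionary would come from parsing config.yaml, for example with PyYAML's yaml.safe_load, assuming that package is installed in your environment.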

4. Execute workflow

# 1. Switch to the pipeline path
cd ~/scrna-seq
# 2. Activate the snakemake environment
conda activate snakemake
# 3. Test your configuration by performing a dry-run via
snakemake -np
# 4. Execute the workflow locally via
snakemake --cores 10
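
If you prefer to drive these steps from a script, the dry run and the real run can be chained so that the workflow only executes when the dry run succeeds. This is a minimal sketch under stated assumptions: build_commands and run_workflow are hypothetical helper names, snakemake is assumed to be on the PATH (that is, the conda environment is already active), and only the flags shown above (-np and --cores) are used.

```python
import subprocess
from typing import List


def build_commands(cores: int) -> List[List[str]]:
    """Return the dry-run command followed by the real run (hypothetical helper)."""
    if cores < 1:
        raise ValueError("cores must be a positive integer")
    return [
        ["snakemake", "-np"],                  # dry run, print shell commands
        ["snakemake", "--cores", str(cores)],  # local execution
    ]


def run_workflow(workdir: str, cores: int = 10) -> None:
    """Run the dry run first, then the workflow, inside the cloned repository."""
    for command in build_commands(cores):
        # check=True raises CalledProcessError and stops at the first failure,
        # so the real run never starts if the dry run reports a problem.
        subprocess.run(command, cwd=workdir, check=True)


# Example call (not executed here); the path is where the repository was cloned:
# run_workflow("/home/<user>/scrna-seq", cores=10)
```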

📂 Result file description

After the workflow has finished, the results are written to the results folder. Six folders are usually generated for each input dataset, with the following structure:

test_data
├── benchmark
├── deg
├── function
│   ├── GO
│   ├── GSEA
│   └── KEGG
├── logs
├── scenic
└── score

benchmark contains the CPU, memory and time consumed by each analysis script;
deg contains the result file of differentially expressed gene analysis;
function contains three subfolders, which are the results of GO and KEGG enrichment analysis and GSEA;
logs contains log files generated by each analysis script run;
scenic contains transcription factor prediction and between-group difference analysis results;
score contains a subfolder with the GSVA scoring results and the results of the between-group differential analysis.
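
To confirm that a run produced the full layout described above, the expected folders can be checked programmatically. This is a minimal sketch: missing_result_dirs is a hypothetical helper, and the folder list is copied from the tree shown above rather than from the workflow's rules, so adjust it if your run produces a different layout.

```python
from pathlib import Path
from typing import List

# Folder layout copied from the result tree in this README.
EXPECTED_DIRS = (
    "benchmark",
    "deg",
    "function/GO",
    "function/GSEA",
    "function/KEGG",
    "logs",
    "scenic",
    "score",
)


def missing_result_dirs(sample_dir: str) -> List[str]:
    """Return the expected subfolders that are absent under sample_dir."""
    root = Path(sample_dir)
    return [rel for rel in EXPECTED_DIRS if not (root / rel).is_dir()]


# Example call (not executed here), using the sample name from the tree above:
# print(missing_result_dirs("results/test_data"))
```

An empty list means every expected folder exists; any names returned point to analysis steps that did not produce output.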
