A Snakemake workflow for single-cell RNA-seq analysis
- Songqi Duan, https://songqi.online
- Difference analysis: this module includes differential gene analysis, GO and KEGG pathway enrichment, and GSEA. The Seurat package is used for differentially expressed gene analysis; the clusterProfiler package is used for functional enrichment and GSEA.
- Score: this module uses the GSVA algorithm to score functional terms from several MSigDB collections (H, C2, C5, C6 and C7). The GSVA package is used for scoring; the limma package is used for differential analysis of the pathway scores between groups.
- Transcription factor prediction: this module uses pySCENIC for transcription factor prediction and the limma package for differential analysis of transcription factors between groups.
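The difference analysis module keeps genes that pass the P_VALUE and LOG2FC thresholds set in config.yaml, and GSEA (clusterProfiler's GSEA) expects all genes ranked in decreasing order of a statistic. The workflow does this in R; the Python sketch below only illustrates the logic. The row fields are borrowed from Seurat v4's FindMarkers output columns (p_val_adj, avg_log2FC), and filtering on the adjusted p-value and ranking by log2 fold change are assumptions, not confirmed details of this workflow. Gene names are placeholders.

```python
from dataclasses import dataclass


@dataclass
class DegRow:
    gene: str
    p_val_adj: float   # assumed: adjusted p-value, as in Seurat's FindMarkers output
    avg_log2FC: float  # assumed: log2 fold change of treatment vs. the other group


def filter_degs(rows, p_value=0.05, log2fc=0.25):
    """Split rows into up- and down-regulated genes that pass both thresholds."""
    up, down = [], []
    for r in rows:
        # Assumed semantics: p strictly below the cutoff, |log2FC| at or above it.
        if r.p_val_adj >= p_value or abs(r.avg_log2FC) < log2fc:
            continue
        (up if r.avg_log2FC > 0 else down).append(r.gene)
    return up, down


def gsea_ranking(rows):
    """Rank every gene by log2 fold change, highest first, as GSEA input."""
    ordered = sorted(rows, key=lambda r: r.avg_log2FC, reverse=True)
    return [(r.gene, r.avg_log2FC) for r in ordered]


rows = [
    DegRow("GENE_A", 0.001, 1.2),
    DegRow("GENE_B", 0.020, -0.8),
    DegRow("GENE_C", 0.300, 2.0),  # fails the p-value cutoff
    DegRow("GENE_D", 0.010, 0.1),  # fails the log2FC cutoff
]
up, down = filter_degs(rows)
print(up, down)  # ['GENE_A'] ['GENE_B']
print(gsea_ranking(rows))
```

Note that GSEA uses the full ranked list rather than only the filtered genes, which is why the two functions take the same input but return different things.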
The input is an .rds file containing a Seurat object. The meta.data of the Seurat object must include a cell type column, cell_type, and a group column (for example group) with exactly two groups, such as Normal and Tumor.
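Since a long run can fail late on a missing column or a third group, it may help to check the metadata first. This sketch assumes meta.data has been exported to CSV from R (for example with write.csv(seurat_obj@meta.data, "meta.csv")); the workflow itself reads the .rds directly, and check_metadata is a hypothetical helper, not part of it.

```python
import csv
import io


def check_metadata(csv_text, group_col="group", cell_type_col="cell_type"):
    """Return a list of problems found in an exported meta.data table (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames or []
    problems = [f"missing column: {col}" for col in (cell_type_col, group_col) if col not in columns]
    if problems:
        return problems
    # The workflow compares exactly two groups, so any other count is an error.
    groups = {row[group_col] for row in reader}
    if len(groups) != 2:
        problems.append(f"{group_col} has {len(groups)} groups, expected exactly 2: {sorted(groups)}")
    return problems


# write.csv puts cell barcodes in an unnamed first column.
good = """,cell_type,group
AAAC-1,T cell,Normal
AAAG-1,B cell,Tumor
AAAT-1,T cell,Tumor
"""
bad = good + "AACA-1,T cell,Metastasis\n"
print(check_metadata(good))  # []
print(check_metadata(bad))
```

For the example configuration, the call would be check_metadata(open("meta.csv").read(), group_col="METTL3_group").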
cd ~/
git clone https://github.com/zerostwo/scrna-seq.git
Configure the workflow by editing config.yaml:
#### Required content ----
# Absolute path to Seurat object
INPUT: /home/duansq/pipeline/scrna-seq/resources/test_data.rds
# Grouping column; it must exist in the meta.data of the Seurat object and contain exactly two groups
GROUP: METTL3_group
# Level of GROUP to use as the treatment group in the comparison
TREATMENT: positive
# Differentially expressed gene analysis method (options: MAST, bimod, wilcox, LR, t)
TEST_METHOD: wilcox
# Assay to analyze, usually RNA
ASSAY: RNA
# Species (options: Homo sapiens or Mus musculus)
SPECIES: Homo sapiens
# Scoring method (options: GSVA, AddModuleScore, AUCell)
SCORE_METHOD: AddModuleScore
#### Software settings ----
# Path to the pyscenic executable
PYSCENIC_PATH: /opt/pySCENIC/0.11.2/bin/pyscenic
# Path to the Python interpreter of the environment where pyscenic is installed
PYTHON_PATH: /opt/pySCENIC/0.11.2/venv/bin/python
# pySCENIC motif annotation file. Download from https://pyscenic.readthedocs.io/en/latest/installation.html#auxiliary-datasets
ANNOTATIONS_FILE_PATH: /DATA/public/cisTarget_databases/human/motifs-v9-nr.hgnc-m0.001-o0.0.tbl
# pySCENIC cisTarget ranking databases (.feather files). Download from https://resources.aertslab.org/cistarget/
DATABASE_FILE_PATH:
  /DATA/public/cisTarget_databases/human/hg38__refseq-r80__10kb_up_and_down_tss.mc9nr.feather
  /DATA/public/cisTarget_databases/human/hg38__refseq-r80__500bp_up_and_100bp_down_tss.mc9nr.feather
# A list of transcription factors. Download from https://github.com/aertslab/pySCENIC/tree/master/resources
TF_LIST: /DATA/public/cisTarget_databases/resources/hs_hgnc_tfs.txt
#### Optional ----
# P-value and log2(fold change) thresholds used in the differentially expressed gene analysis
P_VALUE: 0.05
LOG2FC: 0.25
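A typo such as wilcoxon for wilcox may only surface after several steps have run. A pre-flight check can be sketched against the allowed values listed in the config comments; this assumes config.yaml has already been loaded into a dict (for example with PyYAML's yaml.safe_load), and validate_config is a hypothetical helper rather than something the workflow ships.

```python
# Allowed values copied from the comments in config.yaml.
ALLOWED = {
    "TEST_METHOD": {"MAST", "bimod", "wilcox", "LR", "t"},
    "SPECIES": {"Homo sapiens", "Mus musculus"},
    "SCORE_METHOD": {"GSVA", "AddModuleScore", "AUCell"},
}
REQUIRED = ["INPUT", "GROUP", "TREATMENT", "TEST_METHOD", "ASSAY", "SPECIES", "SCORE_METHOD"]


def validate_config(config: dict) -> list:
    """Return human-readable problems found in a loaded config.yaml dict (hypothetical helper)."""
    errors = []
    for key in REQUIRED:
        if not config.get(key):
            errors.append(f"{key} is missing or empty")
    for key, allowed in ALLOWED.items():
        value = config.get(key)
        if value is not None and value not in allowed:
            errors.append(f"{key}={value!r} is not one of {sorted(allowed)}")
    input_path = config.get("INPUT")
    if input_path and not str(input_path).startswith("/"):
        errors.append("INPUT should be an absolute path")
    if not 0 < config.get("P_VALUE", 0.05) <= 1:
        errors.append("P_VALUE must be in (0, 1]")
    if config.get("LOG2FC", 0.25) < 0:
        errors.append("LOG2FC must not be negative")
    return errors


# Values from the example config.yaml (pySCENIC paths omitted).
config = {
    "INPUT": "/home/duansq/pipeline/scrna-seq/resources/test_data.rds",
    "GROUP": "METTL3_group",
    "TREATMENT": "positive",
    "TEST_METHOD": "wilcox",
    "ASSAY": "RNA",
    "SPECIES": "Homo sapiens",
    "SCORE_METHOD": "AddModuleScore",
    "P_VALUE": 0.05,
    "LOG2FC": 0.25,
}
print(validate_config(config))  # []
print(validate_config({**config, "TEST_METHOD": "wilcoxon"}))
```

This does not confirm that TREATMENT is one of the two levels of GROUP, which needs the metadata. Inside a Snakefile, Snakemake's own snakemake.utils.validate with a JSON schema is the more idiomatic place for such checks.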
# 1. Change to the pipeline directory
cd ~/scrna-seq
# 2. Activate the snakemake conda environment
conda activate snakemake
# 3. Test the configuration with a dry run (-n) that prints the shell commands (-p)
snakemake -np
# 4. Run the workflow locally on 10 cores
snakemake --cores 10
When the run finishes, the results are written to the results folder, in a subfolder named after the input file (here test_data). Each such subfolder usually contains six folders, structured as follows:
test_data
├── benchmark
├── deg
├── function
│ ├── GO
│ ├── GSEA
│ └── KEGG
├── logs
├── scenic
└── score
- benchmark: CPU, memory and time consumed by each analysis script.
- deg: result files of the differentially expressed gene analysis.
- function: three subfolders holding the GO enrichment, GSEA and KEGG enrichment results.
- logs: log files generated by each analysis script.
- scenic: transcription factor prediction and between-group differential analysis results.
- score: a subfolder for the GSVA scores and the results of the between-group differential analysis of those scores.
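After a run, a quick way to spot a step that produced no output is to compare results/test_data against the tree above. This is a minimal sketch that only checks that the folders exist, not their contents; missing_outputs is a hypothetical helper.

```python
import tempfile
from pathlib import Path

# Folders from the results tree shown above.
EXPECTED = [
    "benchmark", "deg", "function/GO", "function/GSEA", "function/KEGG",
    "logs", "scenic", "score",
]


def missing_outputs(results_dir):
    """Return the expected subfolders that do not exist under results_dir."""
    root = Path(results_dir)
    return [sub for sub in EXPECTED if not (root / sub).is_dir()]


# Demo on a temporary directory that lacks only "score".
with tempfile.TemporaryDirectory() as tmp:
    for sub in EXPECTED[:-1]:
        (Path(tmp) / sub).mkdir(parents=True)
    missing = missing_outputs(tmp)
print(missing)  # ['score']
```

On a real run, call missing_outputs("results/test_data") from the pipeline directory.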