Single-cell sequencing of peripheral blood mononuclear cells in acute Kawasaki disease

System requirements

The source code was written in R and can be run on any Linux and Windows systems that support R. It was tested with R version 4.0.1. R packages dependencies include: Seurat (version 3.0.2), SingleR (version: 1.0.6), DESeq2 (version: 1.28.1), clusterProfiler (version: 3.16.0), immunarch (version: 0.6.5).

Installation guide

Please find the installation guide of R and R packages in their home page.

R: https://cran.r-project.org/
Seurat: https://satijalab.org/seurat/
SingleR: https://bioconductor.org/packages/release/bioc/html/SingleR.html
DESeq2: https://bioconductor.org/packages/release/bioc/html/DESeq2.html
clusterProfiler: https://bioconductor.org/packages/release/bioc/html/clusterProfiler.html
immunarch: https://immunarch.com/

The installation is usually straightforward following the guide of the packages and Bioconductor.

Demo

Seurat analysis includes quality control, normalization, sample aggregating, dimension reduction, clustering and visualization. The expected outputs include cell clusters, gene expression matrix and various plots for visualization (UMAP, marker gene expression). You can find the demo of standard workflow in the home page of Seurat: https://satijalab.org/seurat/vignettes.html
SingleR performs cell type annotation from single-cell RNA sequencing data, by leveraging reference transcriptomic datasets of pure cell types. You can follow the demo of Bioconductor: https://bioconductor.org/packages/release/bioc/vignettes/SingleR/inst/doc/SingleR.html
DESeq2 can perform differential expression analysis based on the pseudo-bulk expression profiles aggregated for each cell type. It will output the DEG list and related statistics such as FDR. You can follow the demo of Bioconductor: https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html
clusterProfiler performs functional enrichment analyses including over-representation of GO and KEGG, as well as GSEA. It will output the enriched functions and related statistics. You can follow the demo here: http://yulab-smu.top/clusterProfiler-book/
immunarch performs clonotype analyses of BCRs and TCRs, including clonotype diversity, clonotype abundance, clonotype tracking across samples and gene usages. You can find the demo in the home page of immunarch: https://immunarch.com/

Instructions for use

Note: the input and output files are specified in the scripts and should be carefully checked for reuse.

`batch1_integrate.R`

This script is used to integrate samples collected in Batch 1. To run the script on the real data, first prepare the cellranger output files for each sample. For scRNA-seq data, three files will be generated by cellranger count (barcodes.tsv.gz, features.tsv.gz, matrix.mtx.gz). Put these files in a directory with corresponding sample name.

Input

samples vector of sample/directory names of input data
low_cutoff high_cutoff vector of UMI cutoffs for QC

Output

umap_0.1.pdf umap_1.2.pdf UMAP plot of cell clusters with different resolutions
markers.pdf visualization of cell markers across cell clusters
batch1_cellcount.txt cell counts across cell clusters
batch1_UMIcount.txt UMI count matrix of genes by cell clusters (pseudo-bulk expression profile)
batch1.Rdata R data file containing the integrated data object (comb_data)

`batch2_transfer.R`

This script transfers cell labels from Batch 1 to Batch 2 samples. The input and output files are similar to batch1_integrate.R.

Input

samples vector of sample/directory names of input data
low_cutoff high_cutoff vector of UMI cutoffs for QC

Output

batch2_cellcount.txt cell counts across cell clusters
batch2_UMIcount.txt UMI count matrix of genes by cell clusters (pseudo-bulk expression profile)
batch2.Rdata R data file containing the merged data object (merge_data)

`cell_annotation.R`

This script perform cell type annotation for Seurat clusters using SingleR. Five reference datasets (human primary cell atlas, Blueprint/ENCODE, Database of Immune Cell Expression, Novershtern hematopoietic data and Monaco immune data) are used for annotation, respectively.

Input

batch1.Rdata R data file containing the integrated data object (comb_data)

Output

SingleR.ref.pdf visualization of annotated clusters with each reference dataset

`cell_abundance.R`

This script summarizes cell percentage by types for visualization and statistical test.

Input

batch1_cellcount.txt batch2_cellcount.txt cell counts across cell clusters for each sample
cell_anno.txt manually refined cell type annotations for the cell clusters

Output

celltype.abundance.pdf boxplot of cell type percentage across conditions
p.value P-values of statistical test to compare the percentage across conditions

`DEG_analysis.R`

This script performs differential expression analysis for each cell compartment with DESeq2 based on the pseudo-bulk expression profile.

Input

batch1_UMIcount.txt batch2_UMIcount.txt UMI count matrix of genes by cell clusters (pseudo-bulk expression profile)
cluster vectors of cell clusters for DEG analysis
sample1 vector of sample names for comparison between pre- and post-treatment
meta1 list of meta information for sample1, including patient and treatment
sample2 vector of sample names for comparison between pre-treatment and healthy controls
meta2 list of meta information for sample2

Output

CPM1.txt normalized pseudo-bulk expression profile with CPM (count per million mapped reads) between pre- and post-treatment. This file is formated as the GSEA input.
stat1.txt result table of statistical test for each gene, including P-value and FDR between pre- and post-treatment
CPM2.txt normalized pseudo-bulk expression profile with CPM between pre-treatment and healthy controls. This file is formated as the GSEA input.
stat2.txt result table of statistical test for each gene, including P-value and FDR between pre-treatment and healthy controls

`function_enrich.R`

This script performs GO, KEGG and MSigDB hallmark gene sets enrichment analysis with clusterProfiler.

Input

stat1.txt stat2.txt result tables of statistical test for each gene with DESeq2
genecut FDR cutoff for DEGs
enrichcut FDR cutoff for enrichment analysis

Output

ego_MF.txt result table of enrichment analysis for GO MF
ego_BP.txt result table of enrichment analysis for GO BP
ekegg.txt result table of enrichment analysis for KEGG
egmt.txt result table of enrichment analysis for MSigDB hallmark gene sets

`clonotype_filter.R`

This script is used to filter clonotypes of IGs for each sample. Only clonotypes with productive and paired IG chains (IGH and IGL/IGK) will be preserved. Please note clonotypes of TCRs can also be analyzed with the script by changing IGH and IGL to TRA and TRB. This script requires two input files created by cellranger vdj: consensus_annotations.csv and filtered_contig_annotations.csv.

Input

consensus_annotations.csv annotations of consensus clonotype sequences
filtered_contig_annotations.csv annotations of contig sequences for each cell

Output

filtered_contig_annotations_productive_pair.csv filtered clonotype annotations

`Clonotype_compare.R`

This script compares the clonotypes between two paired samples with immunarch.

Input

immdata load the directory containing two paired samples (i.e., two filtered_contig_annotations_productive_pair.csv files)

Output

repDiversity calculate the Gini coefficient of the two samples
trackClonotypes visualize clonotype tracking between the two samples
clone_select.csv abundance change of individual clonotypes between the two samples with the fisher's test

master-XS / scKD

Single-cell sequencing of peripheral blood mononuclear cells in acute Kawasaki disease

System requirements

Installation guide

Demo

Instructions for use

`batch1_integrate.R`

Input

Output

`batch2_transfer.R`

Input

Output

`cell_annotation.R`

Input

Output

`cell_abundance.R`

Input

Output

`DEG_analysis.R`

Input

Output

`function_enrich.R`

Input

Output

`clonotype_filter.R`

Input

Output

`Clonotype_compare.R`

Input

Output

About

Languages