nsteinau / MAESTRO

Single-cell Transcriptome and Regulome Analysis Pipeline



MAESTRO


MAESTRO (Model-based AnalysEs of Single-cell Transcriptome and RegulOme) is a comprehensive single-cell RNA-seq and ATAC-seq analysis suite built using snakemake. MAESTRO combines several dozen tools and packages into an integrative pipeline that takes scRNA-seq and scATAC-seq data from raw sequencing reads (fastq files) all the way through alignment, quality control, cell filtering, normalization, unsupervised clustering, differential expression and peak calling, cell-type annotation and transcription regulation analysis. Currently, MAESTRO supports Smart-seq2, 10x-genomics, Drop-seq and SPLiT-seq for scRNA-seq, and microfluidics-based, 10x-genomics and sci-ATAC-seq protocols for scATAC-seq.

ChangeLog

v1.0.0

  • Release MAESTRO.

v1.0.1

  • Provide a Docker image for easy installation. Note that the Docker image does not include cellranger/cellranger ATAC or the corresponding genome indexes. Please install cellranger/cellranger ATAC following the installation instructions.

v1.0.2

  • Fix some bugs and set LISA as the default method for predicting transcription factors from scRNA-seq data. Note that the Docker image includes the lisa conda environment but not the required pre-computed genome datasets. Please download the hg38 or mm10 datasets and update the configuration following the installation instructions.

v1.1.0

  • Change the default alignment method of MAESTRO from cellranger to starsolo and minimap2 to accelerate mapping.
  • Improve the memory efficiency of the scATAC gene score calculation.
  • Incorporate the installation of giggle into MAESTRO and add a web API for the LISA function; all core MAESTRO functions can now be installed through the conda environment.
  • Provide more documentation for the QC parameters and add flexibility to other parameters in the workflow.

System requirements

  • Linux/Unix
  • Python (>= 3.0) for MAESTRO snakemake workflow
  • R (>= 3.5.1) for MAESTRO R package

Installation

Installing the full solution of MAESTRO workflow

MAESTRO uses the Miniconda3 package management system to harmonize all of its software packages. Users can install the full MAESTRO solution in a conda environment.

Use the following commands to install Miniconda3:

$ wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
$ bash Miniconda3-latest-Linux-x86_64.sh

And then users can create an isolated environment for MAESTRO and install through the following commands:

$ conda config --add channels defaults
$ conda config --add channels bioconda
$ conda config --add channels conda-forge
$ conda create -n MAESTRO maestro -c liulab-dfci

Installing the MAESTRO R package

If users already have processed datasets, such as a cell-by-gene or cell-by-peak matrix generated by Cell Ranger, they can install the MAESTRO R package alone to perform the analysis from those datasets.

$ R
> library(devtools)
> install_github("liulab-dfci/MAESTRO")

Required annotations for MAESTRO workflow

  • MAESTRO depends on starsolo and minimap2 for mapping scRNA-seq and scATAC-seq datasets. Users need to generate the reference files for the alignment software and specify the path of the annotations to MAESTRO through command line options.

  • MAESTRO utilizes LISA to evaluate the enrichment of transcription factors based on the marker genes of scRNA-seq clusters. MAESTRO provides two options for the LISA function. The web version requires neither a local LISA installation nor downloaded annotations. The local version is faster than the web version, but users need to install LISA locally, build the annotation files according to the LISA documentation, and provide the path of LISA to MAESTRO when using the RNAAnnotateTranscriptionFactor function.

  • MAESTRO utilizes giggle to identify the enrichment of transcription factor peaks in scATAC-seq cluster-specific peaks. By default, giggle is installed in the MAESTRO environment. The giggle index for the Cistrome database can be downloaded here. Users need to download the file and provide the location of the giggle annotation to MAESTRO when using the ATACAnnotateTranscriptionFactor function.
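To make "enrichment of transcription factor peaks in cluster-specific peaks" concrete, the sketch below counts how many cluster-specific peaks overlap a factor's binding sites and scores that count with a one-sided hypergeometric test against all peaks. This is a generic overlap-enrichment illustration, not giggle's index structure or scoring method; the toy intervals and the helper names (overlaps, count_overlaps, hypergeom_sf) are hypothetical.

```python
from math import comb


def overlaps(a, b):
    """True if two half-open intervals (chrom, start, end) share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def count_overlaps(peaks, sites):
    """Number of peaks overlapping at least one binding site (brute force, no index)."""
    return sum(any(overlaps(p, s) for s in sites) for p in peaks)


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when drawing n peaks from N total, of which K overlap the factor."""
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / comb(N, n)


# Hypothetical toy data: 20 peaks on chr1, the first 5 treated as cluster-specific.
all_peaks = [("chr1", i * 1000, i * 1000 + 500) for i in range(20)]
cluster_peaks = all_peaks[:5]
tf_sites = [("chr1", 100, 200), ("chr1", 1100, 1200),
            ("chr1", 2100, 2200), ("chr1", 15100, 15200)]

N, n = len(all_peaks), len(cluster_peaks)
K = count_overlaps(all_peaks, tf_sites)      # background overlaps
k = count_overlaps(cluster_peaks, tf_sites)  # overlaps within the cluster
p = hypergeom_sf(k, N, K, n)
print(f"{k}/{n} cluster peaks vs {K}/{N} of all peaks overlap; p = {p:.4f}")
```

A real index such as giggle avoids the brute-force loop by searching many interval files at once; the quadratic scan here only keeps the example short.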

Usage

usage: MAESTRO [-h] [-v]
               {scrna-init,scatac-init,integrate-init,mtx-to-h5,count-to-h5,merge-h5,scrna-qc,scatac-qc,scatac-peakcount,scatac-genescore}

There are ten functions available in MAESTRO serving as sub-commands.

Subcommand          Description
scrna-init          Initialize the MAESTRO scRNA-seq workflow.
scatac-init         Initialize the MAESTRO scATAC-seq workflow.
integrate-init      Initialize the MAESTRO integration workflow.
mtx-to-h5           Convert 10X mtx format matrix to HDF5 format.
count-to-h5         Convert plain text count table to HDF5 format.
merge-h5            Merge multiple HDF5 files, e.g. different replicates.
scrna-qc            Perform quality control for scRNA-seq gene-cell count matrix.
scatac-qc           Perform quality control for scATAC-seq peak-cell count matrix.
scatac-peakcount    Generate peak-cell binary count matrix.
scatac-genescore    Calculate gene score based on the binarized scATAC peak count.
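The "binary" count matrix produced by scatac-peakcount and the cell filtering performed by the QC subcommands can be pictured with a small dense example. This sketch binarizes a peak-by-cell count matrix and keeps cells with a minimum number of accessible peaks; the threshold min_peaks, the barcodes and the helper names are hypothetical and do not reflect MAESTRO's actual QC options (run MAESTRO scatac-qc -h for those).

```python
def binarize(counts):
    """Turn a peak-by-cell count matrix (list of rows) into 0/1 accessibility calls."""
    return [[1 if c > 0 else 0 for c in row] for row in counts]


def filter_cells(binary, cells, min_peaks):
    """Keep cells (columns) with at least min_peaks accessible peaks."""
    per_cell = [sum(row[j] for row in binary) for j in range(len(cells))]
    keep = [j for j, n in enumerate(per_cell) if n >= min_peaks]
    filtered = [[row[j] for j in keep] for row in binary]
    return filtered, [cells[j] for j in keep], per_cell


counts = [
    [3, 0, 1, 0],  # peak 1
    [0, 0, 2, 1],  # peak 2
    [5, 0, 0, 0],  # peak 3
]
cells = ["AAAC", "AAAG", "AACT", "AAGA"]  # hypothetical barcodes

binary = binarize(counts)
filtered, kept_cells, per_cell = filter_cells(binary, cells, min_peaks=2)
print(per_cell)    # [2, 0, 2, 1]
print(kept_cells)  # ['AAAC', 'AACT']
```

Real matrices are sparse and far larger, so a production implementation would store them in a sparse format (for example the HDF5 files produced by mtx-to-h5) rather than nested lists.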

Examples of running MAESTRO can be found in the following galleries. Please use MAESTRO COMMAND -h to see a detailed description of each option of each module.
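The scatac-genescore subcommand calculates a gene score from binarized scATAC peak counts. As a rough illustration only, the sketch below scores one gene in one cell by summing a distance-decayed weight over accessible peaks near the transcription start site. It assumes each peak contributes 2^(-d/d0) with a half-decay distance d0 of 10 kb and ignores peaks beyond a hypothetical 150 kb cutoff; MAESTRO's actual model may weight peaks differently.

```python
def gene_score(tss, peak_centers, half_decay=10_000, max_dist=150_000):
    """Sum 2**(-d / half_decay) over accessible peaks within max_dist of the TSS.

    half_decay and max_dist are illustrative assumptions, not MAESTRO defaults.
    Peaks are binarized, so each accessible peak counts once regardless of reads.
    """
    score = 0.0
    for center in peak_centers:
        d = abs(center - tss)
        if d <= max_dist:
            score += 2 ** (-d / half_decay)
    return score


# Hypothetical accessible peak centers (bp) for one cell on the gene's chromosome.
cell_peaks = [50_000, 60_000, 300_000]
score = gene_score(tss=50_000, peak_centers=cell_peaks)
print(round(score, 3))  # 1.5: the TSS peak gives 1.0, the peak 10 kb away 0.5, the distant one 0
```

Repeating this for every gene and cell yields a gene-by-cell score matrix that can be analysed like scRNA-seq expression.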

Galleries & Tutorials


Citation

About


License: GNU General Public License v3.0

