A flexible pipeline for short read alignment to a reference with extensive QC reporting.
SRAlign is a Nextflow pipeline for aligning short reads to a reference.
SRAlign is designed to be highly flexible: new tools can easily be added to the pipeline, and it can serve as a starting point for genomic analyses that rely on aligning short reads to a reference.
Pipeline summary:
- Trim reads
- QC of reads
  - Raw reads FastQC
  - Trimmed reads FastQC
  - Summary MultiQC
- Align reads
  - Align to reference genome/transcriptome
  - Check contamination
- Preprocess alignments
  - Mark duplicates
  - Compress SAM to BAM
  - Index BAM
- QC of alignments
  - samtools stats
  - samtools index stats
  - Percent duplicates
  - Percent aligned to contamination reference
  - Summary MultiQC
- Library complexity and reproducibility
  - Preseq library complexity
  - deepTools correlation
  - deepTools PCA
- Full pipeline MultiQC
Requirements:
- Any POSIX-compatible system (e.g. Linux or OS X) with internet access
  - On Windows, run SRAlign with the Windows Subsystem for Linux (WSL). WSL2 is highly recommended.
- Nextflow version >= 21.04
  - See Nextflow's Get started page for prerequisites and instructions on installing and updating Nextflow.
- Docker
  - I recommend Docker Desktop for OS X or Windows users.
Quick start:
- Download or update SRAlign: `nextflow pull trev-f/SRAlign`
  - Downloads the project into `$HOME/.nextflow/assets`.
    - Useful for quickly downloading and easily running a project.
    - Allows you to access SRAlign with Nextflow commands by simply referring to `trev-f/SRAlign`, without having to refer to the location of SRAlign on your system.
      - To customize or expand SRAlign, see the documentation on customizing or expanding SRAlign.
- Show project info: `nextflow info trev-f/SRAlign`
- Check that SRAlign works on your system: `nextflow run trev-f/SRAlign -profile test`
  - `-profile test` uses preconfigured test parameters to run SRAlign in full on a small test dataset stored in a remote GitHub repository.
    - Because these test files are stored in a remote repository, internet access is required to run the test.
    - For more information, see the `profiles` section of the Nextflow config file and trev-f/SRAlign-test.
- Prepare the input design CSV file.
  - The input design file must be in CSV format with no whitespace.
  - Either reads (fastq or fastq.gz) or alignments (bam) are accepted.
    - If reads are supplied, they can be paired or unpaired.
  - Required columns:
    - reads: `lib_ID`, `sample_name`, `replicate`, `reads1`, `reads2` (optional)
    - alignments: `lib_ID`, `sample_name`, `replicate`, `bam`, `tool_IDs`
  - See sample inputs in the SRAlign-test repository.
  - A template project repository can be downloaded from the SRAlign-template repository.
- Show all configurable options for SRAlign by displaying the help message: `nextflow run trev-f/SRAlign --help`
  - The most important information here is probably the list of available reference genomes.
- Analyze your data with SRAlign: `nextflow run trev-f/SRAlign -profile docker --input <input.csv> --genome <valid genome key>`
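Before launching a run, it can help to check the design file against the rules listed in the quick start. The sketch below is a minimal, hypothetical Python helper (not part of SRAlign): `validate_design` rejects whitespace, checks for the required reads or alignments columns, and flags empty required fields, while `build_run_command` assembles the quick-start `nextflow run` command as an argument list. Both function names, the example file names, and the `my_genome_key` placeholder are assumptions for illustration; SRAlign's own input checks may be stricter or different, and valid genome keys come from `--help`.

```python
import csv
import io
import shlex

# Hypothetical helper. Required columns follow the design-file rules above (reads2 is optional).
READS_COLUMNS = {"lib_ID", "sample_name", "replicate", "reads1"}
ALIGNMENT_COLUMNS = {"lib_ID", "sample_name", "replicate", "bam", "tool_IDs"}


def validate_design(text: str) -> str:
    """Return "reads" or "alignments" if the design CSV passes basic checks.

    Mirrors the README rules, not SRAlign's own validation.
    Raises ValueError on the first problem found.
    """
    # Line breaks separate rows; any other whitespace is rejected.
    if any(ch.isspace() for ch in text.replace("\r", "").replace("\n", "")):
        raise ValueError("design file must not contain whitespace")

    reader = csv.DictReader(io.StringIO(text))
    header = set(reader.fieldnames or [])
    if READS_COLUMNS <= header:
        kind, required = "reads", READS_COLUMNS
    elif ALIGNMENT_COLUMNS <= header:
        kind, required = "alignments", ALIGNMENT_COLUMNS
    else:
        raise ValueError(f"missing required columns; found {sorted(header)}")

    rows = list(reader)
    if not rows:
        raise ValueError("design file has no data rows")
    for number, row in enumerate(rows, start=1):
        # Missing trailing fields come back as None, so both None and "" count as empty.
        empty = [col for col in sorted(required) if not row.get(col)]
        if empty:
            raise ValueError(f"data row {number}: empty required field(s) {empty}")
    return kind


def build_run_command(design: str, genome: str, profile: str = "docker") -> list[str]:
    """Assemble the quick-start `nextflow run` command as an argument list."""
    return ["nextflow", "run", "trev-f/SRAlign", "-profile", profile,
            "--input", design, "--genome", genome]


if __name__ == "__main__":
    # Hypothetical paired-end design with placeholder file names.
    example = (
        "lib_ID,sample_name,replicate,reads1,reads2\n"
        "lib1,wt,1,wt_R1.fastq.gz,wt_R2.fastq.gz\n"
    )
    print(validate_design(example))  # reads
    # "my_genome_key" is a placeholder; use a key listed by --help.
    print(shlex.join(build_run_command("input.csv", "my_genome_key")))
```

Returning the command as a list rather than a single string means it can be passed directly to `subprocess.run` without shell quoting issues; `shlex.join` is only used to display it.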
SRAlign is designed to be highly configurable: its default behavior can be changed by supplying any of a number of configurable parameters. These can be supplied in several ways, which follow a specific hierarchy of precedence.
- Show configurable parameters by displaying the command line help documentation: `nextflow run trev-f/SRAlign --help`
- Nextflow arguments always begin with a single dash, e.g. `-profile`.
- Pipeline parameters specified at the command line always begin with a double dash, e.g. `--input`.
  - Parameters specified at the command line always have the highest precedence. They overwrite parameters specified in any config or params files.
  - I recommend specifying the required parameters (i.e. `--input` and `--genome`) and up to a few others at the command line in this manner. Specifying more than this at the command line gets unwieldy.
- A custom config or parameters file is a good option when you want to supply more parameters than can comfortably be given at the command line, or when you want to reuse the same custom parameters across multiple runs.
  - For a config file, use the `params` scope.
  - For a JSON/YAML parameters file, see the Nextflow CLI docs.
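As a rough illustration of the precedence rules above, the sketch below writes a JSON parameters file, builds a command that passes it with Nextflow's `-params-file` option, and overrides one value with a double-dash parameter at the command line. The `effective_params` function is a simplified, hypothetical model of "command line beats params file beats config", not Nextflow's actual implementation. The helper names and the `genome_A`/`genome_B`/`genome_C` values are placeholders; real genome keys come from `--help`.

```python
import json
import os
import shlex
import tempfile


def effective_params(*layers: dict) -> dict:
    """Merge parameter layers ordered from lowest to highest precedence.

    Simplified, hypothetical model: later layers overwrite earlier ones key by key.
    """
    merged: dict = {}
    for layer in layers:
        merged.update(layer)
    return merged


def write_params_file(params: dict, path: str) -> None:
    """Write pipeline parameters to a JSON file for use with -params-file."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(params, handle, indent=2)


def build_command(params_path: str, cli_overrides: dict, profile: str = "docker") -> list[str]:
    """Build a `nextflow run` command that reads a params file plus --key value overrides."""
    cmd = ["nextflow", "run", "trev-f/SRAlign", "-profile", profile,
           "-params-file", params_path]
    for key, value in cli_overrides.items():
        cmd += [f"--{key}", str(value)]
    return cmd


if __name__ == "__main__":
    config_params = {"genome": "genome_A"}  # e.g. from a custom config's params scope
    file_params = {"input": "design.csv", "genome": "genome_B"}
    cli_params = {"genome": "genome_C"}
    print(effective_params(config_params, file_params, cli_params))
    # {'genome': 'genome_C', 'input': 'design.csv'}

    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "params.json")
        write_params_file(file_params, path)
        print(shlex.join(build_command(path, cli_params)))
```

Keeping the bulk of the parameters in the JSON file and only the value you want to vary at the command line matches the recommendation above: the file is reusable across runs, and the command-line value wins for that one run.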
Additional documentation can be found in docs.
Quick links: