Bulk RNA-seq

Latch Verified

Produce transcript/count matrices from sequencing reads.

Hosted Interface · SDK Documentation · Slack Community

Workflow Anatomy

Disclaimer

This workflow assumes that your sequencing reads were derived from short-read cDNA sequencing (as opposed to long-read cDNA or direct RNA sequencing). If in doubt, you can likely make the same assumption, as it is by far the most common form of "RNA-sequencing".

Brief Summary of RNA-seq

This workflow ingests short-read sequencing files (in FastQ format) that came from the following sequence of steps¹:

RNA extraction from sample
cDNA synthesis from extracted RNA
adaptor ligation / library prep
(likely) PCR amplification of library
sequencing of library

You will likely end up with one or more FastQ files from this process that hold the sequencing reads in raw text form. This will be the starting point of our workflow.
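To make the starting point concrete, here is a minimal sketch of reading FastQ records in Python. It assumes the common four-line layout (header starting with `@`, sequence, separator starting with `+`, quality string) and uncompressed text; the names `FastqRecord`, `read_fastq`, and the example reads are hypothetical and not part of the workflow itself.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    quality: str


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a four-line-per-record FastQ text stream."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        if not header.strip():
            continue  # tolerate blank lines between records
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline()
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError("malformed FastQ record")
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lengths differ")
        yield FastqRecord(header[1:].strip(), sequence, quality)


# Two made-up reads, used only for illustration.
EXAMPLE = "@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+\nII#I\n"
records = list(read_fastq(io.StringIO(EXAMPLE)))
```

Real FastQ files are usually gzip-compressed and far larger, so in practice a handle from `gzip.open(path, "rt")` would take the place of the in-memory string used here.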

(If you have a .bcl file, this holds the raw output of a sequencing machine. External tools can convert these files to FastQ format, which you will need to do before you can proceed.)

Quality Control

As a pre-processing step, it's important to check the quality of your sequencing files. FastQC is the industry staple for generating a report of useful summary statistics² and is available if you double-click on a file on the LatchBio platform.

The following are the most useful of these statistics:

Per base sequence quality gives the distribution of quality scores at each position along the length of the read
Sequence duplication levels reveals duplicated reads, which can indicate degraded RNA samples or aggressive PCR cycling¹
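The two statistics above can be illustrated with a simplified sketch. It assumes Phred+33 quality encoding and computes only a per-position mean and a naive exact-match duplication fraction; FastQC's own calculations are more involved, and the function names here are hypothetical.

```python
from collections import Counter
from typing import List, Sequence

PHRED_OFFSET = 33  # assumption: Phred+33 encoding of quality characters


def mean_quality_per_position(qualities: Sequence[str]) -> List[float]:
    """Mean Phred score at each read position across all reads."""
    totals: List[int] = []
    counts: List[int] = []
    for quality in qualities:
        for i, char in enumerate(quality):
            if i == len(totals):
                # First read long enough to reach this position.
                totals.append(0)
                counts.append(0)
            totals[i] += ord(char) - PHRED_OFFSET
            counts[i] += 1
    return [total / count for total, count in zip(totals, counts)]


def duplicated_fraction(sequences: Sequence[str]) -> float:
    """Fraction of reads that are exact copies of an earlier read."""
    if not sequences:
        return 0.0
    counts = Counter(sequences)
    duplicates = sum(n - 1 for n in counts.values())
    return duplicates / len(sequences)
```

For example, `mean_quality_per_position(["II", "#I"])` averages scores of 40 and 2 at the first position and 40 and 40 at the second.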

For a full breakdown of the values and their interpretation, we refer the reader to this tutorial.

Trimming

Short-read sequencing introduces adapters, small sequences attached to the 5' and 3' ends of cDNA fragments, that are present as artifacts in our FastQ files and must be removed.

We have yet to identify a comprehensive review that benchmarks the various trimming tools on both accuracy and speed. Until we are able to run such a benchmark ourselves, we have selected TrimGalore, which is trusted by researchers we work with at UCSF and Stanford.
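As a rough illustration of what adapter removal means, the sketch below trims a 3' adapter from a single read using exact matching only. It is not how TrimGalore works internally (real trimmers tolerate mismatches and also trim on quality); the function name, the `min_overlap` parameter, and the reads are hypothetical, and the adapter string is passed in by the caller.

```python
from typing import Tuple


def trim_3prime_adapter(
    sequence: str, quality: str, adapter: str, min_overlap: int = 3
) -> Tuple[str, str]:
    """Remove an exact 3' adapter match (full or partial) from a read."""
    # Case 1: the whole adapter occurs somewhere in the read.
    index = sequence.find(adapter)
    if index == -1:
        # Case 2: the read ends partway through the adapter, so only a
        # prefix of the adapter is present at the very end of the read.
        longest = min(len(adapter) - 1, len(sequence))
        for overlap in range(longest, min_overlap - 1, -1):
            if sequence.endswith(adapter[:overlap]):
                index = len(sequence) - overlap
                break
    if index == -1:
        return sequence, quality  # no adapter found, read unchanged
    # Drop the adapter and everything after it, keeping the quality
    # string the same length as the sequence.
    return sequence[:index], quality[:index]


adapter = "AGATCGGAAGAGC"  # illustrative adapter sequence supplied as input
full = trim_3prime_adapter("ACGTACGTAGATCGGAAGAGCTT", "I" * 23, adapter)
partial = trim_3prime_adapter("ACGTACGTAGATC", "I" * 13, adapter)
untouched = trim_3prime_adapter("ACGTACGT", "I" * 8, adapter)
```

The `min_overlap` threshold is a design trade-off: a lower value removes more partial adapters but also clips genuine bases that happen to match the start of the adapter.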

Alignment

Alignment is the process of assigning a sequencing read a location on a reference genome or transcriptome. It is the most computationally expensive step of the workflow, requiring a comparison against the entire reference sequence for each of millions of reads.

Transcript alignment was initially conducted similarly to genomic alignment, using tools like Bowtie2 to rigorously recover reference coordinates for each read. In the years that followed, this approach was set aside in favor of lighter "pseudo-alignment" methods that assign each read to a transcript rather than to an exact location, saving time and resources. However, while these methods are faster, they have proven to be less accurate.³

In 2020, the Selective Alignment algorithm was introduced. It performs a similar lightweight read assignment while outperforming traditional alignment methods in accuracy.³ We use salmon to implement selective alignment.
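The idea of assigning a read to a transcript rather than to an exact coordinate can be shown with a toy k-mer sketch. This is only an illustration of the lightweight-assignment concept under simplifying assumptions (exact k-mer matches, no sequencing errors, no strand handling); it is not salmon's selective alignment, and the transcript names and sequences are made up.

```python
from typing import Dict, Optional, Set


def build_kmer_index(transcripts: Dict[str, str], k: int) -> Dict[str, Set[str]]:
    """Map every k-mer to the set of transcripts that contain it."""
    index: Dict[str, Set[str]] = {}
    for name, sequence in transcripts.items():
        for i in range(len(sequence) - k + 1):
            index.setdefault(sequence[i:i + k], set()).add(name)
    return index


def compatible_transcripts(read: str, index: Dict[str, Set[str]], k: int) -> Set[str]:
    """Transcripts that contain every k-mer of the read."""
    compatible: Optional[Set[str]] = None
    for i in range(len(read) - k + 1):
        hits = index.get(read[i:i + k], set())
        compatible = set(hits) if compatible is None else compatible & hits
        if not compatible:
            return set()  # some k-mer rules out every transcript
    return compatible or set()  # empty if the read is shorter than k


# Two hypothetical transcripts that share their first eight bases.
transcripts = {"tx_a": "ACGTTGCAAGGCTA", "tx_b": "ACGTTGCATTCCGA"}
index = build_kmer_index(transcripts, k=5)
shared = compatible_transcripts("ACGTTGCA", index, k=5)
unique = compatible_transcripts("GCAAGGCT", index, k=5)
unmapped = compatible_transcripts("TTTTTTTT", index, k=5)
```

A read drawn from the shared region stays ambiguous between both transcripts, which is why abundance estimation needs a statistical model on top of the assignment step.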

Gene Count Quantification

Selective Alignment produces estimates of transcript abundances. Recall that there can be multiple transcripts for any single gene. It is desirable to have estimated gene counts for two reasons:

gene counts are a more stable measure of transcription.*
gene counts are more interpretable

* Stability is loosely defined as consistent correlation with ground truth counts as the available (transcript) annotations begin to drop out. ⁴

We use tximport to convert transcript-level abundance estimates into gene-level counts.
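The core of the transcript-to-gene step can be sketched as a sum over a transcript-to-gene mapping. tximport is an R package and does considerably more than this; the plain-Python function below is a simplified illustration only, and the identifiers and counts are hypothetical.

```python
from collections import defaultdict
from typing import Dict


def summarize_to_gene(
    transcript_counts: Dict[str, float], tx2gene: Dict[str, str]
) -> Dict[str, float]:
    """Sum estimated transcript counts into per-gene totals."""
    gene_counts: Dict[str, float] = defaultdict(float)
    for transcript, count in transcript_counts.items():
        gene = tx2gene.get(transcript)
        if gene is None:
            continue  # assumption: transcripts without a gene are skipped
        gene_counts[gene] += count
    return dict(gene_counts)


# Hypothetical estimates: two isoforms of geneA and one of geneB.
transcript_counts = {"tx1": 10.5, "tx2": 4.5, "tx3": 7.0}
tx2gene = {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB"}
gene_counts = summarize_to_gene(transcript_counts, tx2gene)
```

Note that the counts are fractional: they are estimates from the quantification step, not integer tallies of reads.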

1. Stark, R., Grzelak, M., Hadfield, J. RNA sequencing: the teenage years. Nature Reviews Genetics (2019). doi:10.1038/s41576-019-0150-2
2. https://www.bioinformatics.babraham.ac.uk/projects/fastqc/
3. Srivastava, A., Malik, L., Sarkar, H. et al. Alignment and mapping methodology influence transcript abundance estimation. Genome Biol 21, 239 (2020). https://doi.org/10.1186/s13059-020-02151-8
4. Soneson, C., Love, M.I., Robinson, M.D. Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences [version 1; peer review: 2 approved]. F1000Research 2015, 4:1521
