yejg2017 / RNA-Seq-Teaching-O2

This repository contains teaching materials for 2- and 3-day Introduction to RNA-sequencing data analysis workshops using the O2 cluster.

Introduction to RNA-seq using high-performance computing (HPC)

Audience: Biologists
Computational skills: Beginner/Intermediate
Prerequisites: None
Duration: 2-day workshop (~13 hours of trainer-led time)


This repository contains teaching materials for a 2-day Introduction to RNA-sequencing data analysis workshop. The workshop focuses on teaching the basic computational skills needed to use a high-performance computing environment effectively to implement an RNA-seq data analysis workflow. It includes an introduction to the shell (bash) and shell scripting. In addition to running the RNA-seq workflow from FASTQ files to count data, the workshop covers best-practice guidelines for RNA-seq experimental design and data organization/management.
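
The FASTQ-to-counts workflow described above is typically driven by a shell for loop over the samples. The sketch below is a dry run that only prints the commands it would execute. It assumes single-end reads with a .fq extension; the output directories, the STAR index path reference/star_index, the annotation file reference/genes.gtf and the thread counts are hypothetical placeholders, not necessarily the exact commands used in the lessons. featureCounts is shown as one common counting tool.

```bash
#!/usr/bin/env bash
# Dry-run sketch: print (not run) the per-sample commands of a
# FASTQ -> QC -> alignment -> count workflow.
# ASSUMPTIONS: single-end reads named *.fq; all paths below
# (results/..., reference/star_index, reference/genes.gtf) are hypothetical.
set -euo pipefail

plan_workflow() {
  local fastq_dir="$1"
  local fq sample
  # One iteration per FASTQ file found in the input directory.
  for fq in "$fastq_dir"/*.fq; do
    [ -e "$fq" ] || continue          # glob matched nothing: skip the literal pattern
    sample=$(basename "$fq" .fq)      # e.g. sampleA.fq -> sampleA
    # Quality control report for this file.
    echo "fastqc -o results/fastqc $fq"
    # Alignment; STAR names the sorted BAM <prefix>Aligned.sortedByCoord.out.bam.
    echo "STAR --runThreadN 6 --genomeDir reference/star_index --readFilesIn $fq --outFileNamePrefix results/STAR/${sample}_ --outSAMtype BAM SortedByCoordinate"
  done
  # After all samples are aligned, count reads per gene across every BAM at once.
  echo "featureCounts -T 4 -a reference/genes.gtf -o results/counts/counts.txt results/STAR/*_Aligned.sortedByCoord.out.bam"
}
```

Once the printed commands look right, the required modules are loaded on O2 and the output directories exist, the echo statements can be replaced with the commands themselves. Printing first is a cheap way to check which files a glob will pick up before spending cluster time.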

These materials were developed for a trainer-led workshop, but are also amenable to self-guided learning.

Learning Objectives

  1. Understand the necessity for, and use of, the command line interface (bash) and HPC for analyzing high-throughput sequencing data.
  2. Understand best practices for designing an RNA-seq experiment and analyzing the resulting data.

Workshop Schedule

The schedule for using the materials in a trainer-led workshop can be found here


Lesson: estimated duration
Introduction to the shell: 70 min
Searching and redirection in shell: 45 min
Introduction to the Vim text editor: 30 min
Shell scripts and for loops: 75 min
Permissions and environment variables: 50 min
Project and data organization: 40 min
Introduction to High-Performance Computing for HMS-RC's O2 cluster: 45 min
Introduction to RNA-seq and Library Prep: 50 min
NGS Workflows and Data Standards: 35 min
RNA-seq data QC with FastQC: 55 min
RNA-seq Alignment with STAR: 75 min
Assessing Alignment Quality: 60 min
Generating a Count Matrix: 75 min
Documenting Steps in the Workflow with MultiQC: 30 min
Automating the RNA-seq workflow: 60 min
Alternative workflows for analyzing RNA-seq data: 15 min
Quantifying expression using alignment-free methods (Salmon): 75 min


These materials have been developed by members of the teaching team at the Harvard Chan Bioinformatics Core (HBC). These are open access materials distributed under the terms of the Creative Commons Attribution license (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
