About the project

We are trying to determine whether the expression of long non-coding RNAs (lncRNAs) differs between the left and right hemispheres of the mouse brain (telencephalon). For this, 3 samples were taken and sequenced using RNA-seq. For more information about RNA-seq, see: https://galaxyproject.org/tutorials/rb_rnaseq/

The pipeline

The protocol is based on the following ones:

The pipeline consists of a series of Python 3 scripts. They run on the local cluster, which uses Sun Grid Engine (SGE) as its job scheduler. Some scripts are generated automatically according to the kind of data, in order to maximize parallelism and keep each script simple.
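
The README does not show how the job scripts are generated. A minimal sketch of the idea, assuming one SGE job per command (e.g. one per sample) and using hypothetical helpers make_sge_script and write_job_scripts; the parallel environment name "smp" is an assumption, since it differs between clusters:

```python
from __future__ import annotations

from pathlib import Path


def make_sge_script(job_name: str, command: str, threads: int = 1,
                    log_dir: str = "logs") -> str:
    """Return the text of an SGE job script that runs one shell command."""
    lines = [
        "#!/bin/bash",
        "#$ -S /bin/bash",                  # shell SGE uses to run the job
        f"#$ -N {job_name}",                # job name shown by qstat
        "#$ -cwd",                          # run in the submission directory
        f"#$ -o {log_dir}/{job_name}.out",  # stdout log
        f"#$ -e {log_dir}/{job_name}.err",  # stderr log
    ]
    if threads > 1:
        # Assumed parallel environment name; check `qconf -spl` on your cluster.
        lines.append(f"#$ -pe smp {threads}")
    lines.append(command)
    return "\n".join(lines) + "\n"


def write_job_scripts(commands: dict[str, str], out_dir: str | Path,
                      threads: int = 1) -> list[Path]:
    """Write one job script per (job name -> command) pair; return their paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for job_name, command in commands.items():
        path = out_dir / f"{job_name}.sh"
        path.write_text(make_sge_script(job_name, command, threads))
        paths.append(path)
    return paths
```

Each generated file can then be submitted with qsub (for example "qsub jobs/map_S1.sh"), so independent samples run in parallel. The log directory must exist before submission, otherwise SGE cannot write the .out/.err files.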

The workflow currently consists of the following steps:

Fetching the data. First, we need to locate the FASTQ files. The script script.fetch_data.py handles this step.
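
As an illustration of this step (not the contents of script.fetch_data.py), here is a sketch that finds FASTQ files under a directory and groups them by sample. The naming convention <sample>_R1/_R2.fastq.gz and the helpers group_fastq and find_fastq are assumptions:

```python
from __future__ import annotations

import re
from pathlib import Path

# Assumed naming: <sample>_R1.fastq.gz / <sample>_R2.fastq.gz for paired reads,
# or <sample>.fastq(.gz) / <sample>.fq(.gz) for single-end reads.
FASTQ_RE = re.compile(r"^(?P<sample>.+?)(?:_R[12])?\.(?:fastq|fq)(?:\.gz)?$")


def group_fastq(paths) -> dict[str, list[Path]]:
    """Group FASTQ paths by sample name; non-FASTQ files are ignored."""
    groups: dict[str, list[Path]] = {}
    for p in map(Path, paths):
        match = FASTQ_RE.match(p.name)
        if match is None:
            continue  # not a FASTQ file
        groups.setdefault(match.group("sample"), []).append(p)
    # Sorting puts _R1 before _R2 within each sample.
    return {sample: sorted(files) for sample, files in sorted(groups.items())}


def find_fastq(root: str | Path) -> dict[str, list[Path]]:
    """Recursively search root for FASTQ files and group them by sample."""
    return group_fastq(p for p in Path(root).rglob("*") if p.is_file())
```

Files named differently (for example Illumina's S1_L001_R1_001.fastq.gz) would need a different pattern.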

Quality check of the reads using FastQC and MultiQC. The script script.quality_check.py handles this step.
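
A sketch of how the quality check could be driven from Python, assuming FastQC and MultiQC are on the PATH; the function names and the qc/ output layout are hypothetical. FastQC writes one report per FASTQ file, and MultiQC then merges them into a single summary:

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def fastqc_cmd(fastq_files, out_dir: str | Path, threads: int = 1) -> list[str]:
    """FastQC command for a set of FASTQ files (out_dir must already exist)."""
    return ["fastqc", "--outdir", str(out_dir), "--threads", str(threads),
            *(str(f) for f in fastq_files)]


def multiqc_cmd(fastqc_dir: str | Path, out_dir: str | Path) -> list[str]:
    """MultiQC command that aggregates the reports found in fastqc_dir."""
    return ["multiqc", str(fastqc_dir), "--outdir", str(out_dir)]


def run_qc(fastq_files, qc_dir: str | Path = "qc", threads: int = 1) -> None:
    """Run FastQC on all files, then summarize with MultiQC (hypothetical layout)."""
    qc_dir = Path(qc_dir)
    qc_dir.mkdir(parents=True, exist_ok=True)  # FastQC does not create out_dir
    subprocess.run(fastqc_cmd(fastq_files, qc_dir, threads), check=True)
    subprocess.run(multiqc_cmd(qc_dir, qc_dir / "multiqc"), check=True)
```

Building the argument lists separately from running them keeps the commands easy to inspect or to paste into a generated SGE job script.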

Mapping of the reads to the Mus musculus reference genome using HISAT2. The script script.rnaseq_map.py handles this step.
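
A sketch of the HISAT2 call for one sample, assuming a genome index was built beforehand with hisat2-build; the index prefix (e.g. "idx/mm") and the helper hisat2_cmd are hypothetical. Paired-end mates go to -1/-2, single-end reads to -U, and -S names the SAM output:

```python
from __future__ import annotations

from pathlib import Path


def hisat2_cmd(index_prefix: str | Path, reads, sam_out: str | Path,
               threads: int = 1) -> list[str]:
    """HISAT2 command aligning one sample's reads against a prebuilt index.

    reads holds one FASTQ path (single-end) or two (paired-end, R1 then R2).
    """
    reads = [str(r) for r in reads]
    cmd = ["hisat2", "-p", str(threads), "-x", str(index_prefix)]
    if len(reads) == 2:
        cmd += ["-1", reads[0], "-2", reads[1]]  # paired-end mates
    elif len(reads) == 1:
        cmd += ["-U", reads[0]]                  # unpaired reads
    else:
        raise ValueError(f"expected 1 or 2 FASTQ files, got {len(reads)}")
    cmd += ["-S", str(sam_out)]                  # SAM output file
    return cmd
```

The resulting list can be passed to subprocess.run or joined into the command line of a per-sample SGE job.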

Conversion of the SAM output to BAM using SAMtools. The script script.sam_to_bam.py handles this step.
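
BAM is the compressed binary form of SAM, so the conversion mainly saves disk space. A sketch of the conversion with samtools view -b, using a hypothetical helper that names the output after the input when no BAM path is given:

```python
from __future__ import annotations

from pathlib import Path


def sam_to_bam_cmd(sam: str | Path, bam: str | Path | None = None,
                   extra_threads: int = 0) -> list[str]:
    """samtools command converting a SAM file to BAM (default: same stem, .bam)."""
    sam = Path(sam)
    bam = Path(bam) if bam is not None else sam.with_suffix(".bam")
    cmd = ["samtools", "view", "-b", "-o", str(bam)]
    if extra_threads > 0:
        cmd += ["-@", str(extra_threads)]  # additional compression threads
    cmd.append(str(sam))
    return cmd
```

Many downstream tools also expect a coordinate-sorted, indexed BAM (samtools sort, samtools index); whether the pipeline does this is not stated here.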

For more details on each step, refer to the scripts themselves.

About

A pipeline to analyze RNA-seq expression data and quantify the long non-coding RNAs in it, developed as part of a project at the Institute of Neurobiology of the National Autonomous University of Mexico (UNAM).
