Workflow for Integration of RNAseq and QTL Analyses in Collaborative Cross Mice
This workflow is designed for automated integration analysis of RNA-seq and QTL (genotype/phenotype) data. The RNA-seq part is optimized for Illumina TruSeq Technology, so if the scripts are run on default parameters the user is required to input paired – end (forward – reverse) stranded Illumina reads.
In order to run the scripts the user is required to:
The user is required to download and install the Docker engine. If you are unfamiliar with Docker please visit Docker
After the Docker engine installation the user is required to pull the images from enios/rnaseq-qtl Docker Hub repository. The pull command should be something like:
$ sudo docker pull enios/rnaseq-qtl:<tag>
Where <tag>
is the name of the specific tool needed for the workflow to run properly.
- enios/rnaseq-qtl:trim-galore
- enios/rnaseq-qtl:hisat2
- enios/rnaseq-qtl:featurecounts
- enios/rnaseq-qtl:edger
- enios/rnaseq-qtl:happy
If you are unfamiliar with Docker please clone the dockerPull.py script locally and run:
$ sudo python /path/to/script/dockerPull.py
In order for the workflow to run the user is required to create a directory with the following files and folders
(3) hisat2_index: FOLDER where the index for HISAT2 is stored, if you need to download a specific index please visit the HISAT 2 index repository (ftp://ftp.ccb.jhu.edu/pub/infphilo/hisat2/data)
(6) CONDENSED: FOLDER where the condensed genotype data are stored (HAPPY input, please visit http://mtweb.cs.ucl.ac.uk/mus/www/preCC/MEGA_MUGA/Mar2015.MEGA+MDA+MUGA/CONDENSED/)
The scripts should be downloaded and run from inside the same directory (core.py is called as a module inside wrapper.py and needs to be in the same directory):
$ cd /path/to/scripts/
$ sudo python wrapper.py <arguments>
The command line arguments are divided into two categories. The Basic arguments that start the workflow using default parameters and the Advanced (optional) arguments that modify each tool according to the individual needs of the researcher.
There are two (2) basic arguments the user is required to input in order for the scripts to run on default parameters:
a. Path to files/folders: the absolute path to the directory containing the folders and data for the RNA-seq – QTL workflow (see 1.3. Prepare the working directory)
For example, if the user has set up a working directory named “Experiment1” and the input files are FASTQ, the command should be something like:
$ sudo python wrapper /home/$user/Experiment1/ fastq
The script must be run with sudo privileges (on Linux distributions) in order for the docker daemon to launch the containers.
If the user needs to modify the tool – specific arguments, he/she is required to input a JSON file (please see the JSON folder for more information) as an extra argument. For example, if the user has set up a working directory named “Experiment1” and the input files are FASTQ, the command should be something like:
$ sudo python wrapper /home/$user/Experiment1/ fastq parameters.json
If the script is run with the correct arguments, the user will be prompt to input the following parameters:
HISAT2 index: The basename of the index for the reference genome. The basename is the name of any of the index files up to but not including the final. For example if the index is stored in the working directory inside a folder named hisat2_index, the input should be hisat2_index/mm10/mm10.
GTF File: the GTF file name for the appropriate genome, for example Mus_musculus.GRCm38.91.gtf.
Factors: A string containing factors, for example if there are 6 biological replicates, 3 Normal and 3 Case the input should be Normal,Normal,Normal,Case,Case,Case (make sure that the order of the factors corresponds to the appropriate replicate).
Contrast of interest: A string containing the contrast of interest, following the previous example the input should be Normal-Case (or Case-Normal, depending on the contrast the user needs to test).
Phenotype Name: A string specifying the phenotype name in the data.txt file. For example DB0123.
Input File: A string specifying the input phenotype file for the QTL analysis. For example data.txt (the path to the file is not required, provided that it is inside the working directory as described in 1.3. Prepare the working directory).
If the user wishes to assess the CI of the QTL peak from the analysis performed in step 3.4, he/she is required to run the HAPPY Docker container with the following command:
$ sudo docker run -v <path/to/working/dir/>:/tmp enios/rnaseq-qtl:happy Rscript /mnt/simlocus.dock.R /tmp/CONDENSED <Chromosome> <Locus> <Chrom_Number> <Phenotype_Name> /tmp/<Input_File>
Where:
Chromosome: A string specifying the chromosome containing the QTL peak from the QTL analysis. For example chr5 (the user is required to input the chr prefix)
Locus: An integer specifying the genetic locus of the QTL peak from the QTL analysis. For example 2235.
Chrom_Number: An integer or character specifying the chromosome containing the QTL peak from the QTL analysis. For example 5 or X.
Phenotype_Name: A string specifying the phenotype name in the data.txt file. For example DB0123.
Input_File: A string specifying the input phenotype file for the QTL analysis. For example data.txt (the path to the file is not required, see above – section 3.4)
- Fotakis Giorgos - QTL-RNAseq analysis workflow development, Docker containerization
- Binenbaum Ilona - QTL analysis
- Vlachavas Iason Efstathios - Rscripts development