A program for accurate intron selection and for intron-level differential splicing from large collections of RNA-seq data (manuscript in preparation).
Copyright (C) 2017-2019, Guangyu Yang and Liliana Florea. Licensed under the GNU GPL v3.0.
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
Like its predecessor JULiP, JULiP2 is a high-performance Python package designed to select an accurate set of introns and, additionally, to perform intron-level differential splicing analysis from large collections of RNA-seq samples.
JULiP2 works on RNA sequencing reads aligned with TopHat or STAR. It extracts candidate introns from the spliced alignments and uses a generalized linear model to select a reliable subset of them. The intron-supporting read counts are modeled with a negative binomial distribution, and the model parameters are estimated by maximum likelihood.
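To make the negative binomial, maximum likelihood idea concrete, here is a minimal sketch that fits a single negative binomial to one intron's read counts across samples. It is not JULiP2's implementation: JULiP2 fits a generalized linear model with Theano, while this sketch uses pure Python, assumes the mean/size parameterization Var(y) = mu + mu^2 / r, and uses the hypothetical helper names `nb_log_likelihood`, `golden_section_max`, and `fit_negative_binomial`. The mean has a closed-form MLE (the sample mean); the size parameter r is found by a one-dimensional search on the profile likelihood.

```python
import math


def nb_log_likelihood(counts, mu, r):
    """Log-likelihood of counts under a negative binomial with mean mu and
    size (inverse dispersion) r, so that Var(y) = mu + mu**2 / r."""
    ll = 0.0
    for y in counts:
        ll += (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
               + r * math.log(r / (r + mu))
               + y * math.log(mu / (r + mu)))
    return ll


def golden_section_max(f, lo, hi, tol=1e-8):
    """Maximize a unimodal function f on [lo, hi] by golden-section search."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:  # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = f(c)
        else:        # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = f(d)
    return (a + b) / 2.0


def fit_negative_binomial(counts):
    """Maximum likelihood estimates (mu, r) for one intron's counts."""
    # For the NB in this parameterization, the MLE of the mean is the sample mean.
    mu = float(sum(counts)) / len(counts)
    if mu <= 0:
        raise ValueError("need at least one nonzero count")
    # Profile likelihood over log(r), searched between r = 1e-4 and r = 1e6.
    log_r = golden_section_max(
        lambda t: nb_log_likelihood(counts, mu, math.exp(t)),
        math.log(1e-4), math.log(1e6))
    return mu, math.exp(log_r)
```

For overdispersed counts (sample variance larger than the mean) the profile likelihood has a single interior maximum; for counts that are not overdispersed the search drifts to the upper bound, which corresponds to the Poisson limit.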
- Estimates of differential isoform expression for single-end or paired-end RNA-Seq data;
- Expression estimates at the alternative splicing event (intron) level and at the entire gene level;
- Confidence intervals for expression estimates and quantitative measures of differential expression;
- Basic functionality for use on a cluster or distributed computing system.
JULiP2 is written in Python and uses the Theano library as its backend. You can install the latest version from this GitHub repository. To download the code, clone the repository:
git clone https://github.com/splicebox/JULiP.git
- Linux or Mac
- Python 2.7
- Theano, a Python library for defining, optimizing, and evaluating mathematical expressions involving multi-dimensional arrays.
- numpy, a fundamental package for scientific computing with Python.
- scipy, a Python-based package for mathematics, science, and engineering.
- statsmodels, a Python module for estimating statistical models, conducting statistical tests, and exploring data.
- pysam, a Python library for working with SAM/BAM files through samtools.
- pathos, a framework for heterogeneous computing.
If you use pip, install the packages with the following commands:
pip2 install --user theano numpy scipy pysam pathos intervaltree
pip2 install --user statsmodels
- samtools, for accessing SAM/BAM files
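Before a first run, it can help to confirm that the Python dependencies are importable. The sketch below is a hypothetical helper, not part of JULiP2; the module list is taken from the requirements and pip commands above. It only checks Python packages, so the samtools executable still needs to be checked separately.

```python
import importlib

# Python packages named in the requirements and pip commands above.
REQUIRED_PACKAGES = ["theano", "numpy", "scipy", "statsmodels",
                     "pysam", "pathos", "intervaltree"]


def missing_packages(names):
    """Return the subset of module names that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing

# Example:
#   print(missing_packages(REQUIRED_PACKAGES))  # an empty list means all imports succeeded
```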
Usage: python run.py [options] --bam-file-list bam_file_list.txt
Options:
--version show program's version number and exit
-h, --help show this help message and exit
--bam-file-list=BAM_FILE_LIST
bam file list
--annotation=ANNOTATION_FILE
path of annotation file (.gtf)
--out-dir=OUT_DIR output directory (default: out)
--seq-name=SEQ_NAME specify sequence or chromosome name, None for whole
sequences.
--mode=MODE JULiP processing mode ("differential-analysis" or
"intron-detection").
--threads=THREADS number of data processing thread. (default: 1)
The main input of JULiP2 is a list of BAM files with RNA-Seq read mappings.
Optionally, each BAM file can be sorted by genomic location and indexed for random access:
samtools sort -o accepted_hits.sorted.bam accepted_hits.bam
samtools index accepted_hits.sorted.bam
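With many samples, the sort and index commands above can be generated per file, and the sorted files collected into the BAM file list. This is a hedged sketch: the helper names are hypothetical, the `.sorted.bam` naming simply mirrors the commands above, and the list format (one BAM path per line) is an assumption, since the expected format of the list file is not specified here.

```python
import os


def sorted_name(bam_path):
    """accepted_hits.bam -> accepted_hits.sorted.bam, as in the commands above."""
    root, ext = os.path.splitext(bam_path)
    return root + ".sorted" + ext


def samtools_commands(bam_path):
    """samtools sort and index command lines for one BAM file."""
    out = sorted_name(bam_path)
    return [["samtools", "sort", "-o", out, bam_path],
            ["samtools", "index", out]]


def format_bam_list(bam_paths):
    """Text of a BAM file list, assuming one absolute path per line."""
    return "".join(os.path.abspath(p) + "\n" for p in bam_paths)

# Example (runs samtools, so not executed here):
#   import subprocess
#   bams = ["sample1/accepted_hits.bam", "sample2/accepted_hits.bam"]
#   for bam in bams:
#       for cmd in samtools_commands(bam):
#           subprocess.check_call(cmd)
#   with open("bam_file_list.txt", "w") as fh:
#       fh.write(format_bam_list([sorted_name(b) for b in bams]))
```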
REF="path_to_gtf_file"
BAM_LIST="path_to_bam_file_list"
python run.py --bam-file-list $BAM_LIST \
--mode 'differential-analysis' \
--threads 10 \
--annotation $REF
python run.py --bam-file-list $BAM_LIST \
--mode 'intron-detection' \
--threads 10 \
--annotation $REF
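The feature list mentions basic support for cluster or distributed computing, and the `--seq-name` option restricts a run to one sequence. One way to combine them, offered only as a sketch under assumptions, is to generate one command per chromosome and submit each as a separate job. The helper `per_chromosome_commands`, the chromosome names, and the per-chromosome output directories are hypothetical; whether per-chromosome runs match a whole-genome run (particularly in differential-analysis mode) and how to merge their outputs are not documented here.

```python
import os


def per_chromosome_commands(bam_list, annotation, mode, chromosomes,
                            out_root="out", threads=1):
    """Build one run.py command line per chromosome (hypothetical helper).

    Each job gets its own --out-dir so that jobs do not write to the
    same directory; this layout is an assumption, not a JULiP2 convention.
    """
    if mode not in ("differential-analysis", "intron-detection"):
        raise ValueError("unknown mode: %s" % mode)
    commands = []
    for chrom in chromosomes:
        commands.append([
            "python", "run.py",
            "--bam-file-list", bam_list,
            "--mode", mode,
            "--threads", str(threads),
            "--annotation", annotation,
            "--seq-name", chrom,
            "--out-dir", os.path.join(out_root, chrom),
        ])
    return commands
```

Each returned list can be passed to `subprocess.check_call` or written into a job script for your cluster scheduler.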
Contact: gyang22@jhu.edu, florea@jhu.edu
See the file LICENSE for information on the history of this software, terms & conditions for usage, and a DISCLAIMER OF ALL WARRANTIES.