edwwlui / JULiP

Intron selection and differential splicing from RNA-seq data collections

JULiP2

A program for accurate intron selection and for intron-level differential splicing from large collections of RNA-seq data (manuscript in preparation).

Copyright (C) 2017-2019, and GNU GPL v3.0, by Guangyu Yang, Liliana Florea

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

What is JULiP2?

Like its predecessor JULiP, JULiP2 is a high-performance Python package designed to select an accurate set of introns and, additionally, to perform intron-level differential splicing analysis from large collections of RNA-seq samples.

JULiP2 works on aligned RNA sequencing reads (generated by TopHat or STAR), using a generalized linear model to select a reliable subset of introns from among those extracted from the spliced alignments. JULiP2 assumes a negative binomial model for the intron-supporting read counts and estimates the model parameters by solving a maximum likelihood optimization problem.
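To make the negative binomial assumption concrete, here is a minimal sketch of fitting a negative binomial to a handful of read counts by maximum likelihood. It is an illustration only, not JULiP2's actual model or code: the function names (`nb_logpmf`, `nb_loglik`, `fit_nb`), the mean/size parameterization, and the grid search over the size parameter are all assumptions made for this example.

```python
import math

def nb_logpmf(k, mu, r):
    """Log-probability of count k under a negative binomial with mean mu and size r."""
    return (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
            + r * math.log(r / (r + mu))
            + k * math.log(mu / (r + mu)))

def nb_loglik(counts, mu, r):
    """Log-likelihood of independent counts sharing the same mean and size."""
    return sum(nb_logpmf(k, mu, r) for k in counts)

def fit_nb(counts, r_grid=None):
    """Illustrative fit: the MLE of the mean is the sample mean; the size
    parameter r is chosen by a coarse grid search (an assumption of this sketch)."""
    mu = sum(counts) / float(len(counts))
    if mu <= 0:
        raise ValueError("at least one count must be positive")
    if r_grid is None:
        # Log-spaced candidates from 0.01 to 10000.
        r_grid = [10 ** (e / 10.0) for e in range(-20, 41)]
    best_r = max(r_grid, key=lambda r: nb_loglik(counts, mu, r))
    return mu, best_r

# Hypothetical read counts supporting one intron across eight samples.
counts = [3, 7, 0, 12, 5, 9, 1, 20]
mu_hat, r_hat = fit_nb(counts)
```

A small `r_hat` indicates counts that are more dispersed than a Poisson model would allow, which is the usual motivation for a negative binomial in RNA-seq count data.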

Features

  • Estimates of differential isoform expression for single-end or paired-end RNA-Seq data;
  • Expression estimates at the alternative splicing event (intron) level and at the entire gene level;
  • Confidence intervals for expression estimates and quantitative measures of differential expression;
  • Basic functionality for use on cluster / distributed computing systems.

Installation

JULiP2 is written in Python and uses the Theano library as its backend. You can install the latest version from this GitHub repository. To download the code, clone the repository:

git clone https://github.com/splicebox/JULiP.git

System requirements

  • Linux or Mac
  • Python 2.7

Required Python modules:

  • Theano, a Python library for defining, optimizing, and evaluating mathematical expressions involving multi-dimensional arrays.
  • numpy, a fundamental package for scientific computing with Python.
  • scipy, a Python-based package for mathematics, science, and engineering.
  • statsmodels, a Python module for estimating statistical models, conducting statistical tests, and exploring data.
  • pysam, a Python library for working with SAM/BAM files through samtools.
  • pathos, a framework for heterogeneous computing.
  • intervaltree, an interval tree data structure for Python.

If you are using pip, install the packages with the following commands:

pip2 install --user theano numpy scipy pysam pathos intervaltree
pip2 install --user statsmodels
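After installation, a quick check like the one below can confirm that the modules are importable. This is a convenience sketch, not part of JULiP2; the helper name `missing_modules` is hypothetical, and the module list is taken from the pip commands above.

```python
import importlib

# Import names of the packages installed by the pip commands above.
REQUIRED_MODULES = ["theano", "numpy", "scipy", "statsmodels",
                    "pysam", "pathos", "intervaltree"]

def missing_modules(names=REQUIRED_MODULES):
    """Return the subset of module names that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing
```

Calling `missing_modules()` with the Python 2.7 interpreter used for JULiP2 returns an empty list when everything is in place, or the names that still need to be installed.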

Other required software:

Usage

Usage: python run.py [options] --bam-file-list bam_file_list.txt

Options:
  --version             show program's version number and exit
  -h, --help            show this help message and exit
  --bam-file-list=BAM_FILE_LIST
                        bam file list
  --annotation=ANNOTATION_FILE
                        path of annotation file (.gtf)
  --out-dir=OUT_DIR     output directory (default: out)
  --seq-name=SEQ_NAME   specify sequence or chromosome name, None for whole
                        sequences.
  --mode=MODE           JULiP processing mode ("differential-analysis" or
                        "intron-detection").
  --threads=THREADS     number of data processing thread. (default: 1)

Input/Output

The main input of JULiP2 is a list of BAM files with RNA-Seq read mappings.
Optionally, each BAM file can be sorted by genomic location and indexed for random access:

samtools sort -o accepted_hits.sorted.bam accepted_hits.bam
samtools index accepted_hits.sorted.bam
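The list file passed to --bam-file-list can be generated with a short script. The sketch below assumes the list is a plain text file with one BAM path per line; this README does not specify the format, so check it against your JULiP2 version. The helper name `write_bam_file_list` and the file names in the comment are hypothetical.

```python
import glob
import os

def write_bam_file_list(bam_dir, list_path, pattern="*.bam"):
    """Write the absolute path of every matching BAM file to list_path,
    one per line (assumed list format), and return the paths written."""
    bam_paths = sorted(os.path.abspath(p)
                       for p in glob.glob(os.path.join(bam_dir, pattern)))
    if not bam_paths:
        raise ValueError("no files matching %r in %s" % (pattern, bam_dir))
    with open(list_path, "w") as handle:
        for path in bam_paths:
            handle.write(path + "\n")
    return bam_paths

# Example (hypothetical paths):
# write_bam_file_list("alignments", "bam_file_list.txt", pattern="*.sorted.bam")
```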

Example

Example: run in differential analysis mode:

REF="path_to_gtf_file"
BAM_LIST="path_to_bam_file_list"
python run.py --bam-file-list "$BAM_LIST" \
              --mode 'differential-analysis' \
              --threads 10 \
              --annotation "$REF"

Example: run in intron detection mode:

python run.py --bam-file-list "$BAM_LIST" \
              --mode 'intron-detection' \
              --threads 10 \
              --annotation "$REF"
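When launching JULiP2 from a Python pipeline rather than a shell, the same command lines can be assembled programmatically. The sketch below uses only the options listed under Usage; the helper name `build_julip_command` and its defaults are hypothetical and not part of JULiP2.

```python
VALID_MODES = ("differential-analysis", "intron-detection")

def build_julip_command(bam_file_list, mode, annotation=None, out_dir=None,
                        seq_name=None, threads=1, script="run.py"):
    """Assemble the argument list for one run.py invocation (hypothetical helper)."""
    if mode not in VALID_MODES:
        raise ValueError("mode must be one of %s" % (VALID_MODES,))
    cmd = ["python", script,
           "--bam-file-list", bam_file_list,
           "--mode", mode,
           "--threads", str(threads)]
    # Optional arguments are added only when the caller supplies them.
    if annotation is not None:
        cmd += ["--annotation", annotation]
    if out_dir is not None:
        cmd += ["--out-dir", out_dir]
    if seq_name is not None:
        cmd += ["--seq-name", seq_name]
    return cmd

cmd = build_julip_command("path_to_bam_file_list", "intron-detection",
                          annotation="path_to_gtf_file", threads=10)
```

The resulting list can be passed to `subprocess.check_call(cmd)` from the directory that contains run.py.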

Support

Contact: gyang22@jhu.edu, florea@jhu.edu

License information

See the file LICENSE for information on the history of this software, terms & conditions for usage, and a DISCLAIMER OF ALL WARRANTIES.
