e-hutchins / CIRCexplorer

A combined strategy to identify circular RNAs (circRNAs and ciRNAs) (Zhang et al., Complementary Sequence-Mediated Exon Circularization, Cell (2014), 159:134-147)

Home Page:http://yanglab.github.io/CIRCexplorer

#CIRCexplorer

CIRCexplorer is a combined strategy to identify junction reads from back spliced exons and intron lariats.

Version: 1.1.7

Last Modified: 2016-1-28

Authors: Xiao-Ou Zhang (zhangxiaoou@picb.ac.cn), Li Yang (liyang@picb.ac.cn)

Maintainer: Xu-Kai Ma (maxukai@picb.ac.cn)

Download the latest stable version of CIRCexplorer

To see what has changed in recent versions of CIRCexplorer, see the CHANGELOG.

FAQ

##A schematic flow shows the pipeline

(Figure: schematic of the CIRCexplorer pipeline)

Notice

CIRCexplorer is now only a circular RNA annotation tool: it parses fusion junction information from the mapping results of other aligners and assigns each fusion junction the correct gene annotation. The resulting circular RNA annotation therefore depends directly on the mapping strategy of the aligner, and different aligners may yield different circular RNA annotations. Other functions and support for more aligners are under intensive development. Thank you for your support and understanding!

##Prerequisites

###Software / Package

####TopHat or STAR

####Others

###RNA-seq

Poly(A)−/ribo− RNA-seq is recommended. To obtain more circular RNAs, RNase R treatment can also be performed.

###Aligner

CIRCexplorer was originally developed as a circular RNA analysis toolkit supporting TopHat & TopHat-Fusion. After version 1.1.0, it also supports STAR.

####TopHat & TopHat-Fusion

To obtain junction reads for circular RNAs, a two-step mapping strategy is used:

  • Multiple mapping with TopHat
tophat2 -a 6 --microexon-search -m 2 -p 10 -G knownGene.gtf -o tophat hg19_bowtie2_index RNA_seq.fastq
  • Convert unmapped reads (using bamToFastq from bedtools)
bamToFastq -i tophat/unmapped.bam -fq tophat/unmapped.fastq
  • Unique mapping with TopHat-Fusion
tophat2 -o tophat_fusion -p 15 --fusion-search --keep-fasta-order --bowtie1 --no-coverage-search hg19_bowtie1_index tophat/unmapped.fastq
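
The two-step mapping above can also be scripted. The following is a minimal sketch, not part of CIRCexplorer, that assembles the three commands shown above as argument lists suitable for `subprocess.run`. The function names `two_step_commands` and `run_two_step` and their parameters are hypothetical; the options and thread counts simply mirror the example commands.

```python
import subprocess


def two_step_commands(fastq, gtf, bowtie2_index, bowtie1_index,
                      tophat_dir="tophat", fusion_dir="tophat_fusion"):
    """Return the three commands of the two-step mapping as argument lists."""
    unmapped_bam = f"{tophat_dir}/unmapped.bam"
    unmapped_fastq = f"{tophat_dir}/unmapped.fastq"
    return [
        # Step 1: multiple mapping with TopHat
        ["tophat2", "-a", "6", "--microexon-search", "-m", "2", "-p", "10",
         "-G", gtf, "-o", tophat_dir, bowtie2_index, fastq],
        # Step 2: convert unmapped reads back to FASTQ (bedtools bamToFastq)
        ["bamToFastq", "-i", unmapped_bam, "-fq", unmapped_fastq],
        # Step 3: unique mapping of the unmapped reads with TopHat-Fusion
        ["tophat2", "-o", fusion_dir, "-p", "15", "--fusion-search",
         "--keep-fasta-order", "--bowtie1", "--no-coverage-search",
         bowtie1_index, unmapped_fastq],
    ]


def run_two_step(commands):
    """Run the commands in order; check=True stops at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Calling `two_step_commands("RNA_seq.fastq", "knownGene.gtf", "hg19_bowtie2_index", "hg19_bowtie1_index")` reproduces the three command lines above; passing the result to `run_two_step` would execute them, provided TopHat and bedtools are installed.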

####STAR

To detect fusion junctions with STAR, --chimSegmentMin should be set to a positive value. For more details about STAR, please refer to the STAR manual.
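
As an illustration of the --chimSegmentMin requirement, here is a small sketch that builds a STAR command line and rejects non-positive values. It is an assumption-laden example rather than a recommended configuration: the helper name `star_chimeric_command` is hypothetical, the other options (`--runThreadN`, `--genomeDir`, `--readFilesIn`, `--outFileNamePrefix`) are standard STAR options chosen for the example, and the value to use for --chimSegmentMin is left to the caller.

```python
def star_chimeric_command(genome_dir, reads, chim_segment_min,
                          threads=1, prefix="star_"):
    """Build a STAR command with chimeric (fusion junction) detection enabled.

    `reads` is a list with one FASTQ file (single-end) or two (paired-end).
    """
    if chim_segment_min <= 0:
        # Chimeric detection is only active for a positive value.
        raise ValueError("--chimSegmentMin must be a positive integer")
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--readFilesIn", *reads,
        "--chimSegmentMin", str(chim_segment_min),
        "--outFileNamePrefix", prefix,
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)`. The directory and file names supplied by the caller are placeholders for a real STAR index and read files.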

##Installation

1. Download CIRCexplorer

git clone https://github.com/YangLab/CIRCexplorer.git
cd CIRCexplorer

2. Install required packages

pip install -r requirements.txt

3. Install CIRCexplorer

python setup.py install

##Usage

CIRCexplorer.py 1.1.7 -- circular RNA analysis toolkits.

Usage: CIRCexplorer.py [options]

Options:
    -h --help                      Show this screen.
    --version                      Show version.
    -f FUSION --fusion=FUSION      TopHat-Fusion fusion BAM file. (used in TopHat-Fusion mapping)
    -j JUNC --junc=JUNC            STAR Chimeric junction file. (used in STAR mapping)
    -g GENOME --genome=GENOME      Genome FASTA file.
    -r REF --ref=REF               Gene annotation.
    -o PREFIX --output=PREFIX      Output prefix [default: CIRCexplorer].
    --tmp                          Keep temporary files.
    --no-fix                       No-fix mode (useful for species with poor gene annotations)

###Example

####TopHat & TopHat-Fusion

CIRCexplorer.py -f tophat_fusion/accepted_hits.bam -g hg19.fa -r ref.txt

####STAR

  • convert Chimeric.out.junction to fusion_junction.txt (star_parse.py was modified from STAR filterCirc.awk)
star_parse.py Chimeric.out.junction fusion_junction.txt
  • parse fusion_junction.txt
CIRCexplorer.py -j fusion_junction.txt -g hg19.fa -r ref.txt
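
The two examples differ only in the input option: `-f` for a TopHat-Fusion BAM file and `-j` for a parsed STAR junction file. The sketch below assembles either command from the options listed under Usage. The helper name `circexplorer_command` is hypothetical, and the sketch assumes exactly one of the two inputs is given, which matches the examples but is not stated explicitly in the help text.

```python
def circexplorer_command(genome, ref, fusion_bam=None, star_junction=None,
                         prefix=None, keep_tmp=False, no_fix=False):
    """Build a CIRCexplorer.py command for TopHat-Fusion or STAR input."""
    # Assumption: exactly one of -f / -j is supplied per run.
    if (fusion_bam is None) == (star_junction is None):
        raise ValueError("give exactly one of fusion_bam (-f) or star_junction (-j)")
    cmd = ["CIRCexplorer.py"]
    if fusion_bam is not None:
        cmd += ["-f", fusion_bam]       # TopHat-Fusion fusion BAM file
    else:
        cmd += ["-j", star_junction]    # output of star_parse.py
    cmd += ["-g", genome, "-r", ref]
    if prefix is not None:
        cmd += ["-o", prefix]           # default prefix is CIRCexplorer
    if keep_tmp:
        cmd.append("--tmp")
    if no_fix:
        cmd.append("--no-fix")
    return cmd
```

For STAR input, `star_parse.py` still has to be run first to produce the file passed as `star_junction`.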

###Note

  • ref.txt is the gene annotation file passed with -r. It contains the following fields:

| Field | Description |
| --- | --- |
| geneName | Name of gene |
| isoformName | Name of isoform |
| chrom | Reference sequence |
| strand | + or - for strand |
| txStart | Transcription start position |
| txEnd | Transcription end position |
| cdsStart | Coding region start |
| cdsEnd | Coding region end |
| exonCount | Number of exons |
| exonStarts | Exon start positions |
| exonEnds | Exon end positions |

  • hg19.fa is the genome sequence in FASTA format.

  • You can use the fetch_ucsc.py script to download the relevant ref.txt (Known Genes, RefSeq or Ensembl) and the genome FASTA file for hg19 or mm10 from UCSC.

fetch_ucsc.py human/mouse ref/kg/ens/fa out

Example (download hg19 RefSeq gene annotation file):

fetch_ucsc.py human ref ref.txt
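
To make the ref.txt field list in this section concrete, the following sketch parses one annotation line into a typed record. It is not CIRCexplorer's own parser. It assumes the file is tab-separated with the fields in the order listed, and that exonStarts and exonEnds are comma-separated lists that may end with a trailing comma (as in UCSC gene prediction tables). The names `RefRecord` and `parse_ref_line` are hypothetical.

```python
from typing import NamedTuple, Tuple


class RefRecord(NamedTuple):
    geneName: str
    isoformName: str
    chrom: str
    strand: str
    txStart: int
    txEnd: int
    cdsStart: int
    cdsEnd: int
    exonCount: int
    exonStarts: Tuple[int, ...]
    exonEnds: Tuple[int, ...]


def _int_list(field):
    """Turn '100,500,' into (100, 500), ignoring an empty trailing item."""
    return tuple(int(x) for x in field.split(",") if x)


def parse_ref_line(line):
    """Parse one tab-separated ref.txt line (assumed layout, see lead-in)."""
    f = line.rstrip("\n").split("\t")
    if len(f) < 11:
        raise ValueError(f"expected at least 11 fields, got {len(f)}")
    rec = RefRecord(f[0], f[1], f[2], f[3],
                    int(f[4]), int(f[5]), int(f[6]), int(f[7]), int(f[8]),
                    _int_list(f[9]), _int_list(f[10]))
    # Sanity check: the exon lists must agree with exonCount.
    if not (len(rec.exonStarts) == len(rec.exonEnds) == rec.exonCount):
        raise ValueError("exonCount does not match exonStarts/exonEnds")
    return rec
```

A file can then be read with `[parse_ref_line(l) for l in open("ref.txt") if l.strip()]`.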

##Output

See details in the example file

| Field | Description |
| --- | --- |
| chrom | Chromosome |
| start | Start of junction |
| end | End of junction |
| name | Circular RNA/Junction reads |
| score | Flag to indicate realignment of fusion junctions |
| strand | + or - for strand |
| thickStart | No meaning |
| thickEnd | No meaning |
| itemRgb | 0,0,0 |
| exonCount | Number of exons |
| exonSizes | Exon sizes |
| exonOffsets | Exon offsets |
| readNumber | Number of junction reads |
| circType | 'Yes' for ciRNA, and 'No' for circRNA (before 1.1.0); 'circRNA' or 'ciRNA' (after 1.1.1) |
| geneName | Name of gene |
| isoformName | Name of isoform |
| exonIndex/intronIndex | Index (start from 1) of exon (for circRNA) or intron (for ciRNA) in given isoform (newly added in 1.1.6) |
| flankIntron | Left intron/Right intron |

Note: The first 12 columns are in BED12 format.
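
Because the first 12 columns follow BED12, the output is straightforward to read programmatically. The sketch below is an illustration, not part of CIRCexplorer. It assumes a tab-separated file with the 18 columns listed above (the layout since 1.1.6), and the names `OUTPUT_FIELDS`, `parse_output_line` and `filter_by_reads` are hypothetical.

```python
OUTPUT_FIELDS = [
    "chrom", "start", "end", "name", "score", "strand",
    "thickStart", "thickEnd", "itemRgb",
    "exonCount", "exonSizes", "exonOffsets",      # end of the BED12 columns
    "readNumber", "circType", "geneName", "isoformName",
    "exonIndex/intronIndex", "flankIntron",
]


def parse_output_line(line):
    """Parse one output line into a dict keyed by the field names above."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(OUTPUT_FIELDS):
        raise ValueError(
            f"expected {len(OUTPUT_FIELDS)} columns, got {len(values)}")
    row = dict(zip(OUTPUT_FIELDS, values))
    # Convert the plainly numeric columns; everything else stays a string.
    for key in ("start", "end", "exonCount", "readNumber"):
        row[key] = int(row[key])
    return row


def filter_by_reads(lines, min_reads=1, circ_type=None):
    """Yield rows with at least `min_reads` junction reads.

    If `circ_type` is given ('circRNA' or 'ciRNA'), keep only that type.
    """
    for line in lines:
        if not line.strip():
            continue
        row = parse_output_line(line)
        if row["readNumber"] < min_reads:
            continue
        if circ_type is not None and row["circType"] != circ_type:
            continue
        yield row
```

For example, `list(filter_by_reads(open(path), min_reads=2, circ_type="circRNA"))` keeps circRNAs supported by at least two junction reads, where `path` is the output file written by CIRCexplorer. Output from versions before 1.1.6 has fewer columns and would need a shorter field list.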

##Citation

Zhang XO, Wang HB, Zhang Y, Lu X, Chen LL and Yang L. Complementary sequence-mediated exon circularization. Cell, 2014, 159: 134-147

##License

Copyright (C) 2014-2016 YangLab. See the LICENSE file for license rights and limitations (MIT).
