This is a fork of DRAP v1.91 (http://www.sigenae.org/drap) adapted to the cluster of the Station Biologique de Roscoff.
Modifications are quick and dirty and have not been intensively tested.
At the moment only runMeta works correctly with sge set in the cfg/drap.cfg file. Everything else must use the local option (but can be submitted as a job with qsub).
License: GNU GPLv3
Short read RNASeq de novo assembly is a well-established method to study transcription of organisms lacking a reference genome sequence. Available software packages such as Trinity and Oases have proven able to build high quality contigs from short reads. But there is still room for improvement on several points, such as:
- compactness: they often produce different contigs which are included in or overlap one another,
- chimerism: the contigs contain different kinds of chimeras such as duplicated open reading frames,
- substitution, insertion, deletion errors: the consensus sequences built by the assembler contain errors which can be partly corrected using the read alignments.
DRAP includes three modules:
- runDrap chains an Oases or Trinity assembly of reads from a given sample with several compaction and correction steps. It produces several assembly files with different FPKM thresholds for total contigs or contigs comprising an open reading frame. A report file presents the resulting assembly and alignment metrics.
- runMeta gathers all the sample assemblies and merges the results into a single representative contig set. It also removes the redundancy between sets and produces a general report including assembly and alignment metrics.
- runAssessment processes different contig sets built from the same read sets to generate assembly and alignment metrics which are collected in a report. It helps to choose the best assembly.
Go to the original install page: http://www.sigenae.org/drap/install.html
- Programming language interpreters and modules
  - bash
  - csh
  - perl 5.. with non-standard modules:
    - Bio::Search::Hit::GenericHit
    - Bio::Search::Tiling::MapTiling
    - Bio::SearchIO::Writer::HSPTableWriter
    - Bio::SeqIO
    - Bio::Tools::Run::StandAloneBlast
    - IPC::Run
    - JSON
    - List::Util
    - Term::ANSIColor
  - python 2.7.* with non-standard modules:
    - Bio
    - NumPy
    - SciPy
- External software
  - bedtools >= 2.22.1
  - blat >= 35
  - bowtie >= 2.2.9
  - busco >= 3.0
  - bwa >= 0.7.15
  - cd-hit >= 4.6
  - cutadapt >= 1.8.3
  - dc >= 1.3.95 (in bc >= 1.06)
  - exonerate >= 2.2.0
  - express >= 1.5.1
  - fastq_illumina_filter >= 0.1
  - getorf >= EMBOSS: 6.4.0.0
  - khmer >= 2.0
  - NCBI Blast+ >= 2.2.29
  - NCBI Tools >= 12.0.0
  - oases >= 0.2.06
  - parallel >= parallel-20141022
  - rsync >= 3.0.6
  - samtools >= 1.3
  - seqclean
  - STAR >= 2.4.0i
  - tgicl < 2 (this is intentional: not tgicl >= 2.1)
  - TransDecoder >= 2.0.1
  - TransRate >= 1.0.1
  - trim_galore >= 0.4.0
  - Trinity >= 2.4.0
Details about how this software is used can be seen in doc/third_party_tools.html.
See the doc/install.html and doc/quick_start.html pages.
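Before launching a run, it can help to confirm that the external tools are reachable from the PATH. The following is a minimal sketch, not part of DRAP: the function name missing_tools is hypothetical, and the executable names in DRAP_TOOLS are assumptions that may differ on your system (for example, bowtie >= 2 is commonly installed as bowtie2). It does not check versions.

```shell
#!/bin/bash
# Print the name of every command given as argument that is not on the PATH.
# Returns 0 when all commands were found, 1 otherwise.
missing_tools() {
    local status=0
    local tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            status=1
        fi
    done
    return "$status"
}

# ASSUMPTION: these executable names are guesses based on the list above;
# adjust them to match your installation.
DRAP_TOOLS="bedtools blat bowtie2 bwa cd-hit-est cutadapt dc exonerate express getorf parallel rsync samtools seqclean STAR tgicl Trinity"

# Word splitting of $DRAP_TOOLS is intentional here.
if missing=$(missing_tools $DRAP_TOOLS); then
    echo "all listed tools found"
else
    echo "missing tools:" $missing
fi
```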
You must use a command similar to the following:
#!/bin/bash
DRAP_PATH="/usr/local/genome2/drap"
WORKING_DIR="$(pwd)"
OUT_FOLDER="$WORKING_DIR"
"$DRAP_PATH/runMeta" \
  --cfg-file "$WORKING_DIR/cfg/drap.cfg" \
  --drap-dirs "$OUT_FOLDER/trinity_splA,$OUT_FOLDER/trinity_splB" \
  --ref "$DRAP_PATH/test/data/Danio_rerio.pep.fasta" \
  --outdir "$OUT_FOLDER/meta_trinity"
where:
--drap-dirs are the folders obtained from runDrap.
Each of those folders must contain at least the following contents in order to successfully run runMeta:
.drap_conf.json (used in the steps 06-meta_index.sh, 07-meta_rmbt.sh and 09-meta_postprocess.sh of runMeta): a JSON file containing at least the following elements:
{
"alignR1" : [
"/path/to/sampleA_R1.fastq.gz"
],
"alignR2" : [
"/path/to/sampleA_R2.fastq.gz"
],
"coverages" : [
"1",
"3",
"5",
"10"
],
"paired" : 1,
"strand" : null
}
transcripts_fpkm_X.fa (used in 01-meta_merge.sh): which is the file of transcripts to be mapped. The X in the transcripts_fpkm_X.fa name must be the minimal value in the list associated with the "coverages" key in the file .drap_conf.json. The value of X is the coverage cutoff used by express when transcripts (from transcripts_fpkm.fa) are filtered.
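The naming rule above can be sketched as follows. This is only an illustration: the helper names min_coverage and expected_transcripts_file are hypothetical, and the coverage values are passed by hand rather than read from .drap_conf.json.

```shell
#!/bin/bash
# Print the smallest of the numeric arguments.
min_coverage() {
    printf '%s\n' "$@" | sort -n | head -n 1
}

# Print the transcripts file name implied by the rule above, given a
# runDrap folder followed by the values of its "coverages" list.
expected_transcripts_file() {
    local dir="$1"
    shift
    echo "$dir/transcripts_fpkm_$(min_coverage "$@").fa"
}

expected_transcripts_file trinity_splA 1 3 5 10
# prints trinity_splA/transcripts_fpkm_1.fa
```

A check such as [ -f "$(expected_transcripts_file "$dir" 1 3 5 10)" ] can then be run on each folder before calling runMeta.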
Notes:
- The alignR2 key can be omitted when the paired key is set to 0.
- If we set, for example, "coverages" : ["2"] and transcripts_fpkm_2.fa, the file .drap_conf.json produced by runMeta will put back "coverages" : ["1", "3", "5", "10"].
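If a folder lacks its .drap_conf.json, one way to recreate a minimal one is sketched below. This is an assumption-laden example, not a DRAP tool: the function write_drap_conf is hypothetical, the read paths are placeholders, and paths containing double quotes or backslashes would need JSON escaping that this sketch does not do. When no R2 file is given, it sets paired to 0 and leaves alignR2 out, following the first note above.

```shell
#!/bin/bash
# Write a minimal .drap_conf.json into a runDrap output folder.
# Usage: write_drap_conf DIR R1 [R2]
write_drap_conf() {
    local dir="$1"
    local r1="$2"
    local r2="${3:-}"
    {
        echo '{'
        printf '   "alignR1" : [\n      "%s"\n   ],\n' "$r1"
        if [ -n "$r2" ]; then
            printf '   "alignR2" : [\n      "%s"\n   ],\n' "$r2"
        fi
        printf '   "coverages" : [\n      "1",\n      "3",\n      "5",\n      "10"\n   ],\n'
        if [ -n "$r2" ]; then
            echo '   "paired" : 1,'
        else
            echo '   "paired" : 0,'
        fi
        echo '   "strand" : null'
        echo '}'
    } > "$dir/.drap_conf.json"
}

# Demonstration in a temporary folder with placeholder read paths.
demo_dir=$(mktemp -d)
write_drap_conf "$demo_dir" /path/to/sampleA_R1.fastq.gz /path/to/sampleA_R2.fastq.gz
cat "$demo_dir/.drap_conf.json"
```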