HaploKit / CrossHyLight

CrossHyLight is a strain-aware de novo assembly method based on the overlap-layout-consensus (OLC) paradigm. It leverages the strengths of both NGS and third-generation sequencing to rapidly and accurately assemble highly complex metagenomic sequencing data.

[Figure: HiStrain_workflow]

The workflow of CrossHyLight. Broadly speaking, there are three main steps. First, an overlap graph is built from the long reads and then optimized into a strain-aware graph, which is used to assemble long-read contigs at strain resolution. Next, short reads are aligned to the long-read contigs, and any short reads that align to already assembled regions are removed. The remaining short reads undergo strain-aware assembly to produce short-read contigs. Finally, the long-read and short-read contigs are used together to construct a contig graph for further scaffolding and extension of the contigs into the final master contigs.
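The second step (removing short reads that already align to long-read contigs) can be illustrated with a small sketch. This is not CrossHyLight's own code: it assumes the alignments are available as minimap2 PAF lines, where the first column is the query (read) name and unaligned reads produce no line, and it treats any read with at least one alignment as covered. The function names `aligned_read_names` and `drop_aligned` are hypothetical, and the real tool may apply stricter criteria.

```python
def aligned_read_names(paf_lines):
    """Collect the names of reads that have at least one PAF alignment.

    Assumption: standard PAF with 12 mandatory tab-separated columns,
    column 1 being the query (read) name.
    """
    names = set()
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 12:  # skip malformed or truncated lines
            names.add(fields[0])
    return names


def drop_aligned(records, aligned):
    """Keep only records whose name is not in the aligned set.

    `records` is an iterable of (name, sequence, quality) tuples.
    """
    return [record for record in records if record[0] not in aligned]


if __name__ == "__main__":
    # One toy alignment of read r1/1 to a contig called ctg1.
    paf = ["r1/1\t46\t0\t46\t+\tctg1\t1000\t10\t56\t46\t46\t60\n"]
    reads = [("r1/1", "ACGT", "IIII"), ("r2/1", "ACGT", "IIII")]
    remaining = drop_aligned(reads, aligned_read_names(paf))
    print([name for name, _, _ in remaining])
```

Only the reads left over after this kind of filtering would go on to the short-read assembly step described above.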

Installation and dependencies

Please note that CrossHyLight is built for Linux-based systems and Python 3 only. It relies on the dependencies listed in the Conda command below.

To install CrossHyLight, it is recommended to first install the dependencies through Conda:

conda create -n CrossHyLight
conda activate CrossHyLight
conda install -c bioconda python=3.6 scipy pandas minimap2 bfc fmlrc2 ropebwt2 miniasm racon
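After activating the environment, it can be useful to confirm that the dependencies are actually reachable before starting a long run. The check below is a minimal sketch and not part of CrossHyLight. It assumes each executable has the same name as its Bioconda package in the command above; the function names `missing_tools` and `missing_modules` are made up for illustration.

```python
import importlib.util
import shutil

# Assumption: executable names match the Bioconda package names.
REQUIRED_TOOLS = ["minimap2", "bfc", "fmlrc2", "ropebwt2", "miniasm", "racon"]
REQUIRED_MODULES = ["scipy", "pandas"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the executables that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def missing_modules(modules=REQUIRED_MODULES):
    """Return the Python modules that cannot be imported in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    tools = missing_tools()
    modules = missing_modules()
    if tools or modules:
        print("Missing executables:", ", ".join(tools) or "none")
        print("Missing Python modules:", ", ".join(modules) or "none")
    else:
        print("All listed dependencies were found.")
```

Run it inside the activated CrossHyLight environment; anything it reports as missing can usually be installed with the same `conda install` command.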

Then clone the repository into the directory where you want to install it:

git clone https://github.com/kangxiongbin/CrossHyLight.git
cd CrossHyLight

Examples

Example with Illumina MiSeq and ONT reads. The output folder (out_folder) must be given as a full path.

python ../script/CrossHyLight.py -l long_reads.fq -s short_reads.fq --nsplit 100 -t 30  -o out_folder
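Because the output folder has to be a full path, one option is to build the command from Python and resolve the path first. This is only a sketch: the flags (`-l`, `-s`, `--nsplit`, `-t`, `-o`) and the script location are copied from the example above, while the helper name `build_command` and its defaults are assumptions made for illustration.

```python
import os
import shlex


def build_command(long_reads, short_reads, out_folder, nsplit=100, threads=30,
                  script="../script/CrossHyLight.py"):
    """Build the CrossHyLight command line with an absolute output folder."""
    out_abs = os.path.abspath(out_folder)  # resolve relative paths to a full path
    return [
        "python", script,
        "-l", long_reads,
        "-s", short_reads,
        "--nsplit", str(nsplit),
        "-t", str(threads),
        "-o", out_abs,
    ]


if __name__ == "__main__":
    cmd = build_command("long_reads.fq", "short_reads.fq", "out_folder")
    # Print a shell-safe version of the command for inspection.
    print(" ".join(shlex.quote(part) for part in cmd))
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)` once the printed command looks right.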

The input file must be in interleaved FASTQ format. Because the final clustering step retrieves and groups reads by name, read names must not contain spaces. The read file should be formatted like this:

@S0R0/1
TATAAGTAAGGCGTTGCGAGCGGGTCGTAAAATATTTTTGATCCGT
+
EEEEEGEDJHJ3JHKJMMMLLLKNGOOLLNLOOOMJONLOOIOLMO
@S0R0/2
TTGATTATCATGCCGGAAGTGCTGCTCTTGTTCTCTGAAAGAGAAT
+
EEEGEHHHJHFJJJJBML2MMLNLLONNLNLOLJONOLNONNNMNF
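These format requirements can be checked before running the assembler. The sketch below is not part of CrossHyLight. It assumes mates are named with `/1` and `/2` suffixes as in the example above and that each record occupies exactly four lines; the function names are hypothetical.

```python
def read_records(handle):
    """Yield (header, sequence, plus, quality) tuples from a 4-line FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return
        seq = handle.readline()
        plus = handle.readline()
        qual = handle.readline()
        yield tuple(line.rstrip("\n") for line in (header, seq, plus, qual))


def check_interleaved_fastq(handle):
    """Return a list of problems; an empty list means all checks passed."""
    problems = []
    pending = None  # base name of a /1 read still waiting for its /2 mate
    for number, (header, seq, plus, qual) in enumerate(read_records(handle), 1):
        if not header.startswith("@") or not plus.startswith("+"):
            problems.append(f"record {number}: not a 4-line FASTQ record")
        name = header[1:]
        if " " in name or "\t" in name:
            problems.append(f"record {number}: read name contains whitespace")
        if len(seq) != len(qual):
            problems.append(f"record {number}: sequence and quality lengths differ")
        if pending is None:
            if name.endswith("/1"):
                pending = name[:-2]
            else:
                problems.append(f"record {number}: expected a /1 read, got {name!r}")
        else:
            if name != pending + "/2":
                problems.append(f"record {number}: expected {pending + '/2'!r}, got {name!r}")
            pending = None
    if pending is not None:
        problems.append("last /1 read has no /2 mate")
    return problems


if __name__ == "__main__":
    import sys
    with open(sys.argv[1]) as fastq:
        for problem in check_interleaved_fastq(fastq):
            print(problem)
```

Saved under any file name, the script takes the FASTQ path as its only argument and prints nothing when the file passes every check.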

License: MIT License


Languages

C++ 73.7%, Python 25.2%, C 0.4%, Makefile 0.4%, Shell 0.4%