HaploKit / StrainXpress

StrainXpress is a de novo assembly method based on the overlap-layout-consensus (OLC) paradigm. It quickly and accurately assembles high-complexity metagenome sequencing data at strain resolution.

Installation and dependencies

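Please note that StrainXpress is built for Linux-based systems and Python 3 only. It depends on the packages installed in the Conda step below (Python 3.6, SciPy, pandas, and minimap2). Compiling the code additionally requires g++ and the Boost library; see the section on possible installation issues if either is missing.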

To install StrainXpress, first install the dependencies. We recommend doing this through Conda:

conda create -n strainxpress
conda activate strainxpress
conda install -c bioconda python=3.6 scipy pandas minimap2
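
After the Conda step, a quick check that the packages are visible from the active environment can save a failed run later. The following is a minimal sketch, not part of StrainXpress: the helper name `find_missing` is hypothetical, and the list of names checked is taken from the `conda install` command above.

```python
import importlib.util
import shutil


def find_missing(modules, executables):
    """Return the names of modules and executables that cannot be found."""
    missing = []
    for name in modules:
        # find_spec returns None when a top-level module is not importable.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    for name in executables:
        # shutil.which returns None when the program is not on PATH.
        if shutil.which(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Names taken from the conda install command; adjust if your setup differs.
    missing = find_missing(["scipy", "pandas"], ["minimap2"])
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All dependencies found.")
```

Run it inside the activated `strainxpress` environment. It only checks that the names can be found and does not verify versions.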

Then clone the repository into the directory where you want to install StrainXpress, and compile the code:

git clone https://github.com/kangxiongbin/StrainXpress.git
cd StrainXpress
sh install.sh

Examples

Illumina MiSeq

python ../scripts/strainxpress.py -fq all_reads.fq

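The input file must be an interleaved FASTQ file, in which the two reads of each pair appear as consecutive records with names ending in /1 and /2, as in the example below: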

@S0R0/1
TATAAGTAAGGCGTTGCGAGCGGGTCGTAAAATATTTTTGATCCGT
+
EEEEEGEDJHJ3JHKJMMMLLLKNGOOLLNLOOOMJONLOOIOLMO
@S0R0/2
TTGATTATCATGCCGGAAGTGCTGCTCTTGTTCTCTGAAAGAGAAT
+
EEEGEHHHJHFJJJJBML2MMLNLLONNLNLOLJONOLNONNNMNF
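
To check that a file follows this layout before starting an assembly, something like the sketch below could be used. It is not part of StrainXpress. The function names `read_fastq` and `count_pairs` are hypothetical, and the sketch assumes four-line FASTQ records, no zero-length reads, and mate names that differ only in the /1 and /2 suffix, as in the example above.

```python
import itertools


def read_fastq(lines):
    """Yield (name, sequence, quality) for each four-line FASTQ record."""
    # Blank lines are skipped; this assumes the file has no zero-length reads.
    stripped = (line.rstrip("\n") for line in lines if line.strip())
    while True:
        record = list(itertools.islice(stripped, 4))
        if not record:
            return
        if (len(record) < 4 or not record[0].startswith("@")
                or not record[2].startswith("+")):
            raise ValueError(f"malformed FASTQ record near {record[0]!r}")
        # Keep only the read name, dropping any description after whitespace.
        header = record[0][1:].split()
        yield (header[0] if header else ""), record[1], record[3]


def count_pairs(lines):
    """Return the number of read pairs, or raise ValueError if not interleaved."""
    records = read_fastq(lines)
    pairs = 0
    for first in records:
        second = next(records, None)
        if second is None:
            raise ValueError(f"read {first[0]!r} has no mate")
        name1, name2 = first[0], second[0]
        if not (name1.endswith("/1") and name2.endswith("/2")
                and name1[:-2] == name2[:-2]):
            raise ValueError(
                f"reads {name1!r} and {name2!r} are not an interleaved pair")
        pairs += 1
    return pairs
```

To use it, open the input file and pass the file handle, for example `count_pairs(open("all_reads.fq"))`. The call returns the number of pairs or raises an error naming the first offending read.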

For large data sets, we recommend using the fast clustering method:

python ../scripts/strainxpress.py -fq all_reads.fq -fast

- The assembled contigs are written to final_contigs.fasta in the stageb folder.
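
For a first look at the assembly, the contig count, total length, and N50 can be computed from the output FASTA. The sketch below is illustrative and not part of StrainXpress. The function names are hypothetical, and the path `stageb/final_contigs.fasta` relative to the run directory is an assumption based on the note above.

```python
def contig_lengths(lines):
    """Return the length of each sequence in FASTA-formatted lines."""
    lengths = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A header line starts a new contig.
            lengths.append(0)
        elif lengths:
            # Sequence lines are added to the most recent contig.
            lengths[-1] += len(line)
    return lengths


def n50(lengths):
    """Return the length L such that contigs of length >= L cover at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # no contigs


def summarize(path):
    """Print basic statistics for a FASTA file of contigs."""
    with open(path) as handle:
        lengths = contig_lengths(handle)
    print(f"contigs: {len(lengths)}")
    print(f"total length: {sum(lengths)}")
    print(f"N50: {n50(lengths)}")
```

Calling `summarize("stageb/final_contigs.fasta")` from the directory where the assembly was run prints the three values. Adjust the path if your output folder is located elsewhere.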

Possible issues during installation (optional)

If the system's g++ version does not meet the requirements, you can install a suitable compiler through Conda and link it as g++ and gcc:

conda install -c conda-forge gxx_linux-64=7.3.0
# replace /path/to/ with your own path
ln -s /path/to/miniconda3/envs/strainxpress/bin/x86_64-conda_cos6-linux-gnu-g++ /path/to/miniconda3/envs/strainxpress/bin/g++
ln -s /path/to/miniconda3/envs/strainxpress/bin/x86_64-conda_cos6-linux-gnu-gcc /path/to/miniconda3/envs/strainxpress/bin/gcc

If the Boost library is not installed, you can install it through Conda:

conda install -c conda-forge boost
# set environment variables
export LD_LIBRARY_PATH=/path/to/miniconda3/envs/strainxpress/lib/:$LD_LIBRARY_PATH
export CPATH=/path/to/miniconda3/envs/strainxpress/include/:$CPATH

If compilation fails with a linker error such as /path/to/miniconda3/envs/strainxpress/x86_64-conda_cos6-linux-gnu/bin/ld: cannot find -lboost_timer or cannot find -lgomp, the linker could not find the Boost or libgomp library. Linking the libraries into the compiler's library directory may solve this:

ln -s /path/to/miniconda3/envs/strainxpress/lib/libboost_* /path/to/miniconda3/envs/strainxpress/x86_64-conda_cos6-linux-gnu/lib/.
ln -s /path/to/miniconda3/envs/strainxpress/lib/libgomp* /path/to/miniconda3/envs/strainxpress/x86_64-conda_cos6-linux-gnu/lib/.
# then recompile and install
sh install.sh

License: GNU General Public License v3.0

