haplotype-reconstruction

Benchmarking of tools for viral haplotype reconstruction.

WARNING: under active construction! May not work as appears, but in such an event feel free to file an issue.

Use

This repo can be used to test a variety of tools and settings to build pipelines for viral haplotype/quasispecies reconstruction on simulated, in vitro and in vivo data.

Haplotyper	Implemented	Paper	Code
QuasiRecomb	Yes	Link	Link
aBayesQR	Yes	Link	Link
SAVAGE	Yes	Link	Link
RegressHaplo	Yes	Link	Link
Haploclique	Yes	Link	Link
SHORAH	No	Link	Link
PredictHaplo	No	Link	Link
QSdpR	No	Link	Link
TenSQR	No	Link	Link

A standard invocation will be of the form:

snakemake output/$DATASET/$QUALITY_CONTROL/$READ_MAPPER/$GENE/$HAPLOTYPER/haplotypes.fasta

where the variables can take one of the following values:

Dataset

For more information, see the Data section below.

LANL simulations: see keys of simulations.json for potential simulation names
Reconstruction: see filenames in reconstruction directory of input data
Evolution: see filenames in evolution directory of input data
Compartmentalization: see entries in compartmentalization.json in root directory

Quality Control

qfilt, fastp, trimmomatic

Read mapper

bealign, bowtie2, bwa

Gene

env, gag, int, nef, pol, pr, prrt, rev, rt, tat, vif, vpr

Haplotyper

quasirecomb, abayesqr, savage, regresshaplo

Use of snakemake permits running on TORQUE, i.e.

snakemake --cluster 'qsub -o ./logs -e ./logs -V -d `pwd` -l nodes=1:ppn=$PPN' -j $JOBS -k target

Data

Input directory structure

├── compartmentalization
│   ├── $PATIENT_ID/$DATE/$COMPARTMENT/$REPLICATE/reads.fasta
│   ├── $PATIENT_ID/$DATE/$COMPARTMENT/$REPLICATE/scores.qual
├── evolution
│   ├── ERS661087.fastq
│   ├── ERS661088.fastq
│   ├── ERS661089.fastq
│   ├── ERS661090.fastq
│   ├── ERS661091.fastq
│   ├── ERS661092.fastq
│   └── ERS661093.fastq
├── LANL-HIV-aligned.fasta
├── LANL-HIV.fasta
├── LANL-HIV.new
├── README.md
├── reconstruction
│   ├── 3.GAC.454Reads.fna
│   ├── 3.GAC.454Reads.qual
│   ├── 93US141_100k_14-159320-1GN-0_S16_L001_R1_001.fastq
│   ├── 93US141_100k_14-159320-1GN-0_S16_L001_R2_001.fastq
│   ├── BP_050100753.fasta
│   ├── BP_050100753.qual
│   ├── FiveVirusMixIllumina_1.fastq
│   ├── FiveVirusMixIllumina_2.fastq
│   ├── PP1L_S45_L001_R1_001.fastq
│   ├── PP1L_S45_L001_R2_001.fastq
│   ├── regress_haplo.bam
│   ├── regress_haplo.bam.bai
│   ├── sergei1.fastq
│   ├── sergei2.fastq
│   ├── SRR961514-Illumina.sra
│   ├── SRR961596-454.fastq
│   ├── SRR961596-454.sra
│   ├── SRR961669-PacBio.fastq
│   └── SRR961669-PacBio.sra
└── references
    ├── env.fasta
    ├── gag.fasta
    ├── int.fasta
    ├── nef.fasta
    ├── pol.fasta
    ├── pr.fasta
    ├── prrt.fasta
    ├── rev.fasta
    ├── rt.fasta
    ├── tat.fasta
    ├── vif.fasta
    ├── vpr.fasta
    └── vpu.fasta

Description

LANL

LANL-HIV.fasta
LANL-HIV-aligned.fasta
LANL-HIV.new

HIV genomes from the LANL database, as well as an alignment built with mafft and a tree built with FastTree. Used for simulation.

Intrahost evolution

evolution/ERS6610*.fastq

NGS read data from a study on HIV intra-host evolution.

454 data

reconstruction/BP_050100753.fasta
reconstruction/BP_050100753.qual

ACME lab 454 data which shows a clear signal of segregating haplotypes.

HIV benchmarking

reconstruction/SRR961514-Illumina.fastq
reconstruction/SRR961596-454.fastq
reconstruction/SRR961669-PacBio.fastq

A gold standard dataset, consisting of mixed, known strains at known proportions.

Sergei

reconstruction/sergei1.fastq
reconstruction/sergei2.fastq

A set of paired end reads given by Sergei.

RegressHaplo test set

reconstruction/regress_haplo.bam
reconstruction/regress_haplo.bam.bai

Dataset that comes with the RegressHaplo code.

Reference genes

references/*.fasta

HXB2 genes to be used as references when aligning reads.

Installation

Requirements

Linux (tested on Ubuntu 18.04.1, CentOS 7)
conda (tested on 4.6.10) with standard BioConda channels

Further requirements listed in environment.yml.

Install

conda env create -f environment.yml
conda activate haplotype-reconstruction

hannahkimincompbio / haplotype-reconstruction