lanl001 / halc

High Throughput Algorithm for Long Read Error Correction



The HALC paper has been accepted for publication in BMC Bioinformatics!


HALC is software for high-throughput error correction of long reads.

Copyright

HALC is under the Artistic License 2.0.

Short manual

  1. System requirements

    HALC runs on 32-bit or 64-bit Linux machines. At least 4GB of system memory is recommended for correcting larger data sets.

  2. Installation

    HALC requires the aligner BLASR and, for -ordinary mode only, the error correction software LoRDEC.

    • Compile the source files in the 'src' and 'thirdparty' folders by running the Makefile: make all. This generates a 'bin' folder.
    • Add BLASR, LoRDEC and the 'bin' folder to your $PATH: export PATH=PATH2BLASR:$PATH, export PATH=PATH2LoRDEC:$PATH and export PATH=PATH2bin:$PATH, respectively.
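
    After updating $PATH, it can help to confirm that the external tools are actually visible before starting a long run. The following is a minimal sketch, not part of HALC. The executable names blasr and lordec-correct are assumptions about how the two tools are installed; adjust them to match your system.

```python
import shutil


def missing_tools(tools):
    """Return the names in `tools` that cannot be found on $PATH."""
    return [name for name in tools if shutil.which(name) is None]


# Assumed executable names; change them if your installation differs.
REQUIRED = ["blasr", "lordec-correct"]

if __name__ == "__main__":
    missing = missing_tools(REQUIRED)
    if missing:
        print("Not found on $PATH:", ", ".join(missing))
    else:
        print("All required tools found.")
```

    LoRDEC is only needed for -ordinary mode, so it can be dropped from the list when running in -repeat-free mode.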
  3. Inputs

    • Long reads in FASTA format.
    • Contigs assembled from the corresponding short reads in FASTA format.
    • The initial short reads in FASTA format (only for -ordinary mode; obtained with cat left_reads.fa >short_reads.fa and then cat right_reads.fa >>short_reads.fa).
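
    The two cat commands above simply concatenate the left and right read files. For pipelines driven from Python, the same step might be sketched as follows. The function name concatenate_fasta is hypothetical and not part of HALC.

```python
import shutil


def concatenate_fasta(inputs, output):
    """Write the contents of each file in `inputs` to `output`, in order.

    This mirrors `cat a.fa > out.fa` followed by `cat b.fa >> out.fa`:
    bytes are copied unchanged and no records are validated or reordered.
    """
    with open(output, "wb") as out:
        for path in inputs:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)


# Example, using the file names from the manual:
# concatenate_fasta(["left_reads.fa", "right_reads.fa"], "short_reads.fa")
```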
  4. Using AlignGraph long_reads.fa contigs.fa [-options|-options]

    Options (default value):
    -o/-ordinary short_reads.fa (yes)
    Ordinary mode, which uses repeats for correction. The error correction software LoRDEC and the initial short reads are required to refine the repeat-corrected regions. Mutually exclusive with the -repeat-free option.
    -r/-repeat-free (no)
    Repeat-free mode, which does not use repeats for correction. Mutually exclusive with the -ordinary option.
    -b/-boundary n (4)
    Maximum boundary difference to split the subcontigs.
    -a/-accurate (yes)
    Accurate construction of the contig graph.
    -c/-coverage n (auto)
    Expected coverage on contigs. If not specified, it is calculated automatically.
    -w/-width n (4)
    Maximum width of the dynamic programming table.
    -k/-kmer n (25)
    Kmer length for LoRDEC refinement.
    -t/-threads n (auto)
    Number of threads for each process to create. By default, it is set to the number of computing cores.
    -l/-log (no)
    Print the system log.
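
    The options above can be assembled into an argument list programmatically. The sketch below is illustrative only: the function build_command and its parameter names are hypothetical, and the executable name is passed in by the caller because it is not assumed here. It uses the long option names listed above, enforces that -ordinary and -repeat-free are mutually exclusive, and leaves out -accurate, since the manual does not say how to switch it off.

```python
def build_command(executable, long_reads, contigs, short_reads=None,
                  repeat_free=False, boundary=None, coverage=None,
                  width=None, kmer=None, threads=None, log=False):
    """Build an argument list from the options documented in the manual.

    Options left as None are omitted so that the tool's defaults apply.
    """
    if repeat_free and short_reads is not None:
        raise ValueError("-ordinary and -repeat-free are mutually exclusive")
    if not repeat_free and short_reads is None:
        raise ValueError("-ordinary mode requires the initial short reads")

    cmd = [executable, long_reads, contigs]
    if repeat_free:
        cmd.append("-repeat-free")
    else:
        cmd += ["-ordinary", short_reads]

    # Numeric options: add the flag only when a value was given.
    numeric = (("-boundary", boundary), ("-coverage", coverage),
               ("-width", width), ("-kmer", kmer), ("-threads", threads))
    for flag, value in numeric:
        if value is not None:
            cmd += [flag, str(value)]

    if log:
        cmd.append("-log")
    return cmd


# "PATH_TO_EXECUTABLE" is a placeholder, not a real program name.
print(build_command("PATH_TO_EXECUTABLE", "long_reads.fa", "contigs.fa",
                    short_reads="short_reads.fa", kmer=25))
```

    The returned list can be passed to subprocess.run once the placeholder is replaced with the actual executable.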

  5. Outputs

    • Error corrected full long reads.
    • Error corrected trimmed long reads.
    • Error corrected split long reads.

Chinese name

HALC's Chinese name is 浩克.

