terrycojones / diamond

Accelerated BLAST compatible local sequence aligner.

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Introduction

DIAMOND is a sequence aligner for protein and translated DNA searches, designed for high performance analysis of big sequence data. The key features are:

  • Pairwise alignment of proteins and translated DNA at 500x-20,000x speed of BLAST.
  • Frameshift alignments for long read analysis.
  • Low resource requirements and suitable for running on standard desktops or laptops.
  • Various output formats, including BLAST pairwise, tabular and XML, as well as taxonomic classification.

Keep posted about new developments by following me on Twitter.

Build Status image image image

Quick start guide

Please read the manual for detailed installation and usage instructions. This demonstrates a quick example for setting up and using the program on Linux.

Installing the software on your system may be done by downloading it in binary format for immediate use:

wget http://github.com/bbuchfink/diamond/releases/download/v0.9.22/diamond-linux64.tar.gz
tar xzf diamond-linux64.tar.gz

The extracted diamond binary file should be moved to a directory contained in your executable search path (PATH environment variable).

To now run an alignment task, we assume to have a protein database file in FASTA format named nr.faa and a file of DNA reads that we want to align named reads.fna.

In order to set up a reference database for DIAMOND, the makedb command needs to be executed with the following command line:

$ diamond makedb --in nr.faa -d nr

This will create a binary DIAMOND database file with the specified name (nr.dmnd). The alignment task may then be initiated using the blastx command like this:

$ diamond blastx -d nr -q reads.fna -o matches.m8

The output file here is specified with the –o option and named matches.m8. By default, it is generated in BLAST tabular format.

Note:

  • The program may use quite a lot of memory and also temporary disk space. Should the program fail due to running out of either one, you need to set a lower value for the block size parameter -b (see the manual).
  • The default (fast) mode was mainly designed for short reads. For longer sequences, the sensitive modes (options --sensitive or --more-sensitive) are recommended.
  • The runtime of the program is not linear in the size of the query file and it is much more efficient for large query files (> 1 million sequences) than for smaller ones.
  • Low complexity masking is applied to the query and reference sequences by default. Masked residues appear in the output as X.
  • The default e-value cutoff of DIAMOND is 0.001 while that of BLAST is 10, so by default the program will search a lot more stringently than BLAST and not report weak hits.

About

DIAMOND is developed by Benjamin Buchfink at the Detlef Weigel lab, Max Planck Institute for Developmental Biology, Tübingen, Germany.

[Email] [Twitter] [Google Scholar] [MPI-EBIO]

Publication:

  • Buchfink B, Xie C, Huson DH, "Fast and sensitive protein alignment using DIAMOND", Nature Methods 12, 59-60 (2015). doi:10.1038/nmeth.3176

About

Accelerated BLAST compatible local sequence aligner.

License:GNU General Public License v3.0


Languages

Language:C++ 84.7%Language:C 14.7%Language:CMake 0.3%Language:Shell 0.2%Language:Dockerfile 0.0%