A curated list of awesome bioinformatics software and libraries, mostly command-line based and free or open source. Please feel free to contribute!
Program | Description |
---|---|
datamash | Data transformations and statistics. |
Bioinformatics One Liners | Git repo of useful single-line commands. |
CSVKit | Utilities for working with CSV/Tab-delimited files. |
Bedtools2 | A Swiss Army knife for genome arithmetic. |
Program | Description |
---|---|
bcbio-nextgen | Batteries-included genomic analysis pipeline for variant and RNA-Seq analysis, structural variant calling, annotation, and prediction. |
Sequence processing includes tasks such as demultiplexing raw read data and trimming low-quality bases.
- Fastqp - FASTQ and SAM quality control using Python.
- FastQC - A quality control tool for high-throughput sequence data.
- Fastx Toolkit - FASTQ/A short-reads pre-processing tools: Demultiplexing, trimming, clipping, quality filtering, and masking utilities.
- Seqtk - Toolkit for processing sequences in FASTA/Q formats.
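
As a rough illustration of what "trimming low-quality bases" means, here is a minimal pure-Python sketch. It is not how any of the tools above are implemented. It assumes Phred+33 quality encoding and a simple rule that removes bases from the 3' end until one meets the threshold; the function names `phred_scores` and `trim_3prime` and the default threshold of 20 are hypothetical choices for this example.

```python
def phred_scores(quality, offset=33):
    """Convert a FASTQ quality string to a list of integer Phred scores.

    Assumes Phred+33 encoding unless another offset is given.
    """
    return [ord(ch) - offset for ch in quality]


def trim_3prime(seq, quality, min_q=20, offset=33):
    """Drop bases from the 3' end while their quality is below min_q.

    Returns the trimmed (sequence, quality) pair. Bases inside the read
    are left untouched, even if some of them are low quality.
    """
    if len(seq) != len(quality):
        raise ValueError("sequence and quality strings differ in length")
    scores = phred_scores(quality, offset)
    end = len(seq)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], quality[:end]


if __name__ == "__main__":
    # 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
    print(trim_3prime("ACGTACGT", "IIIII###"))
```

Real trimmers typically use sliding windows or running-sum algorithms and also handle adapters, so treat this only as a starting point for understanding the idea.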
De Novo Alignment
DNA Resequencing
Program | Description |
---|---|
BWA | Burrows-Wheeler Aligner for pairwise alignment between DNA sequences. |
Program | Description |
---|---|
samtools/bcftools/htslib | A suite of tools for manipulating next-generation sequencing data. |
freebayes | Bayesian haplotype-based polymorphism discovery and genotyping. |
- Bamtools - Collection of tools for working with BAM files.
- vcflib - A C++ library for parsing and manipulating VCF files.
- bcftools - Set of tools for manipulating VCF files.
- vcftools - VCF manipulation and statistics (e.g. linkage disequilibrium, allele frequency, Fst).
Genomic traits are differences in DNA structure or content observed among populations that may be regulated by genetic variation, for example telomere length or rDNA copy number.
- Telseq - A tool for estimating telomere length from whole genome sequence data.
- Bam Toolbox - MtDNA:nuclear coverage; outputs the ratio of MtDNA to nuclear coverage, a proxy for mitochondrial content.
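
To make the MtDNA:nuclear coverage ratio concrete, here is a minimal sketch of one way to approximate it. This is not Bam Toolbox's implementation. It assumes the input is the tab-delimited text printed by `samtools idxstats` (reference name, reference length, mapped reads, unmapped reads), that the mitochondrial contig is named `chrM`, and that every other contig counts as nuclear. The function name `mt_nuclear_ratio` is hypothetical. Because read length cancels out, the ratio of reads per base stands in for the ratio of coverage.

```python
def mt_nuclear_ratio(idxstats_text, mt_name="chrM"):
    """Estimate the mitochondrial:nuclear coverage ratio.

    idxstats_text: text in `samtools idxstats` format, one contig per line:
        name <TAB> length <TAB> mapped_reads <TAB> unmapped_reads
    mt_name: name of the mitochondrial contig (an assumption; some
        references call it "MT" or "chrMT").
    """
    mt_reads = mt_len = nuc_reads = nuc_len = 0
    for line in idxstats_text.strip().splitlines():
        name, length, mapped, _unmapped = line.split("\t")
        if name == "*":
            # The final "*" row only counts unplaced, unmapped reads.
            continue
        if name == mt_name:
            mt_reads += int(mapped)
            mt_len += int(length)
        else:
            nuc_reads += int(mapped)
            nuc_len += int(length)
    if mt_len == 0 or nuc_len == 0 or nuc_reads == 0:
        raise ValueError("need a mitochondrial contig and mapped nuclear reads")
    # Reads per base on each compartment; read length cancels in the ratio.
    return (mt_reads / mt_len) / (nuc_reads / nuc_len)


if __name__ == "__main__":
    example = "chr1\t1000\t100\t0\nchrM\t100\t500\t0\n*\t0\t0\t7\n"
    print(mt_nuclear_ratio(example))
```

In practice you may want to exclude unplaced scaffolds, decoys, and sex chromosomes from the nuclear total, since they can skew the denominator.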
Program | Description |
---|---|
wgsim | Read simulator. Comes with samtools! |
Bam Surgeon | Tools for adding mutations to existing .bam files, used for testing mutation callers. |
Program | Description |
---|---|
SIFT | Predicts whether an amino acid substitution affects protein function. |
SNPeff | Genetic variant annotation and effect prediction toolbox. |
- cruzdb - Pythonic access to the UCSC genome database.
- pyfaidx - Pythonic access to FASTA files.
- pyBedTools - Python wrapper for bedtools.
- pysam - Python wrapper for samtools.
- pyVCF - A VCF parser for Python.
- cyvcf - A port of pyVCF using Cython for speed.
- cyvcf2 - Cython + htslib == fast VCF parsing; even faster parsing than pyVCF.
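
For a sense of what the VCF parsers above do for you, here is a minimal standard-library sketch that reads the eight fixed VCF columns. It does not use or reproduce the API of pyVCF, cyvcf, or cyvcf2. The function names `parse_info` and `parse_vcf` and the dictionary layout are hypothetical, and it ignores header metadata, type conversion of INFO values, and per-sample genotype columns.

```python
def parse_info(info):
    """Split an INFO column such as 'DP=14;DB' into a dict.

    Key=value pairs keep the value as a string; flags map to True.
    """
    if info == ".":
        return {}
    parsed = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        parsed[key] = value if sep else True
    return parsed


def parse_vcf(lines):
    """Yield one dict per VCF data line, skipping headers and blank lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        chrom, pos, vid, ref, alt, qual, filt, info = line.split("\t")[:8]
        yield {
            "chrom": chrom,
            "pos": int(pos),
            "id": vid,
            "ref": ref,
            "alt": alt.split(","),  # multi-allelic sites list several ALTs
            "qual": None if qual == "." else float(qual),
            "filter": filt,
            "info": parse_info(info),
        }


if __name__ == "__main__":
    demo = [
        "##fileformat=VCFv4.2\n",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
        "chr1\t100\trs1\tA\tG,T\t50\tPASS\tDP=14;DB\n",
    ]
    for record in parse_vcf(demo):
        print(record)
```

The dedicated libraries additionally handle compressed and indexed files, header-driven typing, and genotype fields, which is why they are the better choice for real data.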
The following tools can be used to visualize genomic data or to construct customized visualizations of it, including sequence data from DNA-Seq, RNA-Seq, and ChIP-Seq, variants, and more.
- biodalliance - Embeddable genome viewer. Integrates data from a wide variety of sources, and can load data directly from popular genomics file formats including bigWig, BAM, and VCF.
- IGV - Java-based browser. Fast, efficient, scalable visualization tool for genomics data and annotations. Handles a large variety of formats.
- Island Plot - D3 JavaScript-based genome viewer. Constructs SVGs.
- pileup.js - JavaScript library that can be used to generate interactive and highly customizable web-based genome browsers.
- scribl - JavaScript library for drawing canvas-based gene diagrams. The homepage has examples.