zephyrzilla / Awesome-Bioinformatics

A curated list of awesome Bioinformatics libraries and software.

Awesome-Bioinformatics

A curated list of awesome Bioinformatics software and libraries. Mostly command line based, and free or open-source. Please feel free to contribute!

Data Processing

Command Line Utilities

| Program | Description |
| --- | --- |
| datamash | Data transformations and statistics. |
| Bioinformatics One Liners | Git repo of useful single-line commands. |
| CSVKit | Utilities for working with CSV/tab-delimited files. |
| Bedtools2 | A Swiss army knife for genome arithmetic. |

Next Generation Sequencing

Pipelines

| Program | Description |
| --- | --- |
| bcbio-nextgen | Batteries-included genomic analysis pipeline for variant and RNA-Seq analysis, structural variant calling, annotation, and prediction. |

Sequence Processing

Sequence processing includes tasks such as demultiplexing raw read data and trimming low-quality bases.

  • Fastqp - FASTQ and SAM quality control using Python.
  • FastQC - A quality control tool for high-throughput sequence data.
  • Fastx Toolkit - FASTQ/A short-read pre-processing tools: demultiplexing, trimming, clipping, quality filtering, and masking utilities.
  • Seqtk - Toolkit for processing sequences in FASTA/Q formats.
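
As an illustration of what "trimming low quality bases" means, here is a minimal sketch in plain Python. It assumes Phred+33 encoded quality strings and a simple rule (drop bases from the 3' end until one meets the threshold). The function names and the default threshold of 20 are hypothetical; this is not how any of the tools listed above implement trimming.

```python
def phred33_scores(quality: str) -> list[int]:
    """Decode a Phred+33 quality string into integer scores."""
    return [ord(ch) - 33 for ch in quality]


def trim_low_quality_tail(sequence: str, quality: str, min_quality: int = 20) -> tuple[str, str]:
    """Drop bases from the 3' end until one has a score >= min_quality.

    Hypothetical helper for illustration; real trimmers usually offer
    sliding-window or error-probability based rules as well.
    """
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality must have the same length")
    scores = phred33_scores(quality)
    end = len(scores)
    # Walk back from the 3' end while the last kept base is below threshold.
    while end > 0 and scores[end - 1] < min_quality:
        end -= 1
    return sequence[:end], quality[:end]


if __name__ == "__main__":
    # 'I' decodes to 40 and '#' decodes to 2 under Phred+33.
    print(trim_low_quality_tail("ACGTACGT", "IIIIII##"))
```

For real data, prefer one of the tools above; they handle paired reads, adapters, and compressed input.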

Sequence Alignment

De Novo Alignment

DNA Resequencing

| Program | Description |
| --- | --- |
| BWA | Burrows-Wheeler Aligner for pairwise alignment between DNA sequences. |

Variant Calling

| Program | Description |
| --- | --- |
| samtools/bcftools/htslib | A suite of tools for manipulating next-generation sequencing data. |
| freebayes | Bayesian haplotype-based polymorphism discovery and genotyping. |

BAM File Utilities

  • Bamtools - Collection of tools for working with BAM files.

VCF File Utilities

  • vcflib - A C++ library for parsing and manipulating VCF files.
  • bcftools - Set of tools for manipulating VCF files.
  • vcftools - VCF manipulation and statistics (e.g. linkage disequilibrium, allele frequency, Fst).
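
To make the "allele frequency" statistic concrete, here is a small sketch in plain Python that computes the non-reference allele frequency for a single VCF data line from its GT fields. The function name is hypothetical, all ALT alleles are pooled into one count, and missing calls are skipped; this is an illustration of the idea, not a description of how vcftools or bcftools compute it.

```python
def alt_allele_frequency(vcf_line: str) -> float:
    """Frequency of non-reference alleles among called alleles on one VCF line.

    Hypothetical helper: expects a tab-separated data line with a FORMAT
    column (index 8) containing GT, followed by one column per sample.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    gt_index = fields[8].split(":").index("GT")
    alt_count = 0
    called = 0
    for sample in fields[9:]:
        genotype = sample.split(":")[gt_index]
        # Treat phased ("|") and unphased ("/") genotypes the same way.
        for allele in genotype.replace("|", "/").split("/"):
            if allele == ".":
                continue  # missing call, not counted
            called += 1
            if allele != "0":
                alt_count += 1
    if called == 0:
        raise ValueError("no called alleles on this line")
    return alt_count / called


if __name__ == "__main__":
    line = "1\t100\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/1:10\t1|1:8\t./.:0\t0/0:12"
    print(alt_allele_frequency(line))
```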

Genomic Traits

Genomic traits are differences in DNA structure or content observed among populations that may be regulated by genetic variation, for example telomere length or rDNA copy number.

  • Telseq - Estimates telomere length from whole genome sequence data.
  • bam toolbox - Outputs the ratio of mtDNA:nuclear coverage, a proxy for mitochondrial content.
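
The mtDNA:nuclear coverage ratio can be sketched as mean depth on the mitochondrial contig divided by mean depth across the nuclear contigs. The example below assumes you already have per-contig aligned-base counts and contig lengths; the function names, the input layout, and the contig name "MT" are assumptions for illustration, not the Bam Toolbox interface.

```python
def mean_depth(contigs: list[tuple[int, int]]) -> float:
    """Mean depth over contigs given (aligned_bases, contig_length) pairs."""
    total_bases = sum(bases for bases, _ in contigs)
    total_length = sum(length for _, length in contigs)
    return total_bases / total_length


def mt_nuclear_ratio(coverage: dict[str, tuple[int, int]], mt_name: str = "MT") -> float:
    """Ratio of mitochondrial to nuclear mean depth.

    coverage maps contig name -> (aligned_bases, contig_length).
    Hypothetical helper; every contig other than mt_name counts as nuclear.
    """
    if mt_name not in coverage:
        raise KeyError(f"contig {mt_name!r} not found")
    nuclear = [value for name, value in coverage.items() if name != mt_name]
    if not nuclear:
        raise ValueError("no nuclear contigs supplied")
    nuclear_depth = mean_depth(nuclear)
    if nuclear_depth == 0:
        raise ValueError("nuclear depth is zero")
    return mean_depth([coverage[mt_name]]) / nuclear_depth


if __name__ == "__main__":
    # Toy numbers: nuclear depth 30x, mitochondrial depth 600x.
    toy = {"1": (3000, 100), "2": (3000, 100), "MT": (6000, 10)}
    print(mt_nuclear_ratio(toy))
```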

Variant Simulation

| Program | Description |
| --- | --- |
| wgsim | Read simulator; comes with samtools. |
| Bam Surgeon | Tools for adding mutations to existing .bam files, used for testing mutation callers. |

Variant Filtering / Quality Control

Variant Prediction/Annotation

| Program | Description |
| --- | --- |
| SIFT | Predicts whether an amino acid substitution affects protein function. |
| SNPeff | Genetic variant annotation and effect prediction toolbox. |

Python Modules

  • cruzdb - Pythonic access to the UCSC genome database.
  • pyfaidx - Pythonic access to FASTA files.
  • pyBedTools - Python wrapper for bedtools.
  • pysam - Python wrapper for samtools.
  • pyVCF - A VCF parser for Python.
  • cyvcf - A port of pyVCF using Cython for speed.
  • cyvcf2 - Cython + htslib for fast VCF parsing; even faster than pyVCF.

Visualization

Genome Browsers / Gene diagrams

The following tools can be used to visualize genomic data or to construct customized visualizations of it, including sequence data from DNA-Seq, RNA-Seq, and ChIP-Seq, variants, and more.

  • biodalliance - Embeddable genome viewer. Integrates data from a wide variety of sources, and can load data directly from popular genomics file formats including bigWig, BAM, and VCF.
  • IGV - Java-based browser. Fast, efficient, scalable visualization tool for genomics data and annotations. Handles a large variety of formats.
  • Island Plot - D3 JavaScript-based genome viewer. Constructs SVGs.
  • pileup.js - JavaScript library that can be used to generate interactive and highly customizable web-based genome browsers.
  • scribl - JavaScript library for drawing canvas-based gene diagrams. The homepage has examples.
