caojiabao / biomagician

Useful tools in microbiomics and metagenomics

biomagician

Collection of papers and tools that are helpful for bioinformatic and biostatistical analysis.

Tutorials

Category Name Description Link
Training collection SIB A curated list of bioinformatics training material 861
Tutorial Python Tutorial Python Tutorial 862
Tutorial Provides the page sources, data and figures for entry-level bioinformatics tutorials for long-read data analysis

Containers

Category Name Description Link
Dockerfile Singularity in Docker The resulting Docker image can be used on any system with Docker to build Singularity images 710
Tutorial Singularity Containerization 711
Hub SingularityHub Encapsulation of Environments with Containers 712

Graph Databases aka Knowledgebases

Category Name Description Link
Graph Platform neo4j is a graph database management system 476, 477
LBD SemNet provides an adoptable method for efficient Literature-Based-Discovery (LBD) of PubMed that extends beyond omics-only relationships to true multi-scalar connections that can provide actionable insight for predictive medicine, research prioritization, and clinical care 478
Graph Database Het Heterogeneous Network Edge Prediction: A Data Integration Approach to Prioritize Disease-Associated Genes 479
Graph Database BioGraph an online service and a graph DB for querying and analyzing bioinformatics resources 481, 482
Graph Database edge2vec Learning Node Representation Using Edge Semantics 483, 484
Graph Database NGLY1 Deficiency Knowledge Graph NGLY1 Deficiency Knowledge Graph, the reasoning context to support hypothesis discovery for NGLY1 Deficiency-CDDG 485, 486, 487
Graph Database StarPepDB is a Neo4j graph database resulting from an integration process by which data from a large variety of bioactive peptide databases are cleaned, standardized, and merged so that it can be released into an organized collection 488, 489
Knowledgebase NeXtProt is an integrative resource providing both data on human protein and the tools to explore these 557, 558
Graph Database Cayley is an open-source database for Linked Data. It is inspired by the graph database behind Google's Knowledge Graph (formerly Freebase) 559, 560
Tutorial Neo4j Importing CSV Files in Neo4j 791
Tutorial Neo4j Getting Started with Graph Embeddings in Neo4j 792
Graph Database BioCypher A proposal for a unifying framework to create knowledge graph databases for systems biomedicine 914, 915, 916

Databases

Category Name Description Link
WGS GTDB-Tk GTDB-Tk is a software toolkit for assigning objective taxonomic classifications to bacterial and archaeal genomes. It is computationally efficient and designed to work with recent advances that allow hundreds or thousands of metagenome-assembled genomes (MAGs) to be obtained directly from environmental samples. It can also be applied to isolate and single-cell genomes. 10, 11 112
16S pbHITdb PharmaBiome manually curated HITdb 12
HumanMicrobiome R Data curatedMetagenomicData Dataset that can be loaded into R which contains human microbiome data from several body sites 35 229 230
HMP 16S HMP16SData R/Bioconductor package to simplify access to and analysis of HMP 16S data 63 64
tRNA GtRNAdb The genomic tRNA database contains tRNA gene predictions made by tRNAscan-SE on complete or nearly complete genomes. Unless otherwise noted, all annotation is automated, and has not been inspected for agreement with published literature. 75
16S Database EzBioCloud 16S Unlike other public databases, EzBioCloud’s 16S database can be used for species-level identification of OTUs and is freely available for academic, not-for-profit purposes 90 91
WGS core gene database UBCG UBCG stands for the Up-to-date Bacterial Core Gene. It is a method and software tool for inferring phylogenetic relationship using bacterial core gene set that is defined by up-to-date bacterial genome database. 94 95
Mouse gut gene catalog iMGMC integrated Mouse Gut Metagenomic Catalog 98 99
WGS Paper 737 WGS from high-throughput culturomics 109 110
Database MicrobiomeDB A data-mining platform for interrogating microbiome experiments 113
Database MGnify Public Datasets of Metagenomic samples and 16S data of various clinical studies (UHGG,UHGP) 114, 115, 495, 496, 497, 603
Database dbBact Microorganisms Knowledge Database 117
Database BIGSI BIGSI can search a collection of raw (fastq/bam), contigs or assembly for genes, variant alleles and arbitrary sequence. It can scale to millions of bacterial genomes requiring ~3MB of disk per sample while maintaining millisecond kmer queries in the collection 124 125 126
Database GutCyc GutCyc is a publicly-available and licenced resource and portal providing pathway annotation data for environmental metagenomic samples derived from the metagenomic studies of the human gut. 134 135 136
Database miBC This collection includes all cultivable bacterial strains isolated from the intestine of mice (Mus musculus) that are publicly available to date. 141 142
Ortholog Database OrthoMCL DB is a genome-scale algorithm for grouping orthologous protein sequences. It provides not only groups shared by two or more species/genomes, but also groups representing species-specific gene expansion families. 148
Database KiMoSys Data repository for KInetic MOdels of biological SYstems 150
Database BiGG Database BiGG Models is a knowledgebase of genome-scale metabolic network reconstructions. BiGG Models integrates more than 70 published genome-scale metabolic networks into a single database with a set of standardized identifiers called BiGG IDs. Genes in the BiGG models are mapped to NCBI genome annotations, and metabolites are linked to many external databases (KEGG, PubChem, and many more). 151
Database embl_gems This is a collection of genome-scale models built for all reference and representative bacterial genomes of NCBI RefSeq (release 84) using CarveMe 160
Database BioCyc BioCyc is a collection of 14560 Pathway/Genome Databases (PGDBs), plus software tools for exploring them 168
Database MetaCyc MetaCyc is a curated database of experimentally elucidated metabolic pathways from all domains of life. MetaCyc contains 2666 pathways from 2960 different organisms 169
Database KEGG A set of annotation maps for KEGG assembled using data from KEGG 171 172
Database VMH The VMH database captures information on human and gut microbial metabolism and links this information to hundreds of diseases and nutritional data 176 177
Database ggkbase an online database that offers users several options for retrieving data of interest: by projects, names, description, by genome completion or class 189
Database Cohorts IGGdb integrated genomes from the gut microbiome and other environments 192 193
Database Genome Properties Genome properties is an annotation system whereby functional attributes can be assigned to a genome, based on the presence of a defined set of protein signatures within that genome 215 216 217 218
Database YANA a software tool for analyzing flux modes, gene-expression and enzyme activities 219
Database Clinical Trials is a database of privately and publicly funded clinical studies conducted around the world 222
Database microcontax R package of microclass: The consensus taxonomy for prokaryotes is a package of data sets designed to be the best possible for training taxonomic classifiers based on 16S rRNA sequence data 231 232
Database ExperimentHub ExperimentHub provides a central location where curated data from experiments, publications or training courses can be accessed. 252 253
Database curated MetagenomicData Bioconductor package with thousands of curated metagenome datasets based on the ExperimentHub publication 257, 258
Database Knomics-Biota Online service for exploratory analysis of human gut metagenomes 265 266
Database Terra Terra is a cloud-native platform for biomedical researchers to access data, run analysis tools, and collaborate. 267
Knowledgebase Grakn Grakn is an intelligent database: a knowledge graph engine to organise complex networks of data and make it queryable 268 269 270 271
Database HiMapDB HiMAP database contains more unique species and strains than any major database 272
Database HGTree an explicit evolutionary approach that is generally considered to be a reliable way to detect HGT 276 277
Database Pfam a large collection of protein families, each represented by multiple sequence alignments and hidden Markov models (HMMs) 281
Database ISFinder provides a list of insertion sequences (IS) isolated from bacteria and archaea (MGEs) 313
Database ICEberg2.0 an updated database of bacterial integrative and conjugative elements 318 319
Database microscope Microbial Genome Annotation & Analysis Platform 329
Database CARD Comprehensive Antibiotic Resistance Database that is used to identify resistance genes (used in seres patent) 335
Database Raes Reference Genomes Reference genomes from HMP project but filtered and assembled by Raes lab as new resource 358
Database proGenomes Curated database by Sunagawa with genomes and very good functional annotation of bacteria and archaea 371, 372
Database PATRIC the Pathosystems Resource Integration Center, provides integrated data and analysis tools to support biomedical research on bacterial infectious diseases 416
Database FARMEDB is a database of DNA and protein sequences derived exclusively from environment sequences showing AR in laboratory experiments. The Functional Antibiotic Resistant Metagenomic Element (FARME) database is a compilation of publicly available DNA sequences, predicted protein sequences conferring antibiotic resistance and additional regulatory and mobile genetic elements and predicted proteins flanking the antibiotic resistant genes 442, 443
Database VMH The VMH database captures information on human and gut microbial metabolism and links this information to hundreds of diseases and nutritional data 474
Database MetaNetX Automated Model Construction and Genome Annotation for Large-Scale Metabolic Networks 475
Database Microbiome Database (old Integrated Gene Catalogue) Microbiome database involves the sequencing resource and metadata of ecological community samples of microorganisms, including both host-associated or environmental microbes. This database provides detailed and accurate metadata of these metagenomics samples, as well as gene catalogs for host-associated microbiome, and moreover, well-characterized isolated strains can be found in our database too 490, 491
Database Human Gut metabolic Models Human curated database by Raes lab to link pathway identifiers to metabolic functions which can be used for metagenomic samples to get metabolic functions 510
Database CAZy The Carbohydrate-Active enZYmes Database CAZy database describes the families of structurally-related catalytic and carbohydrate-binding modules (or functional domains) of enzymes that degrade, modify, or create glycosidic bonds. 18, 19
Database ImmeDB Intestinal microbiome mobile element is a database dedicated to the collection, classification, and annotation of mobile genetic elements (MGEs) from gut microbiome 595, 596
LIMS openBIS open source Laboratory Notebook & Inventory manager 707
Database probeBase probeBase is a curated database of rRNA-targeted oligonucleotide probes and primers 724
Database bugsigdb A Comprehensive Database of Published Microbial Signatures 766, 767
Webapp GMGC Global Microbial Gene Catalog 772, 773
Webapp MAP The Microbe Atlas Project aims to shed new light on the ecology of these elusive microbes by leveraging the large amounts of sequenced microbial communities 821
Database proGenomes2 an improved database for accurate and consistent habitat, taxonomic and functional annotations of prokaryotic genomes 851
Database rrn_db contains the sequence database of rrn operons used for read mapping 948, 949

Bioinformatics

Category Name Description Link
Bioinformatic Tools OmicTools Collection of many tools that can be useful for bioinformatic analyses 4
16S pipeline Gloor Lab dada2 pipeline This pipeline will take your paired fastq reads (from Illumina MiSeq or HiSeq) and generate an OTU counts table with an approximate taxonomy assignment. The reads have to have been generated using Gloor Lab Illumina SOP so that the reads are paired, overlapping, and contain the barcode and primer information (have not been demultiplexed or had primers or barcodes removed). 8
Metagenomics SingleM SingleM is a tool to find the abundances of discrete operational taxonomic units (OTUs) directly from shotgun metagenome data, without heavy reliance on reference sequence databases. It is able to differentiate closely related species even if those species are from lineages new to science. 13
Gene annotation Pulpy An automated, reproducible and scalable prediction of Polysaccharide Utilisation Loci (PUL) in 5414 public Bacteroidetes genomes. The predictions are fully open and can be accessed and used by any researcher, commercial or otherwise. 17, 18, 19; preprint 20
16S pipeline mare The mare R package is an easy-to-use pipeline for microbiota analysis based on 16S-amplicon reads. It takes the raw reads, creates taxonomic tables, visualises the results, and finally identifies organisms significantly associated with variables of interest. For read processing, OTU clustering, and taxonomic annotation 32
WGS assembly pipeline pgap The official bacterial whole genome assembly pipeline of NCBI 33, 674
r-package picante Phylocom integration, community analyses, null-models, traits and evolution in R 39
tree-modeling iq-tree Fast and effective stochastic algorithm to reconstruct phylogenetic trees by maximum likelihood. IQ-TREE compares favorably to RAxML and PhyML in terms of likelihood while requiring similar amount of computing time 45
modeling PartitionFinder2 PartitionFinder2 is a program for selecting best-fit partitioning schemes and models of evolution for nucleotide, amino acid, and morphology alignments. 47
Function Prediction PICRUST Predicts functions of total genomes based on 16S sequences 49
Function Prediction Tax4Fun Predicts functions of total genomes based on 16S sequences 50
ML-classifier MicroPheno is a reference- and alignment-free approach for predicting the environment or host phenotype from microbial community samples based on k-mer distributions in shallow sub-samples of 16S rRNA data. 54, 55
OTU-generator DiTaxa alignment- and reference- free subsequence based 16S rRNA data analysis, as a new paradigm for microbiome phenotype and biomarker detection 56
OTU-generator HmmUFOtu An HMM and phylogenetic placement based ultra-fast taxonomic assignment and OTU picking tool for microbiome amplicon sequencing studies 58
OTU-generator otu2ot Oligotyping for R 59
Microbiomics SOP Microbiome_helper Microbiome Helper is a repository that contains several resources to help researchers working with microbial sequencing data 62
16S Pipeline SeekDeep is one command line program that contains several programs within that all combined together make up the SeekDeep targeted sequencing analysis pipeline 67, 68
R Package - ShinyApp FastqCleaner An interactive web application for quality control, filtering and trimming of FASTQ files. 81, 82
Preprocessing tool fastp A tool designed to provide fast all-in-one preprocessing for FastQ files mainly used to correct R1 and R2 reads for better merging 83, 84
Python tool ncbi-genome-download Some script to download bacterial and fungal genomes from NCBI after they restructured their FTP a while ago. 85
Pipeline phyloFlash phyloFlash is a pipeline to rapidly reconstruct the SSU rRNAs and explore phylogenetic composition of an Illumina (meta)genomic or transcriptomic dataset. 86
Tool DUDE-Seq DUDE-Seq: Fast, flexible, and robust denoising of nucleotide sequences 92, 93
Python tool RAMBL A tool for the assembly of full-length 16S genes in metagenomic shotgun data 100, 101
Classification tool CAMITAX Taxonomic assignment workflow based on multiapproach 105, 106
Docker container speciesprimer The SpeciesPrimer pipeline is intended to help researchers finding specific primer pairs for the detection and quantification of bacterial species in complex ecosystems 111
tool EnaBrowserTools enaBrowserTools is a set of scripts that interface with the ENA web services to download data from ENA easily, without any knowledge of scripting required 116
Toolkit NCBI Toolkit NCBI C++ Toolkit provides free, portable, public domain libraries with no restrictions on use, on Unix, MS Windows, and Mac OS platforms 119
tool FastANI Fast alignment-free computation of whole-genome Average Nucleotide Identity (ANI) 120, 121
toolbox EzBio tools OrthoANI, UBCG and other useful tools for WGS analyses 122
data wrangling Bioinformatics one-liners Useful bash one-liners for bioinformatics 133
web-workbench imngs Integrated Microbial NGS platform 143, 144
Pipeline Roary Roary is a high speed stand alone pan genome pipeline, which takes annotated assemblies in GFF3 format (produced by Prokka (Seemann, 2014)) and calculates the pan genome. 147
Tool OrthoFinder It finds orthogroups and orthologs, infers rooted gene trees for all orthogroups and identifies all of the gene duplication events in those gene trees. 149
Tool CarveMe CarveMe is a python-based tool for genome-scale metabolic model reconstruction. 152, 153, 154
Tool SMETANA Species METabolic interaction ANAlysis is a python-based command line tool to analyse microbial communities 155, 156
Tool FRAMED a python package for analysis and simulation of metabolic models. The main focus is to provide support for different modeling approaches 157 158 162
Tool cobrapy COBRA methods are widely used for genome-scale modeling of metabolic networks in both prokaryotes and eukaryotes. cobrapy is a constraint-based modeling package that is designed to accommodate the biological complexity of the next generation of COBRA models and provides access to commonly used COBRA methods, such as flux balance analysis, flux variability analysis, and gene deletion analyses 159
Tool GPRTransform It contains an implementation of the method that transforms an SBML model by integrating the GPR associations directly into the stoichiometric matrix. This enables gene-based analysis using several constraint-based methods 163 164
Tool eggnog-mapper a tool for fast functional annotation of novel sequences (genes or proteins) using precomputed eggNOG-based orthology assignments 165 166
Pipeline miQTL-cookbook This is the cookbook for performing the GWAS analysis of microbial abundance based on analysis of 16S rRNA sequencing dataset 167
Tool DuctApe The final purpose of the program is to combine genomic information (encoded as KEGG pathways) with the results of phenomic experiments (Phenotype Microarrays) and highlight the genes that may be responsible for phenotypic variations 170
Tool VFFVA FVA is the workhorse of metabolic modeling. It characterizes the boundaries of the solution space of a metabolic model and delineates the bounds for reaction rates 174 175
Pipeline BACTpipe Automatic Assembly and Annotation from raw reads in a very clean implemented nextflow pipeline 178
Pipeline MAG core Automatic assembly and annotation from raw reads of metagenomic data implemented in nextflow pipeline 179
Pipeline Tychus Nextflow Automatic whole genome assembly and annotation of isolate strain. Uses multiple assemblers and takes consensus 180
Pipeline IMP Reference-independent metagenomic and metatranscriptomic bacterial assembly 182, 183
Tool DESMAN de novo extraction of strains from metagenomes, enables strain inference from frequency counts on contigs across multiple samples 184 185
SOP MicroBiome Quality Control (MBQC) MBQC is a collaborative effort to comprehensively evaluate methods for measuring the human microbiome 187
Pipeline MIDAS an integrated pipeline that leverages >30,000 reference genomes to estimate bacterial species abundance and strain-level genomic variation, including gene content and SNPs, from shotgun metagenomes 196 197
Tool MAGpurify algorithms to identify contamination in metagenome-assembled genomes (MAGs) 198
Tool MicrobeCensus a fast and easy to use pipeline for estimating the average genome size (AGS) of a microbial community from metagenomic data 199
Tool IGGsearch it accurately quantifies species presence-absence and species abundance by mapping reads to a database of species-specific marker genes 200
Tool MIDAS-strains Estimate strains from reads mapped to pan-genomes from the MIDAS database 201
Tool AssemblyEvaluator Evaluate the completeness and precision of a (meta)genomic assembly by mapping contigs to a complete reference genome 202
Tools Biobakery Workflows Set of tools by Huttenhower that can be fairly easily executed with pre-defined workflows, useful for metagenomics and metatranscriptomics 204
Tools Anvi'o Anvi’o is an open-source, community-driven analysis and visualization platform for ‘omics data 208 209 210 211
Tool WAFFLE the Workflow to Annotate Assemblies and Find Lateral Gene Transfer (LGT) Events 212
Tool AUTOGRAPH AUtomatic Transfer by Orthology of Gene Reaction Associations for Pathway Heuristics, is a semi-automatic approach to accelerate the process of genome-scale metabolic network reconstruction by taking full advantage of already manually curated networks 214
Tool pyTARG a library that contains functions to work with Genome Scale Metabolic Models with the goal of finding drug targets against cancer 223 224
Assembler Unicycler An assembler for short- and long-read hybrid assembly; it builds a short-read assembly graph with SPAdes and then uses the long reads to resolve it. 227
R package microclass an R-package for 16S taxonomy classification 231 232
Tool Prodigal Fast, reliable protein-coding gene prediction for prokaryotic genomes 233 234
Tool STAMP a graphical software package that provides statistical hypothesis tests and exploratory plots for analysing taxonomic and functional profiles 235 236
Tool CheckM an automated method for assessing the quality of a genome using a broader set of marker genes specific to the position of a genome within a reference genome tree and information about the collocation of these genes 237 238
R script consenTRAIT Phylogenetic conservatism of functional traits in microorganisms. A phylogenetic metric that estimates the clade depth where organisms share a trait 239 240
NIH Tools NIH Genome Informatics Section Tools for various bioinformatic tasks, assembly, Mash, metagenomes, Krona, MUMmer alignment 242
R package mmgenome Tools for extracting individual genomes from metagenomes 243 244
Tool SPAdes St. Petersburg genome assembler – is an assembly toolkit containing various assembly pipelines 254,365
Tool SqueezeMeta a fully automated metagenomics pipeline, from reads to bins 261 262
Tool MetaWRAP a flexible pipeline for genome-resolved metagenomic data analysis 263 264
R Package HiMap High-resolution Microbial Analysis Pipeline to Strain level with dada2 and curated HiMapDB 273 274
Research Group van nimwegenlab a range of software tools, web-services, and databases in regulatory and comparative genomics for WGS 275
Tool Rnammer predicts 5s/8s, 16s/18s, and 23s/28s ribosomal RNA in full genome sequences 278
Tool RANGER-DTL Rapid ANalysis of Gene family Evolution using ReconciliationDTL is a software package for inferring gene family evolution by speciation, gene duplication, horizontal gene transfer, and gene loss 279
Tool Darkhorse a bioinformatic method for rapid, automated identification and ranking of phylogenetically atypical proteins on a genome-wide basis 280
Tool ABRicate Mass screening of contigs for antimicrobial resistance or virulence genes. It comes bundled with multiple databases: Resfinder, CARD, ARG-ANNOT, NCBI BARRGD, NCBI, EcOH, PlasmidFinder, Ecoli_VF and VFDB 286 334
Tool MetaCompare MetaCompare is a computational pipeline for prioritizing resistome risk by estimating the potential for ARGs to be disseminated into human pathogens from a given environmental sample based on metagenomic sequencing data 287
Tool DeepARG DeepARG is a machine learning solution that uses deep learning to characterize and annotate antibiotic resistance genes in metagenomes 288
Tool SSTAR Sequence Search Tool for Antimicrobial Resistance combines a locally executed BLASTN search against a customizable database with an intuitive graphical user interface for identifying antimicrobial resistance (AR) genes from genomic data 289 290
Tool ProtCNN ProtENN Predicting the function of a protein from its raw amino acid sequence is the critical step for understanding the relationship between genotype and phenotype 295
Benchmarking Long-read-assembler-comparison Benchmarking of long-read assembly tools for bacterial whole genomes 298
conda bioconvert is a collaborative project to facilitate the interconversion of life science data from one format to another 299
Tool bin3C Extract metagenome-assembled genomes (MAGs) from metagenomic data using Hi-C 303 304
Tool MAGpy Snakemake pipeline for downstream analysis of metagenome-assembled genomes (MAGs) (pronounced mag-pie) 305 306
Tool graftM a tool for scalable, phylogenetically informed classification of genes within metagenomes 307 308
Tool GFinisher a tool for refinement and finalization of prokaryotic genomes assemblies using the bias of GC Skew to identify assembly errors and organizes the contigs/scaffolds with genomes references 311 312
Tool Autometa automated extraction of microbial genomes from individual shotgun metagenomes 314 315
Tool iMGEins detecting novel mobile genetic elements inserted in individual genomes (MGEs) 316 317
Tool McClintock an Integrated Pipeline for Detecting Transposable Element Insertions in Whole-Genome Shotgun Sequencing Data (MGEs) 320 321
Webtool PHASTER a better, faster version of the PHAST phage search tool 322 323
Tool ISQuest identifies bacterial ISs and their sequence elements—inverted and direct repeats—in raw read data or contigs using flexible search parameters (MGEs) 324 325
Tool VirSorter mining viral signal from microbial genomic data 326 327
Tool RAST (Rapid Annotation using Subsystem Technology) is a fully-automated service for annotating complete or nearly complete bacterial and archaeal genomes 329 330
Tool ShortBRED Tool by Huttenhower group that identifies protein families in metagenomic samples. Useful for protein profiling?? 336
Tool & R package GSEA Gene Set Enrichment Analysis (GSEA) is a computational method that determines whether an a priori defined set of genes shows statistically significant, concordant differences between two biological states (e.g. phenotypes) 337 338
Tool, Database GMMs Omixer Tool with curated database by raes lab that links metagenomic samples to functions and metabolic capabilities 342, 343, 344, 523
Tool GRASP2 fast and memory-efficient gene-centric assembly and homolog search for metagenomic sequencing data 345, 346
Tool Picrust2 a software for predicting functional abundances based only on marker gene sequences 347, 348
Pipeline Antimicrobial Resistance Finder Nextflow pipeline to identify antimicrobial resistance protein sequences, looks simple to use 350
Tool Geptop2 a gene essentiality prediction tool for complete-genome based on orthology and phylogeny 351, 352
Tool Asgan [As]sembly [G]raphs [An]alyzer – is a tool for analysis of assembly graphs 353
Tool PopCOGenT Identifying microbial populations using networks of horizontal gene transfer 355
Tool PhiSpy a novel algorithm for finding prophages in bacterial genomes that combines similarity-and composition-based strategies 356, 357
Tool MetaCurator Software for curating reference sequence databases used in barcoding, metabarcoding and metagenomics 359, 360
Tutorial astrobiomike This site aims to be a useful resource for bioinformatics beginners 361,362
Tool (sour)Mash fast genome and metagenome distance estimation using MinHash 363,364
Tool (meta)plasmidSPAdes for plasmid assembly in metagenomic data sets that reduced the false positive rate of plasmid detection compared with the state-of-the-art approaches 364,365
Tool IslandViewer4 integrates four different genomic island prediction methods: IslandPick, IslandPath-DIMOB, SIGI-HMM, and Islander 366,367
Tool, Server Specl Web server (but also stand-alone tool) to determine species classification of whole genome based on ~40 universal single copy marker genes. 370
Tool iRep is a method for determining replication rates for bacteria from single time point metagenomics sequencing and draft-quality genomes 374,375
Tool antiSMASH allows the rapid genome-wide identification, annotation and analysis of secondary metabolite biosynthesis gene clusters in bacterial and fungal genomes 376,377,378
Tool NeuRiPP a neural network framework designed for classifying peptide sequences as putative precursor peptide sequences for RiPP biosynthetic gene clusters 379,380
Tool PhyloMagnet Pipeline for screening metagenomes, looking for arbitrary lineages, using gene-centric assembly methods and phylogenetics 381,382
Tool KrakenUnique Kraken based tool for classifying metagenomic reads with an additional algorithm that checks for unique Kmer matches - maybe similar to cosmosID approach 383
Tool Mash Tool for classifying metagenomic reads, similar to Kraken, which uses MinHash to identify species 384
Tool RefSeq_mash Tool for checking what NCBI reference genomes raw reads match to or overall which reference genome fits the best, should be very fast. 385
Pipeline Hybrid Assembler Hybrid Assembly pipeline in Nextflow that is coupled with plasmIDent, which identifies plasmids and resistance genes 390, 391
Tool RMI Comprehensive antimicrobial resistance (AMR) gene finder tool online for quick analysis of genome sequences 392
Pipeline SqueezeMeta A full automatic pipeline for metagenomics/metatranscriptomics, covering all steps of the analysis 394, 395
Review Identifying repeats and transposable elements Nice Nature review that describes various software for finding these things but a bit outdated 395
Tool ARDaP (Antimicrobial Resistance Detection and Prediction) is a genomics pipeline for the comprehensive identification of antibiotic resistance markers from whole-genome sequencing data 399
Tool Flye New long read assembler that is faster and often better than others, published by UCSD 400
Tool Ra Overlap-layout-consensus based DNA assembler of long uncorrected reads (short for Rapid Assembler) 403, 404
Tool Metagenomics-Index-Correction This repository contains scripts used to prepare, compare and analyse metagenomic classifications using custom index databases, either based on default NCBI or GTDB taxonomic systems 405, 406
Tool drep a python program for rapidly comparing large numbers of genomes. dRep can also "de-replicate" a genome set by identifying groups of highly similar genomes and choosing the best representative genome for each genome set 407
Tool strainProfiler Program to analyze strain-level diversity within a population 408
Tool seqtk Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format 409, 410
Tool anvio to bandage tools converts output from Anvi'o, a MAG binning tool, to the coloring scheme preferred by Bandage, an assembly visual tool, to improve binning especially for mobile genes (transposons, recently horizontally transferred, etc.) 413
Tool OPERA-MS OPERA-MS is a hybrid metagenomic assembler which combines the advantages of short and long-read 414, 415
Tool traitar Traitar is a software for characterizing microbial samples from nucleotide or protein sequences. It can accurately phenotype 67 diverse traits. 418, 419, 420
Tool PhyloRank PhyloRank provides functionality for calculating the relative evolutionary divergence (RED) of taxa in a tree and for finding the best placement of taxonomic labels in a tree. 421
Tool AnnoTree is a web tool for visualization of genome annotations across large phylogenetic trees. 422, 423, 424
Tool AMRfinderPlus Antibiotic resistance gene finder from NCBI 425, 426, 678
Tool nanotext This library enables the use of embedding vectors generated from a large corpus of protein domains to search for similar genomes, where similar is the cosine similarity between one genome's vector and another's. Think about protein domains as words, genomes as documents, and search as a form of document retrieval based on the notion of topic. 427, 428, 453
Tool biomartr Download genomes from NCBI or other databases by specifying species or group name automatically in R 429
Tool staramr Tool in bioconda to scan genomes against PlasmidFinder, ResFinder and PointFinder and then produce nice summary files with the results 430
Tool TRF Tandem Repeat Finder and Tandem Repeats Database (TRDB) 432, 433
Tool MIST a tool for rapid in silico generation of molecular data from bacterial genome sequences 434, 435
Tool mummer Visualization of correct alignment between genomes 436, 887, 888, 889
Tool Dot2dot accurate whole-genome tandem repeats discovery 437, 438
Tool miComplete An "easy" to use tool to quickly assess the completeness and quality of new genome assemblies, kind of like checkM but with some tweaks 439
Tool, Database ARO Antibiotic resistance ontology database and webserver to quickly get phenotype information based on gene IDs 440, 441
Webapp LINbase a database designed for the purpose of accelerating and simplifying the description of Earth's microbial diversity at a precision that includes, but also goes beyond, named species 447, 448
R package RbioRXN facilitates retrieving and processing biochemical reaction data from resources such as Rhea, MetaCyc, KEGG and Unipathway. The package provides functions to download and parse data, instantiate generic reactions and check mass balance. It aims to construct an integrated metabolic network and genome-scale metabolic model 450
Tool Mumame Mutation Mapping in Metagenomes is a software tool that allows mapping of shotgun metagenomic reads to point mutations. Designed for Antibiotic Resistance mutations 451, 452
Tool Cobra Constraint-based reconstruction and analysis (COBRA) provides a molecular mechanistic framework for integrative analysis of experimental molecular systems biology data and quantitative prediction of physicochemically and biochemically feasible phenotypic states 460, 461, 462, 467
Tool METABOLIC (METabolic And BiogeOchemistry anaLyses In miCrobes), a scalable high-throughput metabolic and biogeochemical functional trait profiler based on microbial genomes 463, 464
Tool PhenotypeSeeker Identify phenotype-specific k-mers and predict phenotype using sequenced bacterial strains 465, 466
R-package MetaboAnalystR An R Package for Comprehensive Analysis of Metabolomics Data 468, 472, 473
Shiny-App MetaboShiny a novel R and RShiny based metabolomics data analysis package 469, 470, 471
Tool micom micom is a Python package for metabolic modeling of microbial communities 492, 493, 494
Tool Struo a pipeline for building custom databases for common metagenome profilers 498, 499
Tool ubialSim This is µbialSim (pronounced microbialsim), a dynamic Flux-Balance-Analysis-based simulator for complex microbial communities. Batch and chemostat operation can be simulated 500, 501
Tool ConFindr to find bacterial intra-species contamination in raw Illumina data. It does this by looking for multiple alleles of core, single copy genes. 507, 508, 722
Tool MetaSanity a wrapper-script for genome/metagenome evaluation tasks. This script will run common evaluation and annotation programs and create a BioMetaDB project with the integrated results 509
Tool REAPR From Sanger institute, it maps paired-end reads to de-novo assembly to check for assembly errors and can break up wrong scaffolds 511
Tool Kaiju Metagenomic read classification based on Amino acid sequences. Suggested by Gabi that it works well 512
Tool mOTU2 The mOTUs profiler is a computational tool that estimates relative abundance of known and currently unknown microbial community members using metagenomic shotgun sequencing data. 513, 514
Tool fetchMG it extracts the 40 MGs from genomes and metagenomes in an easy and accurate manner. 515
Tool Metage2Metabo is a Python3 (Python >= 3.6) tool to perform graph-based metabolic analysis starting from annotated genomes (reference genomes or metagenome-assembled genomes). It uses Pathway Tools in an automatic and parallel way to reconstruct metabolic networks for a large number of genomes 518, 519
R package AMR simplify the analysis and prediction of Antimicrobial Resistance (AMR) 520, 521, 878
Tool GRASE Genome Relative Abundance to Sequencing Effort (GRASE) 522
Tool FMAP Functional Mapping and Analysis Pipeline for metagenomics and metatranscriptomics studies 524, 525, 526
Tool ResPipe A nextflow-pipeline for interrogating metagenomes for Antimicrobial Resistance Genes (CARD-based), Insertion Sequences and Enterobacteriaceae Plasmids 527, 528
Tool epa-ng A tool to place a sequence among an already calculated tree such as SILVA. Similar to pplacer 535
Tool ngs-less A toolbox for metagenomics analyses by Peer Bork at EMBL. Has MOCAT integrated with mOTUs and functional profiling 536
R package Castor Useful for calculating the relative evolutionary divergence (RED) of nodes in a tree with get_reds 537, 538
R package themetagenomics themetagenomics provides functions to explore topics generated from 16S rRNA sequencing information on both the abundance and functional levels. It also provides an R implementation of PICRUSt and wraps Tax4fun, giving users a choice for their functional prediction strategy 543, 544
Tool prokka2kegg This script is used to assign KO entries (K numbers in KEGG annotation) according to UniProtKB ID in the .gbk file generated by Prokka 546
Toolset PAGIT From Wellcome Sanger Institute a set of tools to polish draft genomes and correct annotation 547
Tool DFAST a flexible and customizable pipeline for prokaryotic genome annotation as well as data submission to the INSDC 552, 553
Tool DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data 556
Tool Apollo Apollo is an assembly polishing algorithm that attempts to correct the errors in an assembly. It can take multiple sets of reads in a single run and polish the assemblies of genomes of any size 563, 564
Tool Minipolish A tool for Racon polishing of miniasm assemblies 566
Tool AMON A command line tool for predicting the compounds produced by microbes and the host 567
Tool Coinfinder A tool for the identification of coincident (associating and dissociating) genes in pangenomes 568, 569, 570
Tool wtdbg2 Wtdbg2 is a de novo sequence assembler for long noisy reads produced by PacBio or Oxford Nanopore Technologies (ONT) 571, 572
Tool freebayes a haplotype-based variant detector 573, 574, 578
Tool qualimap to facilitate the quality control of alignment sequencing data and its derivatives like feature counts; like FastQC for WGS and MAGs 579, 580
Tool picard A set of Java command line tools for manipulating high-throughput sequencing (HTS) data and formats from broadinstitute 581, 582
Tool Diamond is a sequence aligner for protein and translated DNA searches, designed for high performance analysis of big sequence data 583, 584
Tool vcftools a set of tools for working with the variant call format (VCF) and binary variant call format (BCF) 585, 586, 587
Tool Gretel An algorithm for recovering haplotypes from metagenomes 589, 590
Tool Hansel Computational haplotype recovery and long-read validation identifies novel isoforms of industrially relevant enzymes from natural microbial communities 591, 592
Tool metabolisHMM a tool for exploration of microbial phylogenies and metabolic pathways 593, 594
Tool ConjScan MacSyFinder-based detection of Conjugative elements using systems modelling and similarity search 597
Tool MacSyFinder A Program to Mine Genomes for Molecular Systems with an Application to CRISPR-Cas Systems 598, 599, 600
Tool LEMON Software that makes use of existing shotgun NGS datasets to detect HGT breakpoints, identify the transferred genome segments, and reconstruct the inserted local strain 601, 602
Tool MMseqs2 Many-against-Many sequence searching is a software suite to search and cluster huge protein and nucleotide sequence sets 604, 605, 606
Pipeline MicrobiomeBestPracticeReview Current Challenges and Best Practice Protocols for Microbiome Analysis using Amplicon and Metagenomic Sequencing 607, 608
Tool Medaka is a tool to create a consensus sequence using neural networks from nanopore sequencing data 609, 610
Software ARB a graphically oriented package comprising various tools for sequence database handling and data analysis 611
Tool Piphillin a software package that predicts functional metagenomic content based on the frequency of detected 16S rRNA gene sequences corresponding to genomes in regularly updated, functionally annotated genome databases 613, 614
Tool BlastFrost a highly efficient method for querying 100,000s of genome assemblies. BlastFrost builds on the recently developed Bifrost, which generates a dynamic data structure for compacted and colored de Bruijn graphs from bacterial genomes 617, 618
Tool BioNode Command line tool for handy NGS data procedures, searching NCBI, downloading SRA stuff or handling fasta files. 622
Tool Biopieces Command line tool for a lot of NGS data procedures, fastq files, mapping, SNPs, etc. but has some dependencies... 623
Tool GrabSeqs Command line tool to download sequence files from SRA, iMicrobes, MG-rast easily 626
Tool fARGene (Fragmented Antibiotic Resistance Gene iENntifiEr) is a tool that takes either fragmented metagenomic data or longer sequences as input and predicts and delivers full-length antibiotic resistance genes as output 627, 628
Tool GTDBTk-Script various useful scripts related to GTDB 629
Tool Cello the code is parsed to generate a truth table, and logic synthesis produces a circuit diagram with the genetically available gate types to implement the truth table. The gates in the circuit are assigned using experimentally characterized genetic gates. 633,634,635
Tool URMAP The Ultra-fast Read Mapper (URMAP) is a fast, accurate read mapper with highly compressed output. It is ~10x faster than BWA and Bowtie with comparable accuracy on benchmark tests 636, 637
Tool Artemis The Artemis Software is a set of software tools for genome browsing and annotation 640
Tool EDGAR 2.0 "Efficient Database framework for comparative Genome Analyses using BLAST score Ratios" is an enhanced software platform for comparative gene content analyses 641, 642, 643
Tool ASA3P an automatic and scalable assembly, annotation and analysis pipeline for closely related bacterial genomes 644, 645, 646
Tool BIGSdb a software designed to store and analyse sequence data for bacterial isolates 647, 648, 649, 650
Tool OrthoVenn2 is a web platform for comparison and annotation of orthologous gene clusters among multiple species 651, 652
Tool genomeribbon easy to use website to assess a genome assembly with raw reads, long reads and short reads 653
R package FindMyFriends Fast alignment-free pangenome creation and exploration 654, 655
R package dadasnake is a Snakemake workflow to process amplicon sequencing data, from raw fastq-files to taxonomically assigned "OTU" tables, based on the DADA2 method 660, 661
Tool AMRtime Metagenomic AMR detection using hierarchical machine learning models 662
Tool panaroo An updated pipeline for pangenome investigation 663, 664
Pipeline TORMES An automated pipeline for whole bacterial genome analysis of genomes and/or raw Illumina paired-end sequencing data, regardless of the number, origin or species 665, 666
Pipeline ASA3P Automatic Bacterial Isolate Assembly, Annotation and Analyses Pipeline 667, 668
Pipeline nullarbor Pipeline to generate complete public health microbiology reports from sequenced isolates 669
Pipeline Bactopia Bactopia is a flexible pipeline for complete analysis of bacterial genomes 670, 671
Pipeline Common Workflow Language an open standard for describing analysis workflows and tools in a way that makes them portable and scalable across a variety of software and hardware environments, from workstations to cluster, cloud, and high performance computing (HPC) environments 673
Metric bacterialEvolutionMetrics Consistent Metagenome-Derived Metrics Verify and Delineate Bacterial Species Boundaries 675, 676
Tool NGSpeciesID is a tool for clustering and consensus forming of targeted ONT reads 677, 678
Catalogue long-read-tools A CATALOGUE OF LONG READ SEQUENCING DATA ANALYSIS TOOLS 681
Tool fARGene Fragmented Antibiotic Resistance Gene iENntifiEr 682, 683
Pipeline PathoFac a pipeline for the prediction of virulence factors and antimicrobial resistance genes in metagenomic data 684, 685
Tool MFEprimer a functional primer quality control program for checking non-specific amplicons, dimers, hairpins and other parameters 686, 687, 688
Pipeline STRONG STRONG resolves strains on assembly graphs by resolving variants on core COGs using co-occurrence across multiple samples 689, 690, 691,704
Tool NanoClust De novo clustering and consensus building for ONT 16S sequencing data 694
Tool mVIRs a tool that locates integration sites of inducible prophages in bacterial genomes 697
Tool Metagenome-Atlas an easy-to-use metagenomic pipeline based on snakemake. It handles all steps from QC, Assembly, Binning, to Annotation 698, 699, 700, 701
Tool VIRify a recently developed pipeline for the detection, annotation, and taxonomic classification of viral contigs in metagenomic and metatranscriptomic assemblies 702
Platform BioContainers is a community-driven project that provides the infrastructure and basic guidelines to create, manage and distribute bioinformatics packages (e.g. conda) and containers (e.g. docker, singularity) 705, 706
Tool DeepMAsED deep learning-based evaluation of the quality of metagenomic assemblies 708, 709
Tool minMLST a machine-learning based methodology for identifying a minimal subset of genes that preserves high discrimination among bacterial strains 713, 714
Tool hAMRonization CLI parser tools combine the outputs of disparate antimicrobial resistance gene detection tools into a single unified format 715
Tool PPanGGOLiN Depicting microbial species diversity via a Partitioned PanGenome Graph Of Linked Neighbors 717, 718
Webtool OGB OpenGenomeBrowser is a dynamic and scalable web platform for comparative genomics 719, 720
Pipeline Bakta a tool for the rapid & standardized annotation of bacterial genomes & plasmids 721
Tool MentaLiST The MLST pipeline developed by the PathOGiST research group 725, 726
Webapp TyphiNET The TyphiNET dashboard collates antimicrobial resistance (AMR) and genotype (lineage) information extracted from whole genome sequence (WGS) data from the bacterial pathogen Salmonella Typhi, the agent of typhoid fever. 727
Webapp Pathogenwatch provides species and taxonomy prediction for over 60,000 variants of bacteria, viruses, and fungi. MLST prediction is available for over 100 species using schemes from PubMLST, Pasteur, and Enterobase 728
Tool mlst Scan contig files against traditional PubMLST typing schemes 729
Tool snippy Rapid haploid variant calling and core genome alignment 733
Tool MUFFIN a hybrid assembly and differential binning workflow for metagenomics, transcriptomics and pathway analysis. 734, 735, 736
Tool Pandora a tool for bacterial genome analysis using a pangenome reference graph (PanRG) 738, 739, 740
Tool cgmlst Fork of Torsten Seemann's excellent mlst tool modified for cgMLST 741
Tool Phandango a fully interactive tool to allow visualisation of a phylogenetic tree, associated metadata and genomic information such as recombination blocks, pan-genome contents or GWAS results 741, 742
R package Enriched heatmap is a special type of heatmap which visualizes the enrichment of genomic signals on specific target regions 747, 748
R package Pagoo is an encapsulated, object-oriented class system for analyzing bacterial pangenomes 752, 753, 754, 834
R package simurg Simulate a Bacterial Pangenome in R 754, 755
Nextflow Porefile a Nextflow full-length 16S profiling pipeline for ONT reads 757
Tool MLSTar R package allows you to easily determine the Multi Locus Sequence Type (MLST) of your genomes 758, 759
Tool MOB-suite for clustering, reconstruction and typing of plasmids from draft assemblies 760, 761
Tool PlasForest a random forest classifier of contigs to identify contigs of plasmid origin in contig and scaffold genomes 763, 764
Tool GMGC-mapper Command line tool to query the Global Microbial Gene Catalog (GMGC) 774
Tool MetaGraph Ultra Scalable Framework for DNA Search, Alignment, Assembly of bacterial sequences 775, 776, 777, 778
Tool MIND Microbial Interaction Network Database 786
Pipeline microPIPE a pipeline for high-quality bacterial genome construction using ONT and Illumina sequencing 787
Tool giraffe variation graph data structures, interchange formats, alignment, genotyping, and variant calling methods 795, 796
Tool SquiggleKit A toolkit for accessing and manipulating nanopore signal data 798, 799, 800
Tool FlowerPlot A Python 3.9+ function that makes flower plots for pangenomics 804
Tool Poppunk A tool for clustering genomes. We refer to subclusters of strains as lineages. 807,931,932
Tool PATO an R package designed to analyze pangenomes (set of genomes) intra or inter species 810, 811
Tool PanX is a software package for comprehensive analysis, interactive visualization and dynamic exploration of bacterial pan-genomes 812
Tool 3mcor Metabolome-Microbiome-Metadata-Correlation Analysis 814
Tool GenAPI a program for gene presence absence table generation for series of closely related bacterial genomes from annotated GFF files 829, 830
Tool bammix Summarise nucleotide counts at a set of positions in a BAM file to search for mixtures 835
Tool Woltka (Web of Life Toolkit App), is a bioinformatics package for shotgun metagenome data analysis 836, 837
Tool ECTyper is a standalone versatile serotyping module for Escherichia coli 838, 839
Tool Serotypefinder is a serotyping module for Escherichia coli 840, 841
Tool SRST2 Short Read Sequence Typing for Bacterial Pathogens 842, 843
Tool KEMET a python tool for KEGG Module evaluation and microbial genome annotation expansion (Metabolic) 844, 845
Tool SIAMCAT Statistical Inference of Associations between Microbial Communities And host phenoTypes 846, 847
Collection EMBL Microbiome Analysis Tools Developed at EMBL 848
Tool BacDist Snakemake pipeline for bacterial SNP distance, recombination and phylogenetic analysis 849
Tool PacTyper Snakemake pipeline for continuous clone type prediction for WGS sequenced bacterial isolates based on their core genome 850
Pipeline CulebrONT a streamlined long reads multi-assembler pipeline for prokaryotic and eukaryotic genomes 857, 858
Tool gapseq Informed prediction and analysis of bacterial metabolic pathways and genome-scale networks 859, 860
Tool MicrobiomeAnalysis This package provides common methods for microbiome analysis 863, also see 852
Tool MiMiC proposes minimal microbial consortia from the functional potential of a given metagenomic sample 864, 865
Tool PIRATE identifies and classifies orthologous gene families in bacterial pangenomes over a wide range of sequence similarity thresholds 867, 868
Tool bacterial_strain_definition Contains the code and workflow for the bacterial strain definition paper with Kostas Konstantinidis 869, 870
Tool CheckM2 Rapid assessment of genome bin quality using machine learning 876
Tool Gubbins Genealogies Unbiased By recomBinations In Nucleotide Sequences 879, 880
Tool SKA a toolkit for prokaryotic DNA sequence analysis (phylogeny) using split kmers 881, 882
Tool Mashtree a rapid comparison of whole genome sequence files 883, 884
Pipeline mGEMS Bacterial sequencing data binning on strain-level based on probabilistic taxonomic classification 885, 886
Tool D-GENIES Dot plot large Genomes in an Interactive, Efficient and Simple way 893, 894, 895
Tool nanotimeparse parses an Oxford Nanopore fastq file on read sequencing start times found in the fastq headers 897
Tool ClonalFrameML package that performs efficient inference of recombination in bacterial genomes 899, 900
Tool minidot Quickly produce pretty dotplots from minimap mappings using R/ggplot2 903
Pipeline microPIPE a pipeline for high-quality bacterial genome construction using ONT and Illumina sequencing 911
Webapp Center for Genomic Epidemiology (CGE) provide access to various bioinformatics resources in clinical epidemiology 917
Tool ggCaller a bacterial gene caller for pangenome graphs 918, 919
Tool LEMMI A Live Evaluation of Computational Methods for Metagenome Investigation, is an online resource and a pipeline dedicated to continuous benchmarking of newly published metagenomics taxonomic classifiers 920
Tool LEMORTHO is an online resource and a pipeline dedicated to continuous benchmarking of newly published methods for orthology delineation 921
Tool KMCP accurate metagenomic profiling of both prokaryotic and viral populations by pseudo-mapping 922, 923
Tool ClermonTyping an easy-to-use and accurate in silico method for Escherichia genus strain phylotyping 924, 925
Tool Circlator A tool to circularize genome assemblies 926, 927, 928
Tool ska2 A toolkit for prokaryotic DNA sequence analysis (phylogeny) using split kmers 929, 930
Pipeline Is an amplicon sequencing pipeline for 16S 933, 934, 935
Tool Pyseer A comprehensive tool for microbial pangenome-wide association studies 941, 942
Tool TBProfiler Can rapidly and accurately predict anti-TB drug resistance profiles across large numbers of samples with WGS data 943, 944, 945
Tool Minion_QC Fast and effective quality control for MinION and PromethION sequencing data 946, 947
Tool GUNC package for detection of chimerism and contamination in prokaryotic genomes 950, 951, 952
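
The nanotext entry in the table above describes genome search as ranking by the cosine similarity between embedding vectors. A minimal sketch of that idea follows; it does not use nanotext's own API, and the genome names and 3-dimensional vectors are made up for illustration (real embeddings have many more dimensions).

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def rank_genomes(query, corpus):
    """Return (name, similarity) pairs, most similar genome first."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in corpus.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical genome embeddings (assumption: not real data).
corpus = {
    "genome_A": [1.0, 0.0, 0.0],
    "genome_B": [0.6, 0.8, 0.0],
    "genome_C": [0.0, 0.0, 1.0],
}
ranking = rank_genomes([1.0, 0.1, 0.0], corpus)
for name, score in ranking:
    print(f"{name}\t{score:.3f}")
```

Because cosine similarity depends only on direction, two genomes with the same relative domain composition score as identical even if their vectors differ in magnitude.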

Biostatistics

Category Name Description Link
Data-Types Microbiome Datasets Are Compositional: And This Is Not Optional Why OTU tables need to be handled more carefully - They are compositional! 1
Compositional approach CoDa This directory contains the readings, materials, and examples for a workshop originally offered at the Exploring Human Host-Microbiome Interactions in Health and Disease 2016 conference. 6; wiki 7
Compositional approach Frontiers_supplement.Rmd The document is the supplement and companion to the "Microbiome datasets are compositional: and this is not optional." review article. 9
R package CoDaSeq This is the ongoing work to put together a complete suite of functions for CoDa analysis of microbiome, transcriptome and metagenome data 16
Compositional approach PhILR PhILR is short for “Phylogenetic Isometric Log-Ratio Transform”. This R-package provides functions for the analysis of compositional data (e.g., data representing proportions of different variables/parts). 25 26
R package PathoStat The purpose of this package is to perform Statistical Microbiome Analysis on metagenomics results from sequencing data samples. In particular, it supports analyses on the PathoScope generated report files. 28
R package microbiome Tools for microbiome analysis; with multiple example data sets from published studies; extending the phyloseq class. The package is in Bioconductor and aims to provide a comprehensive collection of tools and tutorials, with a particular focus on amplicon sequencing data. 29
R package phyloseq phyloseq provides a set of classes and tools to facilitate the import, storage, analysis, and graphical display of microbiome census data. 30 31
Stat Comparing DA test Package to check various statistical methods to find "spike-ins" in 16S microbiome data 36
R Package Mare Promising easy microbiome analysis - find out what taxa correlate with certain metadata (Not validated yet) 41
R Package PCAexplorer Package to make interactive PCA plots in browser, originally for RNA-seq but maybe adaptable 43
R Package Glimma Interactive visualization of DEseq2 results, might be very helpful in exploration 44
R Package CoDaSeq Compositional Data Analysis Package written by Greg Gloor 53
dimensionality reduction Adaptive gPCA A method for structured dimensionality reduction 61
R Package theseus Add-on for phyloseq 62
R Package decontam implements a statistical classification procedure that identifies contaminants in MGS data based on two widely reproduced patterns: contaminants appear at higher frequencies in low-concentration samples and are often found in negative controls 65
Analysis Tutorial Workflow by Holmes Lab A nice tutorial with a suggested workflow for microbiome analysis by the Holmes lab 69
R Package phyloseqGraphTest Convenient and easy to use package for graphical testing with phyloseq objects 70
R Package ccrepe Compositionality Corrected by PErmutation and REnormalization (ccrepe) is a package for analysis of sparse compositional data. Specifically, it determines the significance of association between features in a composition, using any similarity measure (e.g. Pearson correlation, Spearman correlation, etc.) 77,78
Network Analysis NetShift To visualize community shufflings in microbial association networks between healthy and diseased states and identify 'driver' nodes observed between the states. 79,80
R Markdown Differential Abundance tests Microbiome Fairly well documented implementations of many different Differential Abundance tools, useful as a source of functions to reuse. 87
Statistics Approach Percentile-normalization method A novel & easy way to deal with batch effects when comparing multiple experiments 88, 89
Software Latent Variable Modeling for the Microbiome Although probabilistic latent variable models are a cornerstone of modern unsupervised learning, they are rarely applied in the context of microbiome data analysis, in spite of the evolutionary, temporal, and count structure that could be directly incorporated through such models 107 108
Python HAllA Hierarchical All-against-All association testing (HAllA) is a computational method to find multi-resolution associations in high-dimensional, heterogeneous datasets 117
tutorial Transformation vs Standardization Data Standardization and Transformation 127
R package BioCor Calculates functional similarities based on the pathways described on KEGG and REACTOME or in gene sets. These similarities can be calculated for pathways or gene sets, genes, or clusters and combined with other similarities 130 131
R package Phylofactor The package phylofactor will help you break apart the phylogeny with a variety of contrasts & objective functions, summarize the splits, and visualize the tree. 137 138 139
R package themetagenomics provides functions to explore topics generated from 16S rRNA sequencing information on both the abundance and functional levels. It also provides an R implementation of PICRUSt and wraps Tax4fun, giving users a choice for their functional prediction strategy. 145 146
R package selbal an R package for selection of balances in microbiome compositional data. It implements a forward-selection method for the identification of two groups of taxa whose relative abundance, or balance, is associated with the response variable of interest 173
R package microPop a dynamic model based on a functional representation of different microbiota 225 226
Article Networks for Microbiota Analysis A nice summary of a lot of network theory and how it is used for microbiota analysis and what the open questions are 255
R package metamicrobiomeR implements Generalized Additive Model for Location, Scale and Shape (GAMLSS) with zero inflated beta (BEZI) family for analysis of microbiome relative abundance data (with various options for data transformation/normalization to address compositional effects) and random effect meta-analysis models for meta-analysis pooling estimates across microbiome studies 282 283
R package metagenomeSeq is designed to determine features (be it Operational Taxonomic Unit (OTU), species, etc.) that are differentially abundant between two or more groups of multiple samples 292
R package PIME a package for discovery of novel differences among microbial communities 300 301
R package GLMM-MiRKAT A Distance-Based Kernel Association Test Based on the Generalized Linear Mixed Model for Correlated Microbiome Studies 309 310
R package MIMOSA Model-based Integration of Metabolite Observations and Species Abundances 339, 340
Tool new mmvec (old:rhapsody) Neural networks for estimating microbe-metabolite co-occurrence probabilities 354
R package BacArena an open source software for simulating cellular communities. It combines agent-based modeling, flux balance analysis, and statistical analysis 503, 504, 542
Tool BOFdat is a three step workflow that allows modellers to generate a complete biomass objective function de novo from experimental data: Obtain stoichiometric coefficients for major macromolecules and calculate maintenance cost; Find coenzymes and inorganic ions; Find metabolic end goals 505, 506
R package Corncob beta-binomial regression on covariates - might be a nice statistical test on abundance data and variables of interest 531
R package rtsne T-Distributed Stochastic Neighbor Embedding (t-SNE) using a Barnes-Hut Implementation 539, 540, 541
R package microbiomeDASim A toolkit for simulating differential microbiome data designed for longitudinal analyses. Several functional forms may be specified for the mean trend 548
R package MMUPHin an R package for meta-analysis tasks of microbiome cohorts. It has function interfaces for: a) covariate-controlled batch- and cohort effect adjustment, b) meta-analysis differential abundance testing, c) meta-analysis unsupervised discrete structure (clustering) discovery, and d) meta-analysis unsupervised continuous structure discovery 549
R package ReactomeGSA uses Reactome's online analysis service to perform a multi-omics gene set analysis 550
R package LinkHD a general R software to integrate heterogeneous datasets focusing on microbial communities 554, 555
R, Python Rest API Fast Scalable Machine Learning API 576, 577
R package, Webapp Metaboanalyst a user-friendly, web-based analytical pipeline for high-throughput metabolomics studies 618, 619
R package SIAMCAT R package for easy microbiome analysis - confounder analysis - phenotype prediction - Zeller group 620
R package Breakaway R package with functions for alpha diversity measurements 621
R package seqgroup The seqgroup R package offers a collection of functions that support the analysis of microbial sequencing data with a group structure 631, 632
R package ranomaly R package for statistical analyses and visualization of 16S data 656, 657
R package RioNorm2 A Novel Normalization and Differential Abundance Test Framework for Microbiome Data 658, 659
R package phylosmith A conglomeration of functions that I have written, that I find useful, for analyzing phyloseq objects. Phyloseq objects are a great data-standard for microbiome and gene-expression data 692, 762
R package MicEco Various functions for analysis for microbial community data 693
R package MaAsLin2 A comprehensive R package for efficiently determining multivariable association between phenotypes, environments, exposures, covariates and microbial meta’omic features 730, 731
R package mikropml User-Friendly R Package for Supervised Machine Learning Pipelines 749, 750, 751
R package shinyML Compare Supervised Machine Learning Models Using Shiny App 789
R package UMAP Uniform Manifold Approximation and Projection for Dimension Reduction 793, 794
R package MIMOSA2 summarizes paired microbiome-metabolome datasets to support mechanistic interpretation and hypothesis generation 813
R package microViz for analysis and visualization of microbiome sequencing data 825, 826
R package mia implements tools for microbiome analysis based on the SummarizedExperiment 852
R package CARlasso Conditional Auto-Regressive LASSO in R 853, 854
R package microPopGut R package for simulating microbial populations in the human colon 871
Shiny-App shinyMB A web application for sample size and power calculation in case-control microbiome studies 912, 913

Visualization

Category Name Description Link
Analysis tool Calour an Interactive, Microbe-Centric Analysis Tool 102 103 104
R package KEGGgraph graph approach to KEGG PATHWAY in R and Bioconductor 128
R package pathview Pathview is a tool set for pathway based data integration and visualization based on KEGG data 129
R package annotate Annotation for microarrays and GOs 132
Tool SegmentalDuplicationsCircos plots circular genomes 186
Tool Keanu A tool for viewing the contents of metagenomic samples 194 195
R package ampvis2 An R package to visualise amplicon data 245
Python Tool Bokeh Creating interactive low-level visualizations with Python, kind of like ggplotly 246
Tool icarus Icarus is a novel genome visualizer for accurate assessment and analysis of genomic draft assemblies, which is based on QUAST genome quality assessment tool 247
Tool metaQuast MetaQUAST evaluates and compares metagenome assemblies based on alignments to close references. It is based on QUAST genome quality assessment tool, but addresses features specific for metagenome datasets 248 249
Web-App ITOL Interactive Tree Of Life 259
R package magick The new magick package is an ambitious effort to modernize and simplify high-quality image processing in R 285
App Lucid Align A modern sequence alignment viewer 297
R package HTML Widgets Very nice packages to create more interactive visualizations like plots and tables in HTML Rmd output 302
R package ggpubr an excellent and flexible package for elegant data visualization in R and publication ready figures 396
R package metacoder parsing, plotting, and manipulating large taxonomic data sets 397
Tool Krona Visualization tool to show hierarchical datasets such as metagenomic samples. Used by cosmosID and other services. Charts are created in Excel or with dedicated import tools 444
Shiny Webapp PlotTwist a web app for plotting and annotating time series data 445, 446
R package KEGGREST A package that provides a client interface to the KEGGREST server 516
Webapp iPath Interactive Pathways Explorer (iPath) is a web-based tool for the visualization, analysis and customization of various pathway maps. Covers microbial metabolism in diverse environments 533, 534
R package Cowplot The cowplot package provides various features that help with creating publication-quality figures, such as a set of themes, functions to align plots and arrange them into complex compound figures, and functions that make it easy to annotate plots and or mix plots with images 561
R package patchwork The goal of patchwork is to make it ridiculously simple to combine separate ggplots into the same graphic 562
R Package karyoploteR R package to visualize genomic features on genomes - can plot anything that has genomic coordinates - maybe read depth of sequencing too 565
R tutorial kateto Network visualization with R 575
R package rayshader is an open source package for producing 2D and 3D data visualizations in R 638, 639
Webapp biorender a webapp for scientific illustrations with template icons to use 672
App SnapGeneViewer SnapGene Viewer includes the same rich visualization, annotation, and sharing capabilities as the fully enabled SnapGene software 679
R script AnnVis Tutorial to visualize prokka output using gggenes package 680
R package ggseqlogo a versatile R package for drawing sequence logos 695, 696
R Markdown webpage Creating websites in R 716
App TreeViewer Flexible, modular software to visualise and manipulate phylogenetic trees 723
Software Graphia a powerful open source visual analytics application developed to aid the interpretation of large and complex datasets 732
R package ComplexHeatmap provides a highly flexible way to arrange multiple heatmaps and supports self-defined annotation graphics 744, 856
R package circlize circular visualization in R and circular heatmaps 745, 746, 823, 824
R package ggsci Scientific Journal and Sci-Fi Themed Color Palettes for ggplot2 768
R package colorblindr Simulate colorblindness in production-ready R figures 769
R package scico 17 colorblind safe palettes 770, 771
R package plumbertableau Integrating Dynamic R and Python Models in Tableau Using plumbertableau 784, 785
R package Boruta Feature selection with the Boruta algorithm 788
R package camcorder to track and record the ggplots that are created across one or multiple sessions with the eventual goal of creating a gif showing all the plots created sequentially 790
R package ggiraph a tool that allows you to create dynamic ggplot graphs 797
R package ggsvg is an extension to ggplot to use arbitrary SVG as points 817
R package gtsummary provides an elegant and flexible way to create publication-ready analytical and summary tables using the R programming language 819
Webapp Datawrapper lets you show your data as beautiful charts, maps or tables with a few clicks 820
R package mmtable2 Create and combine tables with a ggplot2/patchwork syntax 822
Webapp Lucidchart is the intelligent diagramming application that brings teams together to make better decisions and build the future 833
R Package ampvis2 an R-package to conveniently visualise and analyse 16S rRNA amplicon data in different ways from phyloseq data 831, 832
Webpage From Data to Viz is a classification of chart types based on input data format 855
Cheat Sheet Graphics Principles Cheat Sheet for correct graphics visualization 867
R Package GenoVi generates circular genome representations for complete or draft bacterial and archaeal genomes 872, 873
R Package ggcoverage Visualize and annotate genome coverage with ggplot2 874, 875
R package ggside to enable users to add metadata to their ggplots with ease 877
R package dotplotly Create an interactive or static dot plot from mummer output OR PAF format 890
R package ganttrify nice-looking Gantt charts 901, 902
R package fastbaps The fast BAPS algorithm is based on applying the hierarchical Bayesian clustering (BHC) algorithm to the problem of clustering genetic sequences using the same likelihood as BAPS 906, 907
Software Bandage a program for visualising de novo assembly graphs 908
Webapp lucidchart intelligent diagramming for flow charts 909
Shiny App GraphBio a shiny web app to easily perform popular visualization analysis for omics data 936, 937, 938

Pipeline Managers

Category Name Description Link
R package targets Managing bioinformatics pipelines with R 779, 780, 781
Tutorial bioinformatics-workflows Reproducible, scalable, and shareable analysis pipelines with bioinformatics workflow managers 783

Modelling

Category Name Description Link
Article/Paper Butyrate & Propionate Pathways Paper by the Flint group that describes the pathways that lead to butyrate and propionate production and cross-feeding in anaerobic bacteria 228
Tool curveball Predicting competition results from growth curves 368,369
Database Virtual Metabolic Human Genome wide metabolic models for 800+ different type strains from the human gut ready for extension 449
Tool MICOM Python package using COBRApy for microbial community modelling - sent by lacroix/tomas from recent publication 615 Publication

Other Resources

Category Name Description Link
Statistical Methods Explanations GUSTA ME Website with intuitive explanations of why to use some methods and how to use them 2 Publication: 3
Helpful R Scripts DECIPHER A website where the author describes several helpful bioinformatic analyses & how to implement them 5
Primer Design RUCS RUCS - Rapid Identification of PCR Primers Pairs for Unique Core Sequences 14; webapp: 15
DB for Bac. Genome Annotation The SEED DB curated by experts to annotate the genome features in bacteria. Hopefully useful to quickly scan what pathways our bacteria have or don't have. 331 332 333
DB of Reference Genomes HumanMicrobiomeProject Collection of many bacterial genomes sequenced up to 'draft-quality' and some up to 'gold-standard', probably helpful to analyze gene content of microbiomes and compare with PB Catalog: 21, DataBrowser: 22
Classical Microbiome Pipeline Applied Bioinformatics Book An open-source book on applied bioinformatics - it has a great chapter on classical diversity analysis (UniFrac etc.) Diversity Chapter: 23, Whole Book: 24
PhyloSeq Extension MetagMisc R package to export phyloSeq object easily into dataframes, etc. 27
Download NCBI Genomes ncbi-genome-download A script to download bacterial and fungal genomes from NCBI after they restructured their FTP a while ago. 34
Phylo Trees Randi Griffin Blog Great blog to show some examples on how to create useful phylo trees and heatmaps etc. 37
R-package biomartr The Biological Sequence Retrieval package allows users to retrieve biological sequences in a very simple and intuitive way. Using biomartr, users can retrieve either genomes, proteomes, CDS, RNA, GFF, and genome assembly statistics data using the specialized functions 38
Rmd Templates rticles A package that includes templates for many journal articles 40
Tutorial DESeq2 for microbiome DESeq2 analysis tutorial with PhyloSeq by Susan Holmes! 42 284 291 293 294
Data MicrobiomeHD Human Microbiome Data from healthy and diseased people by MIT lab - Eric Alm 45
Datasets Google Datasets Search Nice way to search for available datasets 51
Survey Statistics MultiTable Data Analysis for Microbiome Survey of methods in multi-table statistics from Holmes Lab 52
Datasets Qiita open-source microbial study management platform. It allows users to keep track of multiple studies with multiple ‘omics data 71,72,73
Workflow Holmes Microbiome Workflow Complete workflow from raw fastq files to fancy multivariate statistics workflow with dada2, DESeq2, etc. with code! 74
R Package ampvis2 useful tool for nice visualization of amplicon data. Easy & nice ordinations! 76
R-Markdown Workshop OPEN & REPRODUCIBLE MICROBIOME DATA ANALYSIS SPRING SCHOOL 2018 96 97
SOPs IMMSA The International Metagenomics and Microbiome Standards Alliance (IMMSA) is a non-hierarchical association of microbiome-focused researchers from industry, academia, and government 123
CNGBdb China National GeneBank DataBase Archive of a lot of Chinese sequencing projects with very nice search function 140
Collection nf-Core nextflow pipeline A collection of high quality pipelines for bioinformatic analyses built with nextflow 181
Collection Awesome Nextflow Pipelines A collection of a bunch of bioinformatic pipelines in nextflow: 16S, assembly, etc. 188
Competition, SOP Critical Assessment of Metagenome Interpretation Competition where tools are tested on accuracy for strain level binning and assembly (CAMI) 189
Tools Sanger Pathogen Tools A collection of tools made by the Sanger Institute for pathogen/antimicrobial resistance screening, visualization, assembly, annotation 190
Tool Melonnpan by Biobakery Huttenhower Method to predict metabolites from metagenomic reads, should be pre-trained but can also be tried with standard model 205
Tool ARepA Huttenhower Tool to download information from specific data repositories: gene interaction, functional association 206
Tool PysraDB Python library to quickly and systematically download data from NCBI Sequence Read Archive 207
Journal Article Pangenome & Metagenome Nice article from Meren Lab describing how Anvi'o is used to create pangenomes and analyze core genes vs. accessory genes 213
Tools Chiron Docker images and pipelines for metagenomic processing developed for HMP project workshops, includes Huttenhower software like humann2, strainphlan, qiime2 220
Tool PANDA Quick prediction of GO term annotation from Amino Acid sequence - only online service so far 221
Review What is good genome assembler A nice comparison of several genome assemblers for de-novo assembly, hybrid, short and long reads are all compared 241
Tool NCBI Downloader Command line tool to download genomes from NCBI and specify by all kinds of metadata 256
Collection Microbiome_notes A continually expanding collection of microbiome analysis tools 260
Blog GoogleComputeEngineR Blog with a lot of tutorials related to using R and google cloud instances 296
Tool KOMODO Online tool to predict on what media a bacterial strain will grow. Based on DSMZ databases and gene predictions 328
CheatSheet Stanford Machine learning Cheatsheet Cheat sheet that covers all basics and advanced methods in machine learning - summary of stanford course 341
Blog Genomics Tools List List of tools that are installed on a bioinformatics cluster, could have some interesting tools in there 349
SOPs Microbiome-Standards List of SOPs made by the microbiome community, aimed at establishing good standard procedures for a wide array of microbiome analysis and data generation 373
Website R Graph Library Very cool website with all kinds of visualizations and how to create them in R - great inspiration 386
Blog Shiny Examples Example dashboards that were built with shiny R. Good for inspiration with source code 387
Tool TrueBac ID Online tool to do whole genome taxonomic identification using ANI and 16S, depending on which is more accurate 388, 389
Tools Pathogen Informatics Sanger Many tools by the Sanger Institute for pathogen analysis: resistance genes, circularizing genomes, rapid pan genome generation 398
Blog Klebsiella assembly and analysis Nice blog post describing up-to-date genome assembly, annotation and analysis of a virulent bacterium 401
Tool PlasFlow Neural Network for identifying whether contig sequences are from a plasmid or chromosome 402
Blog Comparison of long-read assemblers Comparison by rrwick of newest long read assemblers on how they can assemble bacterial genomes with plasmids 407
Tutorial-Blog Tyler Barnum How to Use Assembly Graphs with Metagenomic Datasets 412
Tutorial Phylogenetic Tree visualization Nice and complete tutorial about visualizing data on phylogenetic trees in R with ggtree, very nice example figures 417
Tutorial Functional enrichment analysis Anvi'o v5.1: Functional Enrichment Analysis and Computing ANI 431
Repository Kipoi repository of pre-made deep learning models for genomics 454
Knowledgebase KBase a DOE Systems Biology Knowledgebase, an open-source software and data platform that enables data sharing, integration, and analysis of microbes, plants, and their communities 455, 456, 457, 458, 459
Book Computational Genomics with R fundamentals for data analysis for genomics 502
Blog Rmarkdown help A nice guide to make rmarkdown documents beautiful and nice 517
Tutorial Microbiome Analysis 2018 A nice tutorial website for statistical microbiome analysis from Leo Lahti 529
Tutorial Microbiome Utilities a wrapper tool R package for phyloseq 530
Review Data Science in Microbiome A nice review by Leo Lahti of various tools and methods available for microbiome analysis, with references to the specific tools that implement the methods 532
Tool, API IPATH python wrapper A nice wrapper in python for the IPATH3 API to computationally create graphs 545
R Package formattable nice package for producing well-formatted tables in Rmarkdown output 551
R Packages awesome-r A curated list of awesome R frameworks, libraries and software 588
Book R for Data Science This book will teach you how to do data science with R: You’ll learn how to get your data into R, get it into the most useful structure, transform it, visualise it and model it 612
Website Git stuff explained Nice website that easily explains all the git commands for command line 624
Tool Type Strain genome Server DSMZ Web tool by DSMZ to type novel genomes based on their collection of type strains 625
Website Beta diversity distances Nice website that has the math equations for most of the beta diversity distances 630
App Pitch Collaborative presentation software for modern teams 703
Reporting conflr an R package to post R Markdown documents to Confluence, a content collaboration tool by Atlassian 737
Tutorial Galaxy Training Collection of tutorials developed and maintained by the worldwide Galaxy community 765
R package thesisdown package to write thesis in Rmarkdown 782
R package blogdown is an R package that makes blogging for R users as straightforward as possible 801, 802, 803
Webpage postsyoumighthavemissed Search 000's of R & Python articles and packages! 805
Tutorial shell-how Write down a command-line to see how it works 806
Webpage webpage-repository the website of AllanLab academic research group at Leiden University 808
Tutorial Machine Learning Machine Learning for Everyone 809
R package RPushbullet a package to send messages to your devices from R 815, 816
R package portfoliodown makes it painless for data scientists to create a polished professional website so they can host their project portfolios, get great job interviews, and launch their data science careers 818
Scripts blantyreESBL This document contains reproducing analysis code which generates the tables and figures for the manuscript: Dynamics of gut mucosal colonisation with extended spectrum beta-lactamase producing Enterobacterales in Malawi 891, 892
codes nf-modules A repository for hosting Nextflow DSL2 module files containing tool-specific process 896
Tutorial Kaggle Data Science competition 898
Tutorial Perfect-bacterial-genome-tutorial Assembling the perfect bacterial genome using Oxford Nanopore and Illumina sequencing 904, 905
Tutorial Long-read tutorial Workflows and tutorials for LongRead analysis with specific focus on Oxford Nanopore data 910
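
The "Beta diversity distances" entry above points to the equations behind common distances. As a rough illustration only, here is a minimal sketch of one widely used measure, the Bray-Curtis dissimilarity, in plain Python. The function name `bray_curtis` and the sample vectors are made up for this example and are not taken from any of the listed tools; returning 0.0 for two empty samples is an assumed convention.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two abundance vectors.

    Computed as sum(|u_i - v_i|) / (sum(u) + sum(v)).
    Hypothetical helper for illustration; expects non-negative counts.
    """
    if len(u) != len(v):
        raise ValueError("abundance vectors must have the same length")
    total = sum(u) + sum(v)
    if total == 0:
        # Assumption: two empty samples are treated as identical.
        return 0.0
    return sum(abs(a - b) for a, b in zip(u, v)) / total


# Made-up counts for four taxa in two samples.
sample_a = [10, 0, 5, 5]
sample_b = [0, 10, 5, 5]
print(bray_curtis(sample_a, sample_b))  # 0.5
```

For real analyses, the packages listed in this collection (for example phyloseq-based tools) are the better choice, since they also handle normalization and sample metadata.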

Website to look up Markdown Syntax [https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet]

