iMAP: Integrated Microbiome Analysis Pipeline

Financial support for developing the iMAP repository ended in October 2018. The maintainer continues to contribute to this repo on a volunteer basis, in support of the microbiome research community. The primary focus is to make iMAP highly reproducible and more user-friendly. Thank you for your patience.

Version: iMAP v1.0 (Pre-Release)

iMAP v1.0 is at a preliminary phase. It currently lacks significant aspects of reproducibility compared to modern bioinformatics workflow management systems. Our plan is to integrate iMAP with rule-based code that enables it to be deployed across multiple platforms without major modifications.


Teresia M. Buza, Triza Tonui, Francesca Stomeo, Christian Tiambo, Robab Katani, Megan Schilling, Beatus Lyimo, Paul Gwakisa, Isabella M. Cattadori, Joram Buza and Vivek Kapur. iMAP: an integrated bioinformatics and visualization pipeline for microbiome data analysis. BMC Bioinformatics (2019) 20:374. Link.

Supported Analyses

  1. Profiling of sample metadata
  2. Pre-processing and quality control of paired-reads
  3. Sequence processing and classification
  • mothur (default)
    • Phylotype-based method (works for any dataset size).
    • OTU-based method (works best for small datasets).
    • Phylogeny-based method (works best for small datasets).
  • QIIME2
  4. Transformation of OTU and taxa results into a data structure.
  5. Diversity and statistical analysis, and visualization.
  6. Phylogenetic analysis and interactive tree annotation
  7. Generating web-based progress reports
  8. and more...

Primary iMAP file folders


The first step is to gather all the materials needed for implementing the iMAP pipeline, as described in Table 1. Most iMAP dependencies are executables that are already on the PATH inside the Docker images, so users should be able to launch them directly from the command line of the corresponding container.

Non-Docker Image Users

See README2.md, which describes how to run iMAP directly on a specific platform, including Unix/Linux, Mac OS X, and Windows 10. Please note that this is a work in progress.

Table 1: List of required materials for running iMAP pipeline

| Requirement | Description | Location | Remarks |
|---|---|---|---|
| Raw data | Demultiplexed reads in FASTQ format (.gz) with primers and barcodes removed | data/raw | fastq.gz |
| Sample metadata | File name: samplemetadata.tsv. A tab-separated file linking sample identifiers to the variables | data/metadata | Format: mothur or QIIME2 |
| Mapping files | For linking sample IDs to the data files | data/metadata | Mothur-formatted & QIIME2-formatted |
| Software (mostly available via pre-built Docker images) | | | |
| Docker | For creating Docker containers that wrap up iMAP dependencies | Docker Community Edition (CE) | Link |
| Seqkit | For inspecting raw data format and simple statistics | Docker image: readqctools | Link |
| BBduk.sh via BBMap | For trimming poor quality reads and removing phiX contamination | Auto-loaded at preprocessing step | Link |
| MultiQC | For summarizing FastQC output | Docker image: readqctools | Link |
| Mothur | For sequence processing, taxonomy assignment and preliminary analysis | Docker image: mothur:v1.41.3 | Link |
| QIIME2 | For sequence processing, taxonomy assignment and preliminary analysis | Docker image: qiime2core:v2019.1 | Link |
| R | For statistical analysis and visualization | Docker image: rpackages:v3.5.2 | Link |
| iTOL | For displaying, annotating and managing phylogenetic trees | Online | Link |
| Reference databases (any of the following can be used) | | | |
| SILVA NR (mothur) | Mothur-formatted rRNA alignments | data/references | Link |
| SILVA NR (QIIME2) | QIIME2-formatted classifiers | data/qiime2 | Link |
| SILVA (seed) | Mothur-formatted rRNA alignments | data/references | Link |
| SILVA (de-gapped) | Mothur-formatted classifiers | data/references | Auto-generated |
| RDP | Mothur-formatted classifiers | data/references | Link |
| Greengenes | Mothur-formatted classifiers | data/references | Link |
| Greengenes | QIIME2-formatted classifiers | data/qiime2 | Link |
| EzBioCloud | Mothur-formatted classifiers | data/references | Link |
| Custom classifiers | Any manually built classifiers. Highly recommended when studying a specific group of known microbes. | data/references | Manually built |
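
Before starting, it can help to confirm that the user-supplied inputs from Table 1 are where the pipeline expects them. The following is a minimal sketch, not part of iMAP: the function name `check_imap_inputs` is hypothetical, and it assumes raw reads end in `.fastq.gz`. It checks only the raw data and sample metadata rows.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with iMAP): report which user-supplied
# inputs from Table 1 are missing under a given iMAP directory.
check_imap_inputs() {
    local root="${1:-iMAP}"
    local status=0

    # Raw reads: at least one gzipped FASTQ file in data/raw
    # (the .fastq.gz suffix is an assumption).
    if ! ls "$root"/data/raw/*.fastq.gz >/dev/null 2>&1; then
        echo "missing: $root/data/raw/*.fastq.gz"
        status=1
    fi

    # Sample metadata file in data/metadata.
    if [ ! -f "$root/data/metadata/samplemetadata.tsv" ]; then
        echo "missing: $root/data/metadata/samplemetadata.tsv"
        status=1
    fi

    return "$status"
}
```

Calling `check_imap_inputs iMAP` from the directory that contains the cloned repository prints one line per missing input and returns a non-zero status if anything is absent.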

Getting Started

Running a shell command as root or system administrator

Some systems, including Ubuntu and other Linux distributions, may require administrative rights to run certain commands. In such cases:

  • Put sudo in front of the command, and enter your password when prompted.
  • Note that the system is often configured not to ask again for a few minutes, allowing you to run several commands in succession.

Download iMAP repository

git clone https://github.com/tmbuza/iMAP.git

# OR

curl -LOk https://github.com/tmbuza/iMAP/archive/master.zip
unzip master.zip
mv iMAP-master iMAP
rm -rf master.zip

# OR

wget --no-check-certificate https://github.com/tmbuza/iMAP/archive/master.zip 
unzip master.zip
mv iMAP-master iMAP
rm -rf master.zip

Add data to designated folders

File formats

  1. Metadata:

  2. Mapping files:

  3. Variable files (Mothur-based preliminary analysis).

Data for optional testing of iMAP

The following command copies the required data files from iMAP/resources/ into their respective folders, as shown in Table 1 above.

bash iMAP/code/demo_data.bash

User Options

Users who want to change the default settings may do so using any text editor. The table below shows the location of default parameters that may be altered.

| Parameter to change | File path | Filename | Default |
|---|---|---|---|
| Phred score | iMAP/code/preprocessing | 04_get_highscore_reads.bash | trimq=25 |
| Min contig length | iMAP/code/seqprocessing | 01_assemble_paired_reads.batch | minlength=100 |
| Max contig length | iMAP/code/seqprocessing | 01_assemble_paired_reads.batch | maxlength=300 |
| Min alignment length | iMAP/code/seqprocessing | 02_align_for_16S_consensus.batch | minlength=100 |
| Max alignment length | iMAP/code/seqprocessing | 02_align_for_16S_consensus.batch | maxlength=300 |
| Classification cutoff | iMAP/code/seqclassification | 01_classify_seqs.batch | cutoff=80 |
| QIIME2 settings | iMAP/code/qiime2 | qiime2.bash | DADA2 QC parameters are set at 0 |
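
To see where these defaults are set before editing anything, a recursive search over the code folder is usually enough. This is a sketch under assumptions: the function name `show_imap_defaults` is hypothetical, and the search pattern covers only the parameter names listed in the table above.

```shell
# Hypothetical helper: print every line under a code directory that sets one
# of the tunable parameters named in the table (trimq, minlength, maxlength,
# cutoff). Each match is shown as file:line-number:content.
show_imap_defaults() {
    local code_dir="${1:-iMAP/code}"
    grep -rnE 'trimq=|minlength=|maxlength=|cutoff=' "$code_dir"
}
```

Running `show_imap_defaults iMAP/code` lists the matching lines, which can then be changed in a text editor.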

Set up Docker

Install Docker: https://docs.docker.com/install/

Register for a Docker ID: https://docs.docker.com/docker-id/

Download dependency images


  1. rpackages:v3.5.2 for R version 3.5.2 and several packages.
  2. readqctools:v1.0.0 for quality control of the reads.
  3. mothur:v1.41.3 for sequence classification and for generating mothur-based OTU tables.
  4. qiime2core:v2019.1 for sequence classification and for generating qiime2-based OTU table.

Run the following to install all the images at once. Alternatively, install an individual image with docker pull tmbuza/imagename.

# All images at once

bash iMAP/code/dockerImages.sh

# Individual image

docker pull tmbuza/imagename

Confirm the installation

docker images
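
Scanning the `docker images` listing by eye is easy to get wrong with four images. The sketch below, which is not part of iMAP, reads `repository:tag` lines on standard input and prints any of the four images named above that are absent. The function name `missing_images` is hypothetical, and the `tmbuza/` prefix is taken from the `docker pull` example.

```shell
# Images named in this README, prefixed as in the docker pull example.
REQUIRED_IMAGES=(
    "tmbuza/rpackages:v3.5.2"
    "tmbuza/readqctools:v1.0.0"
    "tmbuza/mothur:v1.41.3"
    "tmbuza/qiime2core:v2019.1"
)

# Read "repository:tag" lines on stdin and print each required image
# that does not appear in the input.
missing_images() {
    local installed image
    installed="$(cat)"
    for image in "${REQUIRED_IMAGES[@]}"; do
        # -x matches whole lines only, -F treats the name as a fixed string.
        if ! grep -qxF "$image" <<< "$installed"; then
            echo "$image"
        fi
    done
}

# Usage (requires Docker):
#   docker images --format '{{.Repository}}:{{.Tag}}' | missing_images
```

An empty result means all four images are installed.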

Start the analysis

Metadata profiling

docker run --rm --name=$containerName -it -v $(pwd)/iMAP:/imap --workdir=/imap  tmbuza/rpackages:v3.5.2 /bin/bash

bash code/01_metadataProfiling_driver.bash
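
The docker run commands in this document reference `$containerName`, which has to be set in the host shell before the command is run. A minimal sketch, assuming any unique name is acceptable (the `imap_` prefix and timestamp format are arbitrary choices, not iMAP conventions):

```shell
# Assumption: any unique, Docker-valid container name will do.
# The "imap_" prefix and the timestamp suffix are arbitrary.
containerName="imap_$(date +%Y%m%d_%H%M%S)"
echo "Container name: $containerName"
```

Set the variable in the same shell session that launches the container, so that `--name=$containerName` expands to a non-empty value.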

Read Preprocessing

docker run --rm --name=$containerName -it -v $(pwd)/iMAP:/imap --workdir=/imap tmbuza/readqctools:v1.0.0 /bin/bash

bash code/02_readPreprocess_driver.bash


The HTML files summarizing the FastQC reports are stored in the results/multiqc/ folder. Open them in your preferred browser, or from the command line (the open command shown here is specific to macOS):

open results/multiqc/qced/R1/multiqc_report.html
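
On Linux desktops the usual counterpart of `open` is `xdg-open`. The sketch below picks one of the two from the kernel name reported by `uname -s`. The function name `html_opener` is hypothetical, and it assumes `xdg-open` is installed and that the command is run on the host, inside the iMAP folder.

```shell
# Print a command suitable for opening HTML files, based on the kernel name.
# With no argument, the name is taken from `uname -s`.
# Assumption: xdg-open is available on Linux desktops.
html_opener() {
    case "${1:-$(uname -s)}" in
        Darwin) echo "open" ;;
        Linux)  echo "xdg-open" ;;
        *)      return 1 ;;   # unknown system: let the caller decide
    esac
}

# Usage:
#   "$(html_opener)" results/multiqc/qced/R1/multiqc_report.html
```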

Preprocessing progress report

docker run --rm --name=$containerName -it -v $(pwd)/iMAP:/imap --workdir=/imap  tmbuza/rpackages:v3.5.2 /bin/bash

bash code/progressreport2.bash

MOTHUR: Sequence Processing and classification

  1. Create a mothur container for sequence processing and classification.
docker run --rm --name=$containerName -it -v $(pwd)/iMAP:/imap --workdir=/imap tmbuza/mothur:v1.41.3 /bin/bash
  2. Run the sequence processing and classification command, which implements the following:
    • Download reference alignments.
    • Assemble the forward and reverse reads, screen by length and create representative sequences.
    • Align representative sequences with reference alignments. Default: SILVA seed.
    • Denoise to remove poor alignments.
    • Remove chimeric sequences.
    • Classify the sequences and do post-classification QC.
    • Estimate the sequencing error rate.
bash ./code/03_imapClassifySEQ_driver.bash 

You may see many warnings; it is safe to ignore them.

The program removes all temporary files once sequence processing is complete. If there are no such files, you may see an error message that reads: rm: cannot remove '.temp': No such file or directory
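
This kind of message appears when `rm` is given a name that does not exist, and it does not affect the results. For users who write their own cleanup steps, the sketch below shows how the `-f` flag keeps such a cleanup quiet. It is not the pipeline's own code, and the function name `cleanup_temp` and the `*.temp` pattern are assumptions.

```shell
# Remove files matching *.temp in a directory without complaining
# when there are none.
cleanup_temp() {
    local dir="${1:-.}"
    # -f tells rm to ignore nonexistent files, so an unmatched pattern
    # produces no error message and a zero exit status.
    rm -f -- "$dir"/*.temp
}
```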

OTU clustering, taxonomy assignment and preliminary analysis (Mothur)

  1. Phylotype-based method (works for any dataset size).
bash ./code/04_1_phylotype_driver.bash

  2. OTU-cluster method (works best for small datasets).
bash ./code/04_2_otucluster_driver.bash

  3. Phylogeny-based method (works best for small datasets).
bash ./code/04_3_phylogeny_driver.bash

Sequence processing progress report

docker run --rm --name=$containerName -it -v $(pwd)/iMAP:/imap --workdir=/imap  tmbuza/rpackages:v3.5.2 /bin/bash

bash code/progressreport3.bash

Data Transformation

docker run --rm --name=$containerName -it -v $(pwd)/iMAP:/imap --workdir=/imap  tmbuza/rpackages:v3.5.2 /bin/bash

bash code/datatransformation.bash

OTU analysis progress report

docker run --rm --name=$containerName -it -v $(pwd)/iMAP:/imap --workdir=/imap  tmbuza/rpackages:v3.5.2 /bin/bash

bash code/progressreport4.bash

Statistical analysis

Statistical analysis compares variables, which are specific to each study. Below are links to the most important statistical analyses in microbiome studies:

QIIME2: Sequence Processing and Classification

  • Requires a QIIME2 trained classifier.

  • You can train your own classifier using the q2-feature-classifier.

  • Classifier: Naive Bayes classifiers trained on the Greengenes database with 99% OTUs.

  • Download pretrained classifiers for QIIME2 sequence classification:

    • The 515-806 conservative fragments
      • iMAP default due to its small size.
      • Can be spanned by sequencing 200–300 nt from both ends using Illumina MiSeq.
    • Alternative pretrained classifiers are available, including SILVA and full-length Greengenes (see the links in Table 1).

Download 515-806 conservative fragments

bash iMAP/code/qiime2/qiime2_gg_classifier_fragments.bash

Download full length greengenes classifier

If you use the full-length Greengenes classifier or any other pretrained QIIME2-formatted classifier, you must replace the default setting in the executable file (see details below).

bash iMAP/code/qiime2/qiime2_gg_classifier_fulllength.bash

The location and file to be altered are given below. Find the string "gg-13-8-99-515-806-nb-classifier.qza" and replace it with the name of your preferred classifier.

Parameter to change Filename Default
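
The find-and-replace can also be done from the command line. The sketch below is an illustration rather than an iMAP script: the function name `set_qiime2_classifier` and the replacement file name in the usage line are hypothetical, and the usage assumes the string lives in iMAP/code/qiime2/qiime2.bash, the script this README runs for QIIME2.

```shell
# Replace the default classifier file name inside a script, keeping a
# copy of the original next to it with a .bak suffix.
#   $1: path to the script to edit
#   $2: file name of the classifier to use instead (must not contain | or &)
set_qiime2_classifier() {
    local script="$1"
    local new_classifier="$2"
    local default_classifier="gg-13-8-99-515-806-nb-classifier.qza"
    # -i.bak edits in place and works with both GNU and BSD sed.
    sed -i.bak "s|${default_classifier}|${new_classifier}|g" "$script"
}

# Usage (hypothetical classifier name):
#   set_qiime2_classifier iMAP/code/qiime2/qiime2.bash my-classifier.qza
```

If the result is not what was intended, the `.bak` copy can be moved back over the edited script.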

Create QIIME2 container

docker run --rm --name=$containerName -it -v $(pwd)/iMAP:/imap --workdir=/imap  tmbuza/qiime2core:v2019.1 /bin/bash

bash code/qiime2/qiime2.bash

View QIIME 2 results

Output path: iMAP/data/qiime2/results/

Use client-side interface: https://view.qiime2.org/ to view the results.

Simply drag and drop the QIIME 2 artifacts (.qza files) or the visualizations (.qzv files).

For more help visit https://view.qiime2.org/about.
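
To find the files that can be dropped into the viewer, it may help to list the artifacts and visualizations in the output folder first. A minimal sketch, with the hypothetical function name `list_qiime2_outputs` and the output path stated above as its default:

```shell
# List QIIME 2 artifacts (.qza) and visualizations (.qzv) under a results
# directory, sorted by path. Other file types are ignored.
list_qiime2_outputs() {
    local results_dir="${1:-iMAP/data/qiime2/results}"
    find "$results_dir" -type f \( -name '*.qza' -o -name '*.qzv' \) | sort
}
```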

Useful commands

1. Convert mothur biom file within QIIME2

The output is a file containing OTUs and taxonomy

docker run --rm --name=$containerName -it -v $(pwd)/iMAP:/imap --workdir=/imap  tmbuza/qiime2core:v2019.1 /bin/bash

bash code/qiime2/convertmothur_biom.bash


