Financial support for developing the iMAP repository ended in October 2018. The maintainer continues to contribute to this repository as a volunteer, to support the microbiome research community, with a focus on making iMAP highly reproducible and more user-friendly. Thank you for your patience.
iMAP v1.0 is at a preliminary stage. Compared with modern bioinformatics workflow management systems, it currently lacks significant aspects of reproducibility. We plan to integrate iMAP with code that defines workflow rules, so that it can be deployed across multiple platforms without major modifications.
Teresia M. Buza, Triza Tonui, Francesca Stomeo, Christian Tiambo, Robab Katani, Megan Schilling, Beatus Lyimo, Paul Gwakisa, Isabella M. Cattadori, Joram Buza and Vivek Kapur. iMAP: an integrated bioinformatics and visualization pipeline for microbiome data analysis. BMC Bioinformatics (2019) 20:374.
- Profiling of sample metadata
- Pre-processing and quality control of paired reads
- Sequence processing and classification
  - mothur (default)
    - Phylotype-based method (works for any dataset size)
    - OTU-based method (works best for small datasets)
    - Phylogeny-based method (works best for small datasets)
  - QIIME2
- Transformation of OTU and taxonomy results into data structures
- Diversity analysis, statistical analysis and visualization
- Phylogenetic analysis and interactive tree annotation
- Generation of web-based progress reports
- and more...
The first step is to gather all the materials needed to run the iMAP pipeline, as listed in Table 1. Most iMAP dependencies are executables that are already on the PATH inside the Docker images, so users can launch them directly from the command line of the corresponding container.
Read README2.md for instructions on implementing iMAP directly on a specific platform (Unix/Linux, Mac OS X or Windows 10). Please note that README2 is a work in progress.
Table 1: List of required materials for running the iMAP pipeline

Requirement | Description | Location | Remarks |
---|---|---|---|
Raw data | Demultiplexed reads in FASTQ format (.gz) with primers and barcodes removed | data/raw | fastq.gz |
Sample metadata | File name: samplemetadata.tsv. A tab-separated file linking sample identifiers to the study variables | data/metadata | Format: mothur or QIIME2 |
Mapping files | For linking sample IDs to the data files | data/metadata | Mothur-formatted & QIIME2-formatted |
Software (mostly available via pre-built Docker images) | | | |
Docker | For creating Docker containers that wrap up iMAP dependencies | Docker Community Edition (CE) | Link |
SeqKit | For inspecting raw data format and simple statistics | Docker image: readqctools | Link |
BBDuk (bbduk.sh, part of BBMap) | For trimming poor-quality reads and removing PhiX contamination | Auto-loaded at the preprocessing step | Link |
MultiQC | For summarizing FastQC output | Docker image: readqctools | Link |
Mothur | For sequence processing, taxonomy assignment and preliminary analysis | Docker image: mothur:v1.41.3 | Link |
QIIME2 | For sequence processing, taxonomy assignment and preliminary analysis | Docker image: qiime2core:v2019.1 | Link |
R | For statistical analysis and visualization | Docker image: rpackages:v3.5.2 | Link |
iTOL | For displaying, annotating and managing phylogenetic trees | Online | Link |
Reference databases (any of the following can be used) | | | |
SILVA NR (mothur) | Mothur-formatted rRNA alignments | data/references | Link |
SILVA NR (QIIME2) | QIIME2-formatted classifiers | data/qiime2 | Link |
SILVA (seed) | Mothur-formatted rRNA alignments | data/references | Link |
SILVA (de-gapped) | Mothur-formatted classifiers | data/references | Auto-generated |
RDP | Mothur-formatted classifiers | data/references | Link |
Greengenes | Mothur-formatted classifiers | data/references | Link |
Greengenes | QIIME2-formatted classifiers | data/qiime2 | Link |
EzBioCloud | Mothur-formatted classifiers | data/references | Link |
Custom classifiers | Any manually built classifiers; highly recommended when studying a specific group of known microbes | data/references | Manually built |
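Because the sample metadata must be tab-separated, a quick structural check before starting can save a failed run. The helper below is a hypothetical sketch, not part of iMAP: it uses awk to confirm that every row has the same number of tab-separated fields as the header and that the sample IDs in the first column are unique. The default path follows Table 1; the specific columns that mothur or QIIME2 require are not checked.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not shipped with iMAP): sanity-check a tab-separated
# sample metadata file. It checks structure only, not mothur/QIIME2 semantics.

check_metadata() {
  local file=${1:-iMAP/data/metadata/samplemetadata.tsv}
  awk -F'\t' '
    NF == 0 { next }                  # ignore blank lines
    NR == 1 {
      n = NF                          # the header defines the column count
      if (n < 2) {
        print "header has fewer than 2 columns; is the file tab-separated?" > "/dev/stderr"
        bad = 1
      }
      next
    }
    NF != n {
      printf "line %d: %d fields, expected %d\n", NR, NF, n > "/dev/stderr"
      bad = 1
    }
    seen[$1]++ {                      # first column is assumed to be the sample ID
      printf "line %d: duplicate sample ID %s\n", NR, $1 > "/dev/stderr"
      bad = 1
    }
    END { exit bad }                  # exit status 0 only if no problem was found
  ' "$file"
}
```

For example, `check_metadata iMAP/data/metadata/samplemetadata.tsv && echo "metadata looks OK"`.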
Some systems, including Ubuntu and other Linux distributions, may require administrative rights to run the commands below. In such cases:
- Put sudo in front of the command, and enter your password when prompted.
- Note that the system is often configured not to ask again for a few minutes, allowing you to run several commands in succession.
```bash
# Option 1: clone the repository with git
git clone https://github.com/tmbuza/iMAP.git

# Option 2: download the zip archive with curl
curl -LOk https://github.com/tmbuza/iMAP/archive/master.zip
unzip master.zip
mv iMAP-master iMAP
rm -rf master.zip

# Option 3: download the zip archive with wget
wget --no-check-certificate https://github.com/tmbuza/iMAP/archive/master.zip
unzip master.zip
mv iMAP-master iMAP
rm -rf master.zip
```
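The three download options above are interchangeable. A hypothetical helper like the one below picks whichever tool is installed, trying git first. It drops the `-k` / `--no-check-certificate` flags, which turn off TLS certificate checking and are normally not needed for github.com; add them back only if the download fails because of certificate problems on your system.

```bash
#!/usr/bin/env bash
# Hypothetical helper: fetch iMAP with whichever of git, curl or wget exists.

pick_downloader() {
  local tool
  for tool in git curl wget; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      return 0
    fi
  done
  return 1                          # none of the three tools is installed
}

fetch_imap() {
  local url=https://github.com/tmbuza/iMAP
  local tool
  tool=$(pick_downloader) || { echo "install git, curl or wget first" >&2; return 1; }
  if [ -e iMAP ]; then
    echo "iMAP already exists here; not overwriting" >&2
    return 1
  fi
  case $tool in
    git)  git clone "$url.git" ;;
    curl) curl -LO "$url/archive/master.zip" ;;
    wget) wget "$url/archive/master.zip" ;;
  esac || return 1
  if [ "$tool" != git ]; then
    # The archive unpacks into iMAP-master/; rename it to match a git clone.
    unzip -q master.zip && mv iMAP-master iMAP && rm -f master.zip
  fi
}
```

Run `fetch_imap` from the folder in which the iMAP directory should be created.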
The demo data consist of:
- Metadata
- Mapping files:
  - Mothur format: qced.files
  - QIIME2 format: manifest.txt
- Variable files (for the mothur-based preliminary analysis):
  - Variable 1: var1.design
  - Variable 2: var2.design
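The demo mapping files come ready-made. For your own data, a sketch along the following lines could generate both formats from paired FASTQ files. It rests on several assumptions, all hypothetical: reads are named `<sampleID>_..._R1....fastq.gz` and `<sampleID>_..._R2....fastq.gz`; the sample ID is everything before the first underscore; the mothur file uses the three-column layout produced by mothur's `make.file` (sample, forward, reverse); and QIIME2 reads the CSV paired-end manifest (`sample-id,absolute-filepath,direction`) used by QIIME2 releases of that era. Compare the output with the demo qced.files and manifest.txt before relying on it, since iMAP's scripts may expect a different layout.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build a mothur-style .files and a QIIME2-style manifest
# from paired FASTQ files. Naming convention and output names are assumptions.

make_mapping_files() {
  local rawdir=$1 outdir=$2
  local prefix=${3:-/imap/data/raw}    # where the reads live inside the container
  local mothur_out="$outdir/sample.files" qiime_out="$outdir/manifest.txt"
  local r1 base1 base2 sample

  : > "$mothur_out"                    # start both files fresh
  echo "sample-id,absolute-filepath,direction" > "$qiime_out"

  for r1 in "$rawdir"/*_R1*.fastq.gz; do
    [ -e "$r1" ] || continue           # the glob matched nothing
    base1=$(basename "$r1")
    base2=${base1/_R1/_R2}             # mate file: first "_R1" becomes "_R2"
    if [ ! -e "$rawdir/$base2" ]; then
      echo "no reverse read for $base1" >&2
      return 1
    fi
    sample=${base1%%_*}                # sample ID = text before the first "_"
    # mothur: sample<TAB>forward<TAB>reverse
    printf '%s\t%s\t%s\n' "$sample" "$base1" "$base2" >> "$mothur_out"
    # QIIME2 CSV manifest: one row per read direction
    printf '%s,%s/%s,forward\n' "$sample" "$prefix" "$base1" >> "$qiime_out"
    printf '%s,%s/%s,reverse\n' "$sample" "$prefix" "$base2" >> "$qiime_out"
  done
}
```

For example, `make_mapping_files iMAP/data/raw /tmp/mapping` writes sample.files and manifest.txt into /tmp/mapping. Pointing the second argument at data/metadata would overwrite any existing manifest.txt there.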
The following command copies the required data files from iMAP/resources/ into their respective folders, as shown in Table 1.
```bash
bash iMAP/code/demo_data.bash
```
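After copying the demo data, a small check can confirm that the data folders listed in Table 1 exist and that Docker is on the PATH before any container is started. This is a hypothetical helper, not part of iMAP; run it from the folder that contains iMAP.

```bash
#!/usr/bin/env bash
# Hypothetical helper: verify the Table 1 folder layout under an iMAP checkout.

check_imap_layout() {
  local root=${1:-iMAP} missing=0 dir
  for dir in data/raw data/metadata data/references data/qiime2; do
    if [ ! -d "$root/$dir" ]; then
      echo "missing: $root/$dir" >&2
      missing=$((missing + 1))
    fi
  done
  # Every containerized step needs Docker, so warn (but do not fail) if absent.
  if ! command -v docker >/dev/null 2>&1; then
    echo "warning: docker not found on PATH" >&2
  fi
  [ "$missing" -eq 0 ]              # success only if no folder is missing
}
```

For example, `check_imap_layout && echo "layout OK"`.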
Default settings can be changed with any text editor. The table below shows where each adjustable default parameter is set.
Parameter to change | File Path | Filename | Default |
---|---|---|---|
Phred quality trimming threshold | iMAP/code/preprocessing | 04_get_highscore_reads.bash | trimq=25 |
Min contig length | iMAP/code/seqprocessing | 01_assemble_paired_reads.batch | minlength=100 |
Max contig length | iMAP/code/seqprocessing | 01_assemble_paired_reads.batch | maxlength=300 |
Min alignment length | iMAP/code/seqprocessing | 02_align_for_16S_consensus.batch | minlength=100 |
Max alignment length | iMAP/code/seqprocessing | 02_align_for_16S_consensus.batch | maxlength=300 |
Reference | iMAP/code/seqclassification | 01_classify_seqs.batch | silva.seed.ng.fasta |
Taxonomy | iMAP/code/seqclassification | 01_classify_seqs.batch | silva.seed.tax |
Classification cutoff | iMAP/code/seqclassification | 01_classify_seqs.batch | cutoff=80 |
QIIME2 settings | iMAP/code/qiime2 | qiime2.bash | DADA2 QC parameters are set to 0 |
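For scripted or repeatable changes, the same edits can be made with sed instead of a text editor. The helper below is a hypothetical sketch: it rewrites `key=value` pairs such as `trimq=25` or `maxlength=300`, assuming each value ends at a comma, closing parenthesis or whitespace (as in mothur batch commands), and it keeps a `.bak` copy of the original file. Values containing `#`, `&` or `\` would need escaping first.

```bash
#!/usr/bin/env bash
# Hypothetical helper: change a key=value default in an iMAP script or batch file.
# Keeps the original as <file>.bak (the -i.bak form works with GNU and BSD sed).

set_param() {
  local file=$1 key=$2 value=$3
  if ! grep -q "${key}=" "$file"; then
    echo "${key}= not found in $file" >&2
    return 1
  fi
  # Match key=... only when key is not the tail of a longer name
  # (e.g. "length=" must not hit "minlength="); the value stops at , ) or whitespace.
  sed -E -i.bak "s#(^|[^[:alnum:]_])${key}=[^,)[:space:]]*#\\1${key}=${value}#g" "$file"
}
```

For example, `set_param iMAP/code/preprocessing/04_get_highscore_reads.bash trimq 20` lowers the quality threshold. Note that the contig and alignment files each contain both `minlength` and `maxlength`, so set them one key at a time.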
- Install Docker: https://docs.docker.com/install/
- Register for a Docker ID: https://docs.docker.com/docker-id/
The iMAP Docker images include:
- rpackages:v3.5.2: R version 3.5.2 and several packages.
- readqctools:v1.0.0: tools for quality control of the reads.
- mothur:v1.41.3: mothur, for sequence classification and for generating mothur-based OTU tables.
- qiime2core:v2019.1: QIIME2, for sequence classification and for generating QIIME2-based OTU tables.
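Run the following to install all the images at once. Alternatively, install an individual image with docker pull tmbuza/imagename.
```bash
# All images at once
bash iMAP/code/dockerImages.sh

# Individual image (replace imagename with a name:tag from the list above, e.g. mothur:v1.41.3)
docker pull tmbuza/imagename

# List the installed images
docker images
```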
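The output of `docker images` can be long. As a hypothetical alternative, the helper below checks each of the four iMAP tags listed above with `docker image inspect` and pulls only the ones that are not yet present. The `DOCKER` variable (default `docker`) is a hook added for testing or for pointing at a different docker binary; it is not an iMAP setting.

```bash
#!/usr/bin/env bash
# Hypothetical helper: make sure the four iMAP images are available locally.

IMAP_IMAGES=(
  tmbuza/rpackages:v3.5.2
  tmbuza/readqctools:v1.0.0
  tmbuza/mothur:v1.41.3
  tmbuza/qiime2core:v2019.1
)

ensure_imap_images() {
  local docker=${DOCKER:-docker} img
  for img in "${IMAP_IMAGES[@]}"; do
    # "image inspect" succeeds only if the image exists locally.
    if "$docker" image inspect "$img" >/dev/null 2>&1; then
      echo "present: $img"
    else
      echo "pulling: $img"
      "$docker" pull "$img" || return 1
    fi
  done
}
```

Run `ensure_imap_images` after installing Docker; it prints one line per image.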
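```bash
# Profile the sample metadata (run from the folder that contains iMAP)
containerName=report1
docker run --rm --name="$containerName" -it -v "$(pwd)/iMAP:/imap" --workdir=/imap tmbuza/rpackages:v3.5.2 /bin/bash
# Inside the container:
bash code/01_metadataProfiling_driver.bash
exit
```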
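Every containerized step in this guide follows the same pattern: start a named container from one of the images with iMAP mounted at /imap, run a driver script, then exit. As a hypothetical shortcut (not part of iMAP), the pattern can be wrapped in a function that runs the driver non-interactively: without `-it` and `/bin/bash`, the container runs the script and is removed (`--rm`) when it finishes. As with the interactive commands, run it from the folder that contains iMAP. The `DOCKER` variable is only a testing hook.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run one iMAP driver script in a throwaway container.
# Usage: imap_run <container-name> <image> <script path relative to iMAP>

imap_run() {
  local name=$1 image=$2 script=$3
  local docker=${DOCKER:-docker}
  # Same mount and working directory as the interactive commands, minus -it.
  "$docker" run --rm --name="$name" \
    -v "$(pwd)/iMAP:/imap" --workdir=/imap \
    "$image" bash "$script"
}
```

For example, `imap_run report1 tmbuza/rpackages:v3.5.2 code/01_metadataProfiling_driver.bash` does the same work as the block above. Interactive use is still handy for inspecting intermediate files.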
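```bash
# Pre-process and quality-control the paired reads
containerName=readpreprocess
docker run --rm --name="$containerName" -it -v "$(pwd)/iMAP:/imap" --workdir=/imap tmbuza/readqctools:v1.0.0 /bin/bash
# Inside the container:
bash code/02_readPreprocess_driver.bash
exit
```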
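The MultiQC HTML reports summarizing the FastQC results are stored in the results/multiqc/ folder. Open them in your favorite browser, or from the command line, for example (from inside the iMAP folder, on macOS):
```bash
open results/multiqc/qced/R1/multiqc_report.html
```
On most Linux desktops, use xdg-open instead of open.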
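Several reports are generated (for example, one per read direction). The hypothetical helpers below list every multiqc_report.html under the results folder and open one with the platform's default opener; the default path assumes results/ sits inside the iMAP folder, as the container's working directory suggests.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: find every MultiQC report and open one in a browser.

list_multiqc_reports() {
  local dir=${1:-iMAP/results/multiqc}
  find "$dir" -type f -name 'multiqc_report.html' | sort
}

open_report() {
  case $(uname -s) in
    Darwin) open "$1" ;;       # macOS
    *)      xdg-open "$1" ;;   # most Linux desktops
  esac
}
```

For example, run `list_multiqc_reports` and pass one of the printed paths to `open_report`.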
```bash
# Generate progress report 2
containerName=report2
docker run --rm --name="$containerName" -it -v "$(pwd)/iMAP:/imap" --workdir=/imap tmbuza/rpackages:v3.5.2 /bin/bash
# Inside the container:
bash code/progressreport2.bash
exit
```
- Create a mothur container for sequence processing and classification:
```bash
containerName=mothurseqprocessing
docker run --rm --name="$containerName" -it -v "$(pwd)/iMAP:/imap" --workdir=/imap tmbuza/mothur:v1.41.3 /bin/bash
```
- Inside the container, run the sequence processing and classification command, which:
  - Downloads the reference alignments (default: SILVA seed)
  - Assembles the forward and reverse reads, screens them by length and creates representative sequences
  - Aligns the representative sequences to the reference alignments (default: SILVA seed)
  - Denoises to remove poor alignments
  - Removes chimeric sequences
  - Classifies the sequences and performs post-classification QC
  - Estimates the sequencing error rate
```bash
bash ./code/03_imapClassifySEQ_driver.bash
```
You may see many WARNING messages; it is safe to ignore them.
The program removes all temporary files after it finishes processing the sequences. If there are no temporary files left to remove, you may see the harmless error message: rm: cannot remove '.temp': No such file or directory
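Rather than scrolling through the warnings, you can count them and check for genuine errors in a mothur logfile. The helper below is a hypothetical sketch: it assumes mothur prefixes its messages with `[WARNING]` and `[ERROR]`, and because this guide does not say where the driver leaves its logfiles, the path is passed explicitly.

```bash
#!/usr/bin/env bash
# Hypothetical helper: summarize warnings vs errors in a mothur logfile.
# Assumes mothur's "[WARNING]" / "[ERROR]" message prefixes.

summarize_mothur_log() {
  local log=$1 warnings errors
  warnings=$(grep -c '\[WARNING\]' "$log")
  errors=$(grep -c '\[ERROR\]' "$log")
  echo "warnings=$warnings errors=$errors"
  # Non-zero exit status when any error was logged
  [ "$errors" -eq 0 ]
}
```

For example, `for log in mothur.*.logfile; do summarize_mothur_log "$log"; done` prints one summary line per logfile (the file pattern is an assumption about mothur's default logfile names).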
Still inside the mothur container, run one or more of the following methods:
- Phylotype-based method (works for any dataset size):
```bash
bash ./code/04_1_phylotype_driver.bash
```
- OTU-cluster method (works best for small datasets):
```bash
bash ./code/04_2_otucluster_driver.bash
```
- Phylogeny-based method (works best for small datasets):
```bash
bash ./code/04_3_phylogeny_driver.bash
```
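To run several of these methods in one go from inside the mothur container, a hypothetical helper can call the drivers in sequence and stop at the first failure. By default it runs all three; for a large dataset, pass only `04_1_phylotype`, since the other two methods work best on small datasets.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run mothur OTU-method drivers in sequence, stopping at
# the first failure. Intended to be run inside the mothur container from /imap.
# Usage: run_otu_methods [code dir] [method ...]

run_otu_methods() {
  local codedir=${1:-./code}
  shift
  local methods=("$@") m
  if [ "${#methods[@]}" -eq 0 ]; then
    methods=(04_1_phylotype 04_2_otucluster 04_3_phylogeny)
  fi
  for m in "${methods[@]}"; do
    echo "== running ${m}_driver.bash"
    if ! bash "$codedir/${m}_driver.bash"; then
      echo "failed: $m" >&2
      return 1
    fi
  done
}
```

For example, `run_otu_methods ./code 04_1_phylotype` runs only the phylotype-based method.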
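When the mothur steps are finished, leave the mothur container with exit, then generate progress report 3:
```bash
containerName=report3
docker run --rm --name="$containerName" -it -v "$(pwd)/iMAP:/imap" --workdir=/imap tmbuza/rpackages:v3.5.2 /bin/bash
# Inside the container:
bash code/progressreport3.bash
exit
```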
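```bash
# Transform the OTU and taxonomy results into data structures
containerName=datatransformation
docker run --rm --name="$containerName" -it -v "$(pwd)/iMAP:/imap" --workdir=/imap tmbuza/rpackages:v3.5.2 /bin/bash
# Inside the container:
bash code/datatransformation.bash
exit
```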
```bash
# Generate progress report 4
containerName=report4
docker run --rm --name="$containerName" -it -v "$(pwd)/iMAP:/imap" --workdir=/imap tmbuza/rpackages:v3.5.2 /bin/bash
# Inside the container:
bash code/progressreport4.bash
exit
```
Statistical analysis compares the study variables, and these variables are specific to each study. Below are links to the most important statistical analyses in microbiome studies:
- QIIME2 sequence classification requires a trained classifier.
- You can train your own classifier using the q2-feature-classifier plugin.
- Classifier: Naive Bayes classifiers trained on the Greengenes database with 99% OTUs.
- Download pretrained classifiers for QIIME2 sequence classification:
  - The 515-806 conservative fragments
    - iMAP default, due to its small size.
    - Can be spanned by sequencing 200–300 nt from both ends using Illumina MiSeq.
  - Alternative pretrained classifiers are available, including SILVA and full-length Greengenes (see links in Table 1).
Download the 515-806 conservative-fragment classifier:
```bash
bash iMAP/code/qiime2/qiime2_gg_classifier_fragments.bash
```
Download the full-length Greengenes classifier:
```bash
bash iMAP/code/qiime2/qiime2_gg_classifier_fulllength.bash
```
If you use the full-length Greengenes classifier or any other pretrained QIIME2-formatted classifier, you must replace the default setting in the executable file (see details below).
The table below shows the file to edit. Find the string "gg-13-8-99-515-806-nb-classifier.qza" and replace it with the name of your preferred classifier.
Parameter to change | Filename | Default |
---|---|---|
Classifier | iMAP/code/qiime2/qiime2.bash | gg-13-8-99-515-806-nb-classifier.qza |
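The find-and-replace can also be scripted. The hypothetical helper below swaps the default classifier name in qiime2.bash with sed, escaping the dot in the old name so it is matched literally, and keeps a `.bak` copy. The new name in the example is an assumption; use the actual file name of the classifier you downloaded (names containing `#` or `&` would need escaping).

```bash
#!/usr/bin/env bash
# Hypothetical helper: replace the default classifier name in qiime2.bash.
# Usage: swap_classifier <new-classifier.qza> [path/to/qiime2.bash]

swap_classifier() {
  local new=$1 file=${2:-iMAP/code/qiime2/qiime2.bash}
  local old='gg-13-8-99-515-806-nb-classifier.qza'
  local old_re='gg-13-8-99-515-806-nb-classifier\.qza'   # dot escaped for sed
  if ! grep -qF "$old" "$file"; then
    echo "default classifier name not found in $file" >&2
    return 1
  fi
  sed -i.bak "s#${old_re}#${new}#g" "$file"               # original kept as .bak
}
```

For example, `swap_classifier gg-13-8-99-nb-classifier.qza`, run from the folder that contains iMAP.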
```bash
# Classify the sequences with QIIME2
containerName=qiime2classification
docker run --rm --name="$containerName" -it -v "$(pwd)/iMAP:/imap" --workdir=/imap tmbuza/qiime2core:v2019.1 /bin/bash
# Inside the container:
bash code/qiime2/qiime2.bash
exit
```
The outputs are written to iMAP/data/qiime2/results/.
To view them, use the client-side interface at https://view.qiime2.org/: simply drag and drop the QIIME 2 artifacts (.qza files) or visualizations (.qzv files).
For more help, visit https://view.qiime2.org/about.
The output is a file containing OTUs and taxonomy.
```bash
# Convert the mothur output to BIOM format
containerName=biomconvertmothur
docker run --rm --name="$containerName" -it -v "$(pwd)/iMAP:/imap" --workdir=/imap tmbuza/qiime2core:v2019.1 /bin/bash
# Inside the container:
bash code/qiime2/convertmothur_biom.bash
exit
```