In this module, I will walk you through the steps involved in analyzing 16S rRNA microbiota amplicon data, from raw sequences to publication-quality visualizations and statistical analysis. Culture-independent 16S rRNA metagenomics is a promising method for understanding the ecology of an environment, in terms of the abundance and structure of its microbiome, in relation to environmental factors, e.g. host-microbiome interactions. Prokaryotes carry a ubiquitous gene that encodes a component of the ribosome, the 16S rRNA gene. It is highly conserved among prokaryotes, yet it also contains hypervariable regions (HVRs) V1 to V9, which are good targets for evolutionary and ecological studies of prokaryotes (Jünemann et al., 2017). This module focuses mainly on 16S rRNA gene data, but I can cautiously say that most of the techniques explained here also apply to genome data and to multivariate count datasets. Note: the whole workflow was run in a Jupyter notebook on a 120 GB cluster node at Aarhus University, Denmark. To run tasks in parallel on different nodes, the QIIME 2 tasks were submitted to the cluster as separate bash scripts.
This module includes the following steps:
2. Filtering, dereplication, sample inference, chimera identification, and merging of paired-end reads with the DADA2 package in QIIME 2
3. Training a primer-based, region-specific classifier for taxonomic classification with the naive Bayes method (in QIIME 2)
6. Statistical analysis of beta diversity metrics: a distance-based redundancy analysis (dbRDA) model
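
To make the QIIME 2 steps above more concrete, here is a minimal Python sketch of how the DADA2 denoising step and the classifier-training step could be assembled into a bash script for submission to a cluster. It only builds the command lines and does not run QIIME 2. The helper names (`dada2_denoise_paired_cmd`, `train_classifier_cmds`, `render_job_script`), all file names, the truncation lengths, and the output prefixes are assumptions made for illustration; they are not taken from this workflow. The sketch also leaves out any scheduler directives, since those depend on the cluster.

```python
import shlex


def dada2_denoise_paired_cmd(demux, trunc_len_f, trunc_len_r, out_prefix, n_threads=1):
    """Build the argument list for `qiime dada2 denoise-paired`.

    File names and truncation lengths are placeholders; choose them
    from your own data and quality plots.
    """
    return [
        "qiime", "dada2", "denoise-paired",
        "--i-demultiplexed-seqs", demux,
        "--p-trunc-len-f", str(trunc_len_f),
        "--p-trunc-len-r", str(trunc_len_r),
        "--p-n-threads", str(n_threads),
        "--o-table", f"{out_prefix}-table.qza",
        "--o-representative-sequences", f"{out_prefix}-rep-seqs.qza",
        "--o-denoising-stats", f"{out_prefix}-stats.qza",
    ]


def train_classifier_cmds(ref_seqs, ref_taxonomy, f_primer, r_primer, out_prefix):
    """Build the two commands for a primer-based, region-specific classifier:
    extract the amplified region from the reference, then fit naive Bayes."""
    ref_reads = f"{out_prefix}-ref-reads.qza"  # hypothetical intermediate file
    extract = [
        "qiime", "feature-classifier", "extract-reads",
        "--i-sequences", ref_seqs,
        "--p-f-primer", f_primer,
        "--p-r-primer", r_primer,
        "--o-reads", ref_reads,
    ]
    fit = [
        "qiime", "feature-classifier", "fit-classifier-naive-bayes",
        "--i-reference-reads", ref_reads,
        "--i-reference-taxonomy", ref_taxonomy,
        "--o-classifier", f"{out_prefix}-classifier.qza",
    ]
    return [extract, fit]


def render_job_script(commands):
    """Render a list of commands as the text of a plain bash script."""
    lines = ["#!/bin/bash", "set -euo pipefail"]
    lines += [shlex.join(cmd) for cmd in commands]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example values only: the primers are the 515F/806R pair for the V4
    # region, used here as a stand-in for your own primer pair.
    denoise = dada2_denoise_paired_cmd("demux.qza", 240, 200, "run1", n_threads=4)
    classifier = train_classifier_cmds(
        "ref-seqs.qza", "ref-taxonomy.qza",
        "GTGCCAGCMGCCGCGGTAA", "GGACTACHVGGGTWTCTAAT", "v4",
    )
    print(render_job_script([denoise]))
    print(render_job_script(classifier))
```

Keeping each task in its own rendered script mirrors the idea of submitting the QIIME 2 tasks separately, so that the denoising job and the classifier-training job can run on different nodes at the same time.
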
Please cite this workflow if you use it in your publications. You can use this link for the citation.