Human skin, oral, and gut microbiomes predict chronological age
This study performed Random Forest regression analyses of human microbiota from multiple body sites (gut, mouth and skin). This repository provides all source data and code needed to generate the results in the manuscript. The output directories also contain additional exploratory analyses that give a better understanding of our microbiota-based models for age prediction.
Data source
Qiita study IDs involved in the meta-analysis:
- Gut microbiota:
QIITA Study ID | EBI accession ID | Project name | Publication(s) | # of samples involved |
---|---|---|---|---|
10317 | ERP012803 | American Gut Project | American Gut: an Open Platform for Citizen Science Microbiome Research | 2770 |
11757 | PRJEB18535 | GGMP regional variation | Regional variation greatly limits application of healthy gut microbiome reference ranges and disease models | 1609 |
- Oral microbiota:
- Skin microbiota:
QIITA Study ID | EBI accession ID | Project name | Publication(s) | # of samples involved |
---|---|---|---|---|
10317 | ERP012803 | American Gut Project | American Gut: an Open Platform for Citizen Science Microbiome Research | 440 |
11052 | ERP021896 | Knight_ABTX | NA | 177 |
2010 | ERP012216 | Longitudinal babies project | Partial restoration of the microbiota of cesarean-born infants via vaginal microbial transfer | 65 |
1841 | PRJEB5726, PRJEB5727, PRJEB5728 | Flores_SMP | Temporal variability is a personalized feature of the human microbiome | 1293 |
The age distribution of all samples in gut, oral and skin datasets:
Although the skewed age distribution in the skin and oral microbiota datasets may decrease the accuracy of age prediction for older adults, it does not affect the conclusions about the relative ability of different human microbiomes to predict age.
R scripts
This repository also contains the R scripts and supporting files used in preparing the manuscript. The main ones are described below.
Usage requirements and dependencies
This meta-analysis depends on our R package crossRanger, which can be installed as follows.
## install.packages('devtools') # if devtools not installed
devtools::install_github('shihuang047/crossRanger')
What analyses were done by the R script Age.crossRF_reg.ranger.R?
The R script Age.crossRF_reg.ranger.R performs the meta-analysis of microbiota data for predicting chronological age. For each dataset (i.e. gut, mouth or skin), this script can perform the following analyses.
- Data trimming (such as sample filtering by NA values in the metadata).
- RF modeling and performance evaluation for the whole dataset.
- RF modeling and performance evaluation for the sub-datasets. To test if confounders (such as sex) affected the modeling, we first trained the age model within a sub-dataset stratified by a confounder, then applied it to all the other sub-datasets. For both model training and testing, we evaluated regression performance using mean absolute error (MAE).
- Cross-application of the RF models built on the sub-datasets, with performance evaluated using MAE.
All of the analyses can be run with this script, typically in RStudio or the R console.
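The stratified training and cross-application steps above can be sketched as follows. This is a minimal illustration on synthetic data, not the code in Age.crossRF_reg.ranger.R: it calls the ranger package directly rather than crossRanger, and the feature names, the `sex` labels, the simulated ages and the `mae` helper are all assumptions made up for the example.

```r
library(ranger)

set.seed(42)

# Synthetic stand-in for a feature table (NOT the study data):
# 200 samples, 5 hypothetical taxa, one confounder with two levels.
n <- 200
df <- as.data.frame(matrix(runif(n * 5), nrow = n,
                           dimnames = list(NULL, paste0("taxon_", 1:5))))
df$age <- 20 + 40 * df$taxon_1 + rnorm(n, sd = 3)  # simulated age signal
sex <- rep(c("female", "male"), length.out = n)    # hypothetical confounder

# Mean absolute error between observed and predicted ages.
mae <- function(observed, predicted) mean(abs(observed - predicted))

# Split sample indices by the confounder and train one RF model per stratum.
strata <- split(seq_len(n), sex)
models <- lapply(strata, function(idx) {
  ranger(age ~ ., data = df[idx, ], num.trees = 200)
})

# Apply every model to every stratum.
# Rows = stratum the model is tested on, columns = stratum it was trained on.
# Diagonal entries are in-sample fits and are therefore optimistic.
cross_mae <- sapply(models, function(m) {
  sapply(strata, function(idx) {
    mae(df$age[idx], predict(m, data = df[idx, ])$predictions)
  })
})
print(cross_mae)
```

If the off-diagonal MAE values are close to each other, the confounder has little effect on how well a model transfers between sub-datasets; a large gap suggests the model is picking up stratum-specific signal.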
What inputs are necessary for this R script?
Input | gut_data | oral_data | skin_data | Description |
---|---|---|---|---|
datafile | gut_data/gut_4434.biom | oral_data/oral_4014.biom | skin_data/skin_4168.biom | Biom-table file |
sample_metadata | gut_data/gut_4434_map.txt | oral_data/oral_2550_map.txt | skin_data/skin_1975_map.txt | Metadata file |
feature_metadata | gut_data/gut_taxonomy.txt | oral_data/oral_taxonomy.txt | skin_data/skin_taxonomy.txt | Feature metadata file |
prefix_name | gut_4434 | oral_2550 | skin_1975 | The prefix of datasets |
s_category | c("cohort", "sex") | "qiita_host_sex" | c("body_site","qiita_host_sex") | The metadata category for dividing datasets |
c_category | "age" | "qiita_host_age" | "qiita_host_age" | The targeted metadata category for RF modeling |
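As an illustration, the gut column of the table could be written as R assignments like the ones below. The values are copied from the table; treating the input names as plain R variables is an assumption about how the script consumes them, so check the top of Age.crossRF_reg.ranger.R for the exact form.

```r
# Inputs for the gut dataset, taken from the table above.
# Assumption: the script reads these as ordinary R variables.
datafile         <- "gut_data/gut_4434.biom"        # Biom-table file
sample_metadata  <- "gut_data/gut_4434_map.txt"     # Metadata file
feature_metadata <- "gut_data/gut_taxonomy.txt"     # Feature metadata file
prefix_name      <- "gut_4434"                      # Prefix of the dataset
s_category       <- c("cohort", "sex")              # Categories for dividing the dataset
c_category       <- "age"                           # Target category for RF modeling
```

For the oral or skin dataset, substitute the values from the corresponding column.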
About the Input/ folder
This folder includes all the input files (biom table, sample metadata and feature metadata files) necessary for the RF regression analysis.
About the Output/ folder
This folder contains all of the output files from the main R script Age.crossRF_reg.ranger.R.
About the Figures/ folder
This folder contains selected output figures from the Output folder, which were used to generate the final figures in our manuscript.
Acknowledgements
This work is supported by IBM Research AI through the AI Horizons Network. For more information visit the IBM AI Horizons Network website.