satopan / age-prediction

Human skin, oral, and gut microbiomes predict chronological age

This study performed Random Forest (RF) regression analyses of human microbiota from multiple body sites (gut, mouth and skin). This repository provides all the source data and code needed to generate the results in the manuscript. The output directories also contain additional exploratory analysis results that give a better understanding of our microbiota-based models for age prediction.

Data source

Qiita study IDs involved in the meta-analysis:

  • Gut microbiota:

| QIITA Study ID | EBI accession ID | Project name | Publication(s) | # of samples involved |
|---|---|---|---|---|
| 10317 | ERP012803 | American Gut Project | American Gut: an Open Platform for Citizen Science Microbiome Research | 2770 |
| 11757 | PRJEB18535 | GGMP regional variation | Regional variation greatly limits application of healthy gut microbiome reference ranges and disease models | 1609 |

  • Oral microbiota:

| QIITA Study ID | EBI accession ID | Project name | Publication(s) | # of samples involved |
|---|---|---|---|---|
| 10317 | ERP012803 | American Gut Project | American Gut: an Open Platform for Citizen Science Microbiome Research | 547 |
| 1841 | PRJEB5726, PRJEB5727, PRJEB5728 | Flores_SMP | Temporal variability is a personalized feature of the human microbiome | 642 |
| 550 | ERP021896 | Moving pictures of the human microbiome | Moving pictures of the human microbiome | 508 |
| 1774 | ERP016472 | Puerto Rico and Plantanal | NA | 48 |
| 2010 | ERP012216 | Longitudinal babies project | Partial restoration of the microbiota of cesarean-born infants via vaginal microbial transfer | 72 |
| 2024 | ERP016621 | TZ_probiotic_pregnancy_study | Microbiota at Multiple Body Sites during Pregnancy in a Rural Tanzanian Population and Effects of Moringa-Supplemented Probiotic Yogurt | 254 |
| 2202 | PRJEB6518 | mit_daily_timeseries | Host lifestyle affects human microbiota on daily timescales | 285 |
| 10052 | ERP008799, ERP008694 | Yanomani 2008 | The microbiome of uncontacted Amerindians | 16 |
| 11052 | ERP021896 | Knight_ABTX | NA | 178 |

  • Skin microbiota:

| QIITA Study ID | EBI accession ID | Project name | Publication(s) | # of samples involved |
|---|---|---|---|---|
| 10317 | ERP012803 | American Gut Project | American Gut: an Open Platform for Citizen Science Microbiome Research | 440 |
| 11052 | ERP021896 | Knight_ABTX | NA | 177 |
| 2010 | ERP012216 | Longitudinal babies project | Partial restoration of the microbiota of cesarean-born infants via vaginal microbial transfer | 65 |
| 1841 | PRJEB5726, PRJEB5727, PRJEB5728 | Flores_SMP | Temporal variability is a personalized feature of the human microbiome | 1293 |

The age distribution of all samples in gut, oral and skin datasets:

[Figure: age distr]

Although the skewed age distribution in the skin or oral microbiota dataset may decrease the accuracy of age prediction for older adults, it will not affect the conclusions about the relative ability of the different human microbiomes to predict age.

R scripts

This repository also contains R scripts and files that were used in preparing the manuscript. The main ones are explained below.

Usage requirements and dependencies

This meta-analysis depends on the self-developed R package crossRanger, which can be installed as follows.

## install.packages('devtools') # if devtools not installed
devtools::install_github('shihuang047/crossRanger')

What analyses were done by the R script Age.crossRF_reg.ranger.R?

The R script Age.crossRF_reg.ranger.R performs the meta-analysis of microbiota data for predicting chronological age. For each dataset (i.e. gut, mouth or skin), this script can perform the following analyses.

  • Data trimming (such as sample filtering by NA values in the metadata).
  • RF modeling and performance evaluation for the whole dataset.
  • RF modeling and performance evaluation for the sub-datasets. To test whether confounders (such as sex) affected the modeling, we first trained the age model within a sub-dataset stratified by a confounder, then applied it to all the other sub-datasets. For both model training and testing, we evaluated regression performance using mean absolute error (MAE).
  • Cross-application of the RF models built on the sub-datasets, with performance evaluated using MAE.
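The trimming and cross-application steps listed above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the code in Age.crossRF_reg.ranger.R or the crossRanger API: it calls the ranger package directly, the taxon and metadata column names are invented for the example, and the use of out-of-bag predictions for the within-group error is an assumption of this sketch.

```r
library(ranger)  # random forest backend; assumed to be installed

set.seed(42)

# Synthetic stand-in for a feature table: 200 samples x 5 taxa (hypothetical).
n <- 200
x <- as.data.frame(matrix(runif(n * 5), nrow = n,
                          dimnames = list(NULL, paste0("taxon_", 1:5))))
meta <- data.frame(
  age = 20 + 40 * x$taxon_1 + rnorm(n, sd = 3),
  sex = sample(c("female", "male"), n, replace = TRUE)
)
meta$age[sample(n, 10)] <- NA  # simulate missing metadata

# Data trimming: drop samples with NA in the target or grouping column.
keep <- complete.cases(meta[, c("age", "sex")])
x <- x[keep, , drop = FALSE]
meta <- meta[keep, , drop = FALSE]

# Mean absolute error between observed and predicted ages.
mae <- function(observed, predicted) mean(abs(observed - predicted))

# Train on each sub-dataset, then apply the model to every sub-dataset.
groups <- sort(unique(meta$sex))
cross_mae <- matrix(NA_real_, length(groups), length(groups),
                    dimnames = list(train = groups, test = groups))
for (g_train in groups) {
  in_train <- meta$sex == g_train
  train_df <- cbind(x[in_train, , drop = FALSE], age = meta$age[in_train])
  model <- ranger(dependent.variable.name = "age", data = train_df,
                  num.trees = 200)
  for (g_test in groups) {
    if (g_test == g_train) {
      # Within-group error from out-of-bag predictions (assumption of this
      # sketch), which avoids scoring the model on its own training samples.
      pred <- model$predictions
      obs <- train_df$age
    } else {
      in_test <- meta$sex == g_test
      pred <- predict(model, data = x[in_test, , drop = FALSE])$predictions
      obs <- meta$age[in_test]
    }
    cross_mae[g_train, g_test] <- mae(obs, pred)
  }
}
print(round(cross_mae, 2))
```

Rows of `cross_mae` are the sub-dataset a model was trained on and columns are the sub-dataset it was applied to, so off-diagonal entries that are much larger than the diagonal would suggest that the grouping variable affects the model.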

All the analyses can be run with this script, typically in RStudio or the R console.

What inputs are necessary for this R script?

| Input | gut_data | oral_data | skin_data | Description |
|---|---|---|---|---|
| datafile | gut_data/gut_4434.biom | oral_data/oral_4014.biom | skin_data/skin_4168.biom | Biom-table file |
| sample_metadata | gut_data/gut_4434_map.txt | oral_data/oral_2550_map.txt | skin_data/skin_1975_map.txt | Metadata file |
| feature_metadata | gut_data/gut_taxonomy.txt | oral_data/oral_taxonomy.txt | skin_data/skin_taxonomy.txt | Feature metadata file |
| prefix_name | gut_4434 | oral_2550 | skin_1975 | The prefix of datasets |
| s_category | c("cohort", "sex") | "qiita_host_sex" | c("body_site","qiita_host_sex") | The metadata category for dividing datasets |
| c_category | "age" | "qiita_host_age" | "qiita_host_age" | The targeted metadata category for RF modeling |
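As an illustration, the inputs in the table above could be collected in one place and selected per dataset, as in the sketch below. The values are copied from the table; the `settings` list and the `get_settings()` helper are hypothetical conveniences, and how Age.crossRF_reg.ranger.R actually reads these inputs is not shown here.

```r
# Per-dataset settings, copied from the input table.
# The list layout and get_settings() are hypothetical, not part of the script.
settings <- list(
  gut = list(
    datafile         = "gut_data/gut_4434.biom",
    sample_metadata  = "gut_data/gut_4434_map.txt",
    feature_metadata = "gut_data/gut_taxonomy.txt",
    prefix_name      = "gut_4434",
    s_category       = c("cohort", "sex"),
    c_category       = "age"
  ),
  oral = list(
    datafile         = "oral_data/oral_4014.biom",
    sample_metadata  = "oral_data/oral_2550_map.txt",
    feature_metadata = "oral_data/oral_taxonomy.txt",
    prefix_name      = "oral_2550",
    s_category       = "qiita_host_sex",
    c_category       = "qiita_host_age"
  ),
  skin = list(
    datafile         = "skin_data/skin_4168.biom",
    sample_metadata  = "skin_data/skin_1975_map.txt",
    feature_metadata = "skin_data/skin_taxonomy.txt",
    prefix_name      = "skin_1975",
    s_category       = c("body_site", "qiita_host_sex"),
    c_category       = "qiita_host_age"
  )
)

# Return the settings for one dataset; stops with an error on unknown names.
get_settings <- function(dataset) {
  dataset <- match.arg(dataset, names(settings))
  settings[[dataset]]
}

cfg <- get_settings("skin")
cat("Target variable:", cfg$c_category, "\n")
cat("Stratify by:", paste(cfg$s_category, collapse = ", "), "\n")
```

Keeping the three configurations side by side makes it easier to see that only the gut dataset uses "age" as the target column, while the oral and skin datasets use "qiita_host_age".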

About the Input/ folder

This folder includes all the input files (biom table, sample metadata and feature metadata files) necessary for the RF regression analysis.

About the Output/ folder

This folder contains all of the output files from the main R script Age.crossRF_reg.ranger.R.

About the Figures/ folder

This folder contains figures selected from the Output folder that were used to generate the final figures in our manuscript.

Acknowledgements

This work is supported by IBM Research AI through the AI Horizons Network. For more information visit the IBM AI Horizons Network website.
