Machine Learning Methods for Precision Medicine Research Designed to Reduce Health Disparities: A Structured Tutorial
Precision medicine research designed to reduce health disparities often involves studying multi-level datasets to understand why diseases disproportionately affect some groups more than others, and how scarce healthcare resources can be directed precisely to those at highest risk of disease. Here, we provide a structured tutorial for medical and public health researchers on applying machine learning methods to precision medicine research designed to reduce health disparities. We review key terms and concepts for understanding machine learning papers, including supervised and unsupervised learning, regularization, cross-validation, bagging, and boosting. We review metrics for evaluating machine learners and describe major families of learning approaches, including tree-based learning, deep learning, and ensemble learning. We highlight the advantages and disadvantages of different learning approaches, describe strategies for interpreting “black box” models, and demonstrate the application of common methods to an example dataset, with open-source statistical code in R.
Sanjay Basu, MD, PhD (1,2,3); James H. Faghmous, PhD (4); Patrick Doupe, PhD (5)
1. Research and Analytics, Collective Health, San Francisco, CA
2. Center for Primary Care, Harvard Medical School, Boston, MA
3. School of Public Health, Imperial College London, London, UK
4. Los Angeles, CA
5. Zalando SE, Berlin, Germany