neuroimaging small-sample-size misestimation overestimation machine-learning major-depressive-disorder clinical-translation mri classification brain

Systematic Misestimation of Machine Learning Performance in Neuroimaging Studies of Depression

Authors

Claas Flint*¹,², Micah Cearns*³,⁵, Nils Opel¹, Ronny Redlich, David M. A. Mehler¹, Daniel Emden¹, Nils R. Winter¹, Ramona Leenings¹, Simon B. Eickhoff⁴,⁸, Tilo Kircher⁶, Axel Krug⁶, Igor Nenadic⁶, Volker Arolt¹, Scott Clark³, Bernhard T. Baune³,⁵,⁷, Xiaoyi Jiang², Udo Dannlowski†¹, Tim Hahn†¹

¹Department of Psychiatry, University of Münster, Germany; ²Faculty of Mathematics and Computer Science, University of Münster, Germany; ³Discipline of Psychiatry, School of Medicine, University of Adelaide, Australia; ⁴Institute of Neuroscience and Medicine (INM-7), Research Center Jülich; ⁵Department of Psychiatry, Melbourne Medical School, The University of Melbourne, Parkville, Australia; ⁶Department of Psychiatry and Psychotherapy, University of Marburg, Germany; ⁷The Florey Institute of Neuroscience and Mental Health, The University of Melbourne, Parkville, Australia; ⁸Institute of Systems Neuroscience, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany

* These authors contributed equally to the work and should be regarded as joint first authors. † These authors contributed equally to the work and should be regarded as joint senior authors.

Abstract

We currently observe a disconcerting phenomenon in machine learning studies in psychiatry: While we would expect larger samples to yield better results due to the availability of more data, larger machine learning studies consistently show much weaker performance than the numerous small-scale studies. Here, we systematically investigated this effect, focusing on one of the most heavily studied questions in the field, namely the classification of patients suffering from major depressive disorder (MDD) and healthy controls (HC) based on neuroimaging data. Drawing upon structural magnetic resonance imaging (MRI) data from a balanced sample of N = 1,868 MDD patients and HC from our recent international Predictive Analytics Competition (PAC), we first trained and tested a classification model on the full dataset, which yielded an accuracy of 61 %. Next, we mimicked the process by which researchers would draw samples of various sizes (N = 4 to N = 150) from the population and showed a strong risk of misestimation. Specifically, for small sample sizes (N = 20), we observed accuracies of up to 95 %. For medium sample sizes (N = 100), accuracies of up to 75 % were found. Importantly, further investigation showed that sufficiently large test sets effectively protect against performance misestimation, whereas larger datasets per se do not. While these results question the validity of a substantial part of the current literature, we outline the relatively low-cost remedy of larger test sets, which is readily available in most cases.

Keywords: machine learning - neuroimaging - major depressive disorder - misestimation - overestimation - small sample size - clinical translation
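
The abstract's point about test-set size can be illustrated with a small simulation. The sketch below is not the procedure used in the paper (which subsamples real MRI data and retrains models); it assumes a fixed classifier whose true accuracy is 0.61, the full-dataset figure quoted above, and models each test case as an independent correct/incorrect draw. The function names `simulate_test_accuracies` and `summarize`, the test-set sizes, and the number of draws are all hypothetical choices made for illustration.

```python
import random


def simulate_test_accuracies(true_accuracy, test_size, n_draws, seed=0):
    """Return the observed accuracy on each of n_draws simulated test sets.

    Assumption: every test case is classified correctly with probability
    true_accuracy, independently of all other cases.
    """
    rng = random.Random(seed)
    accuracies = []
    for _ in range(n_draws):
        correct = sum(rng.random() < true_accuracy for _ in range(test_size))
        accuracies.append(correct / test_size)
    return accuracies


def summarize(true_accuracy=0.61, test_sizes=(20, 100, 1000), n_draws=2000):
    """Map each test-set size to the (min, max) accuracy seen across draws."""
    ranges = {}
    for n in test_sizes:
        accs = simulate_test_accuracies(true_accuracy, n, n_draws, seed=n)
        ranges[n] = (min(accs), max(accs))
    return ranges


if __name__ == "__main__":
    for n, (lowest, highest) in summarize().items():
        print(f"test size {n:5d}: observed accuracy {lowest:.2f} to {highest:.2f}")
```

Under these assumptions, the spread of observed accuracies narrows as the test set grows, even though the underlying classifier never changes. This is only the sampling-noise part of the effect described in the abstract; it does not model training-set size or model selection.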

About this Repository

This repository contains the LaTeX source code of the arXiv version of this research paper.

arXiv: https://arxiv.org/abs/1912.06686

The research paper was accepted for publication in Neuropsychopharmacology and published online in May 2021. It is available from: https://doi.org/10.1038/s41386-021-01020-7

For citations, please use the following BibTeX entry:

@article{FlintCearns2021,
    title = {{Systematic misestimation of machine learning performance in neuroimaging studies of depression}},
    author = {Flint, Claas and Cearns, Micah and Opel, Nils and Redlich, Ronny and Mehler, David M A and Emden, Daniel and Winter, Nils R and Leenings, Ramona and Eickhoff, Simon B and Kircher, Tilo and Krug, Axel and Nenadic, Igor and Arolt, Volker and Clark, Scott and Baune, Bernhard T and Jiang, Xiaoyi and Dannlowski, Udo and Hahn, Tim},
    year = {2021},
    journal = {Neuropsychopharmacology},
    volume = {46},
    month = {jul},
    number = {8},
    pages = {1510--1517},
    doi = {10.1038/s41386-021-01020-7},
    issn = {0893-133X},
    url = {https://doi.org/10.1038/s41386-021-01020-7}
}

License

This work is licensed under a Creative Commons Attribution 4.0 International License.
