MarkusKonk / awesome-reproducible-research

A curated list of reproducible research case studies, projects, tutorials, and media

Awesome Reproducible Research

Case studies

The term "case studies" is used here in a general sense to describe any study of reproducibility. A reproduction is an attempt to arrive at comparable results with identical data using the computational methods described in a paper. A refactor restructures existing code to follow frameworks and other reproducibility best practices while preserving the original data. A replication generates new data and applies existing methods to achieve comparable results. A robustness test applies various protocols, workflows, statistical models, or parameters to a given data set to study their effect on results, either as a follow-up to an existing study or as a "bake-off". A census is a high-level tabulation conducted by a third party. A survey is a questionnaire sent to practitioners. A case narrative is an in-depth first-person account. An independent discussion has a second, independent author interpret the results of a study as a means to improve inferential reproducibility.

| Study | Field | Approach | Size |
| --- | --- | --- | --- |
| Glasziou et al 2008 | Medicine | Census | 80 studies |
| Baggerly & Coombes 2009 | Cancer biology | Refactor | 8 studies |
| Hothorn et al. 2009 | Biostatistics | Census | 56 studies |
| Ioannidis et al 2009 | Genetics | Reproduction | 18 studies |
| Anda et al 2009 | Software engineering | Replication | 4 companies |
| Vandewalle et al 2009 | Signal processing | Census | 134 papers |
| Prinz 2011 | Biomedical sciences | Survey | 23 PIs |
| Hothorn & Leisch 2011 | Bioinformatics | Census | 100 studies |
| Begley & Ellis 2012 | Cancer biology | Replication | 53 studies |
| Collberg et al 2014; Collberg & Proebsting 2016 | Computer science | Census | 613 papers |
| OSC 2015 | Psychology | Replication | 100 studies |
| Bandrowski et al 2015 | Biomedical sciences | Census | 100 papers |
| Patel et al 2015 | Epidemiology | Robustness test | 417 variables |
| Chang et al 2015 | Economics | Reproduction | 67 papers |
| Iqbal et al 2016 | Biomedical sciences | Census | 441 papers |
| Baker 2016 | Science | Survey | 1,576 researchers |
| Névéol et al 2016 | NLP | Replication | 3 studies |
| Reproducibility Project 2017 | Cancer biology | Replication | 9 studies |
| Vasilevsky et al 2017 | Biomedical sciences | Census | 318 journals |
| Kitzes et al 2017 | Science | Case narrative | 31 PIs |
| Barone et al 2017 | Biological sciences | Survey | 704 PIs |
| Kim & Dumas 2017 | Bioinformatics | Refactor | 1 study |
| Camerer 2017 | Economics | Replication | 18 studies |
| Olorisade 2017 | Machine learning | Census | 30 studies |
| Strupler & Wilkinson 2017 | Archaeology | Case narrative | 1 survey |
| Danchev et al 2017 | Comparative toxicogenomics | Census | 51,292 claims in 3,363 papers |
| Kjensmo & Gundersen 2018 | Artificial intelligence | Census | 400 papers |
| Gertler et al 2018 | Economics | Census | 203 papers |
| Stodden et al 2018 | Computational science | Reproduction | 204 articles, 180 authors |
| Madduri et al 2018 | Genomics | Case narrative | 1 study |
| Camerer et al 2018 | Social sciences | Replication | 21 papers |
| Silberzahn et al 2018 | Psychology | Robustness test | One data set, 29 analyst teams |
| Boulesteix et al 2018 | Medicine and health sciences | Census | 30 papers |
| Eaton et al 2018 | Microbiome immuno oncology | Replication | 1 paper |
| Vaquero-Garcia et al 2018 | Bioinformatics | Refactor and test of robustness | 1 paper |
| Wallach et al 2018 | Biomedical Sciences | Census | 149 papers |
| Miller et al 2018 | Bioinformatics | Synthetic replication & refactor | 1 paper |
| Konkol et al 2018 | Geosciences | Survey, Reproduction | 146 scientists, 41 papers |
| Rahtz 2018 | Reinforcement Learning | Reproduction, case narrative | 1 paper |
| Stodden et al 2018 | Computational physics | Census | 306 papers |
| AlNoamany & Borghi 2018 | Science & Engineering | Survey | 215 participants |
| Li et al 2018 | Nephrology | Robustness test | 1 paper |
| Chen 2018 | Social sciences & other | Census | 810 Dataverse studies |
| Nüst et al 2018 | GIScience/Geoinformatics | Census, Survey | 32 papers, 22 participants |
| Stagge et al 2019 | Geosciences | Survey | 360 papers |
| Bizzego et al 2019 | Deep learning | Robustness test | 1 analysis |
| Madduri et al 2019 | Genomics | Case narrative | 1 analysis |
| Mammoliti et al 2019 | Pharmacogenomics | Case narrative | 2 analyses |
| Allen & Mehler 2019 | Biomedical sciences and Psychology | Census | 127 registered reports |
| Pimentel et al 2019 | All | Census | 1,159,166 Jupyter notebooks |
| Fergusson et al 2019 | Virology | Census | 236 papers |
| Vlisides et al 2019; Sieber et al 2019 | Anaesthesia | Independent discussion | 1 study |
| Bakker et al 2019 | Psychology | Replication | 1 paper |
| Niepel et al 2019 | Cell pharmacology | Robustness test | 5 labs |
| Dacrema et al 2019 | Machine learning | Reproduction | 18 conference papers |
| Eran et al 2019 | Experimental archaeology | Replication | 1 theory |
| Rauh et al 2019 | Neurology | Census | 202 papers |
| Sætrevik & Sjåstad 2019 | Psychology | Replication | 2 experiments |
| Feng et al. 2019 | Ecology and Evolution | Census | 163 papers |
| Botvinik-Nezer et al. 2019 | Neuroimaging | Robustness test | 1 data set, 70 teams |
| Klein et al. 2019 | Psychology | Replication | 1 experiment, 21 labs, 2,220 participants |
| Obels et al. 2019 | Psychology | Census | 62 papers |
| Wayant et al 2019 | Oncology | Census | 154 meta-analyses |
| Simoneau et al. 2020 | Bioinformatics | Robustness test | 1 data set |
| Miyakawa 2020 | Neurobiology | Census | 41 papers |
| Thelwall et al 2020 | Genetics | Census | 1799 papers |
| Maassen et al 2020 | Psychology | Reproduction | 33 meta-analyses |

Ad-hoc reproductions

These are one-off, unpublished attempts to reproduce individual studies.

| Reproduction | Original study |
| --- | --- |
| https://rdoodles.rbind.io/2019/06/reanalyzing-data-from-human-gut-microbiota-from-autism-spectrum-disorder-promote-behavioral-symptoms-in-mice/ and https://notstatschat.rbind.io/2019/06/16/analysing-the-mouse-autism-data/ | Sharon, G. et al. Human Gut Microbiota from Autism Spectrum Disorder Promote Behavioral Symptoms in Mice. Cell 2019, 177 (6), 1600–1618.e17. |
| https://github.com/sean-harrison-bristol/CCR5_replication | Wei, X.; Nielsen, R. CCR5-∆32 Is Deleterious in the Homozygous State in Humans. Nat. Med. 2019 DOI: 10.1038/s41591-019-0459-6. (retracted) |

Theory papers

| Authors/Date | Title | Field | Type |
| --- | --- | --- | --- |
| Ioannidis 2005 | Why most published research findings are false | Science | Statistical reproducibility |
| Noble 2005 | A Quick Guide to Organizing Computational Biology Projects | Bioinformatics | Best practices |
| Sandve et al 2013 | Ten Simple Rules for Reproducible Computational Research | Computational science | Best practices |
| Yarkoni 2019 | The Generalizability Crisis | Psychology | Statistical reproducibility |
| Bouthillier et al 2019 | Unreproducible Research is Reproducible | Machine Learning | Methodology |
| Milton & Possolo 2019 | Trustworthy data underpin reproducible research | Physics | Scientific philosophy |
| Devezer et al 2019 | Scientific discovery in a model-centric framework: Reproducibility, innovation, and epistemic diversity | Science | Statistical reproducibility |
| Tierney et al 2020 | A Realistic Guide to Making Data Available Alongside Code to Improve Reproducibility | Science | Best practices |
| Haibe-Kains et al 2020 | The importance of transparency and reproducibility in artificial intelligence research | Artificial Intelligence | Critique |
| Nosek & Errington 2020 | What is replication? | Science | Scientific philosophy |
| Hejblum et al 2020 | Realistic and Robust Reproducible Research for Biostatistics | Biostatistics | Best practices |
| Schriml et al 2020 | COVID-19 pandemic reveals the peril of ignoring metadata standards | Virology | Critique |

Tool reviews

| Authors/Date | Title | Tools |
| --- | --- | --- |
| Isdahl & Gundersen 2019 | Out-of-the-box Reproducibility: A Survey of Machine Learning Platforms | MLflow, Polyaxon, StudioML, Kubeflow, CometML, Sagemaker, GCPML, AzureML, Floydhub, BEAT, Codalab, Kaggle |
| Pimentel et al 2019 | A Survey on Collecting, Managing, and Analyzing Provenance from Scripts | Astro-Wise, CPL, CXXR, Datatrack, ES3, ESSW, IncPy, Lancet, Magni, noWorkflow, Provenance Curios, pypet, RDataTracker, Sacred, SisGExp, SPADE, StarFlow, Sumatra, Variolite, VCR, versuchung, WISE, YesWorkflow |
| Leipzig et al 2019 (supplemental) | The Role of Metadata in Reproducible Computational Research | CellML, CIF2, DATS, DICOM, EML, FAANG, GBIF, GO, ISO/TC 276, MIAME, NetCDF, OGC, ThermoML, CRAN, Conda, pip setup.cfg, EDAM, CodeMeta, Biotoolsxsd, DOAP, ontosoft, SWO, OBCS, STATO, SDMX, DDI, MEX, MLSchema, MLFlow, Rmd, CWL, CWLProv, RO-Crate, RO, WICUS, OPM, PROV-O, ReproZip, ProvOne, WES, BagIt, BCO, ERC, BEL, DC, JATS, ONIX, MeSH, LCSH, MP, Open PHACTS, SWAN, SPAR, PWO, PAV, Manubot, ReScience, PandocScholar |
| Konkol, Markus, Nüst, Daniel, Goulier, Laura 2020 | Publishing computational research - a review of infrastructures for reproducible and transparent scholarly communication | Authorea, Binder, CodeOcean, eLife RDS, Galaxy Project, Gigantum, Manuscript, o2r, REANA, ReproZip, Whole tale |

Courses

Development Resources

User tools

  • Open With Binder for Chrome or Firefox - open the GitHub repository you are visiting using MyBinder.org
  • DVC - DVC tracks machine learning models and data sets
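
As a rough illustration of what opening a GitHub repository with MyBinder.org amounts to, the sketch below builds a launch URL from a repository owner, name, and git ref. The function name `binder_url` is hypothetical, and the sketch assumes the inputs are plain names that need no URL escaping; it is not taken from the browser extension itself.

```python
def binder_url(owner: str, repo: str, ref: str = "HEAD") -> str:
    """Build a MyBinder.org launch URL for a GitHub repository.

    Assumption: owner, repo, and ref are plain names (no spaces or other
    characters that would need percent-encoding).
    """
    return f"https://mybinder.org/v2/gh/{owner}/{repo}/{ref}"


if __name__ == "__main__":
    # Launch URL for this list's own repository, at its default branch.
    print(binder_url("MarkusKonk", "awesome-reproducible-research"))
```

Whether the resulting session actually builds depends on the repository declaring its environment (for example with a requirements or environment file), which this sketch does not check.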

Books

Data Repositories

All of these repositories assign Digital Object Identifiers (DOIs) to data.

  • DataCite - 12M+ DOIs registered for 46 allocators. Offers APIs and a metadata schema.
  • Data Dryad - curated, metadata-centric, focused on data associated with published articles, $120 submission fee (various waivers available)
  • Figshare - 20 GB of free private space, unlimited public space, >2M articles, >5k projects
  • OSF - Project-oriented system with access control and integration with popular tools. Unlimited storage for projects, but individual files are limited to 5 gigabytes (GB) each.
  • Zenodo - Allows embargoed, restricted access, metadata support. 50GB limit.

Examples and Exemplars

  • Jupyter Gallery - Gallery of interesting Jupyter notebooks
  • Papers With Code - ML papers with code
  • NARPS - Code related to Neuroimaging Analysis Replication and Prediction Study
  • CodeOcean - A gallery of cloud-based containers with reproducible analyses

Haibe-Kains lab reproducible papers

| Publication | CodeOcean link |
| --- | --- |
| Mer AS et al. Integrative Pharmacogenomics Analysis of Patient Derived Xenografts | codeocean.com/capsule/056639 |
| Gendoo, Zon et al. MetaGxData: Clinically Annotated Breast, Ovarian and Pancreatic Cancer Datasets and their Use in Generating a Multi-Cancer Gene Signature | codeocean.com/capsule/643863 |
| Yao et al. Tissue specificity of in vitro drug sensitivity | codeocean.com/capsule/550275 |
| Safikhani Z et al. Gene isoforms as expression-based biomarkers predictive of drug response in vitro | codeocean.com/capsule/000290 |
| El-Hachem et al. Integrative cancer pharmacogenomics to infer large-scale drug taxonomy | codeocean.com/capsule/425224 |
| Safikhani Z et al. Revisiting inconsistency in large pharmacogenomic studies | codeocean.com/capsule/627606 |
| Sandhu V et al. Meta-analysis of 1,200 transcriptomic profiles identifies a prognostic model for pancreatic ductal adenocarcinoma | codeocean.com/capsule/269362 |

Journals

  • ReScience - Journal dedicated to in silico reproductions and tests of robustness, lives on GitHub.
  • ReplicationWiki - Replication in the social sciences, particularly economics

Ontologies

Organizations

  • ResearchObject.org - RO specifications and publications
  • BioCompute - BCO specs
  • rOpenSci - Tools, conferences, and education
  • Open Science Framework - Open source project management
  • pyOpenSci - Promotes open and reproducible research through peer-review of scientific Python packages
  • Replication Network - Furthering the practice of replication in economics. Econ replication database.
  • repliCATS project - Estimating the replicability of research in the social sciences
  • ReproHack - 1-day reproducibility hackathons held worldwide
  • CODECHECK - community for checking executability of scientific preprints and papers

Awesome Lists

Contribute

Contributions welcome! Read the contribution guidelines first. You may find my src/doi2md.py script useful for quickly generating entries from a DOI.
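
For contributors curious how an entry can be derived from a DOI, here is a minimal sketch. It is not the contents of src/doi2md.py, and the output format is an assumption rather than the list's required style. It relies on doi.org content negotiation returning CSL JSON; the function names `fetch_csl`, `short_citation`, and `format_entry` are hypothetical, and the sample record is invented for illustration.

```python
import json
import urllib.request


def fetch_csl(doi: str) -> dict:
    """Fetch CSL JSON metadata for a DOI via doi.org content negotiation."""
    request = urllib.request.Request(
        "https://doi.org/" + doi,
        headers={"Accept": "application/vnd.citationstyles.csl+json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def short_citation(csl: dict) -> str:
    """Return 'Family YEAR', 'A & B YEAR', or 'Family et al YEAR'."""
    names = [a.get("family") or a.get("literal", "") for a in csl.get("author", [])]
    year = csl["issued"]["date-parts"][0][0]
    if not names:
        who = "Unknown"
    elif len(names) == 1:
        who = names[0]
    elif len(names) == 2:
        who = " & ".join(names)
    else:
        who = names[0] + " et al"
    return f"{who} {year}"


def format_entry(csl: dict) -> str:
    """Format one record as a Markdown link followed by the title (assumed style)."""
    return f"[{short_citation(csl)}](https://doi.org/{csl['DOI']}) - {csl['title']}"


if __name__ == "__main__":
    # Hypothetical record, shaped like CSL JSON; not a real publication.
    sample = {
        "DOI": "10.1234/example",
        "title": "An entirely hypothetical paper",
        "author": [{"family": "Doe"}, {"family": "Roe"}, {"family": "Poe"}],
        "issued": {"date-parts": [[2020, 1, 15]]},
    }
    print(format_entry(sample))
    # prints: [Doe et al 2020](https://doi.org/10.1234/example) - An entirely hypothetical paper
```

To use it on a real record, pass the result of `fetch_csl("<doi>")` to `format_entry`. Field, approach, and size still have to be filled in by hand, since DOI metadata does not carry them.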

License

CC0

To the extent possible under law, Jeremy Leipzig has waived all copyright and related or neighboring rights to this work.
