A curated list of reproducible research case studies, projects, tutorials, and media
- Case studies
- Ad-hoc reproductions
- Theory papers
- Tool reviews
- Courses
- Development Resources
- User tools
- Books
- Data Repositories
- Examples and exemplars
- Journals
- Ontologies
- Organizations
- Awesome Lists
The term "case studies" is used here in a general sense to describe any study of reproducibility. A reproduction is an attempt to arrive at comparable results with identical data using the computational methods described in a paper. A refactor restructures existing code to follow frameworks and other reproducibility best practices while preserving the original data. A replication generates new data and applies existing methods to achieve comparable results. A robustness test applies various protocols, workflows, statistical models, or parameters to a given data set to study their effect on results, either as a follow-up to an existing study or as a "bake-off". A census is a high-level tabulation conducted by a third party. A survey is a questionnaire sent to practitioners. A case narrative is an in-depth first-person account. An independent discussion has a second, independent author interpret the results of a study as a means of improving inferential reproducibility.
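As a rough illustration of the robustness test defined above, the sketch below applies several analytic choices to one fixed data set and reports how far the estimates diverge. The data values, the `trimmed_mean` helper, and the choice of estimators are all hypothetical and stand in for whatever protocols or models a real study would compare.

```python
import statistics

# Hypothetical fixed data set: the same observations are reused for every analysis.
data = [2.1, 2.4, 2.2, 2.8, 2.5, 9.7, 2.3, 2.6]


def trimmed_mean(values, trim):
    """Mean after dropping `trim` observations from each end of the sorted data."""
    ordered = sorted(values)
    kept = ordered[trim:len(ordered) - trim] if trim else ordered
    return statistics.mean(kept)


# Each entry is one analytic choice applied to the identical data (illustrative only).
analyses = {
    "mean": lambda v: statistics.mean(v),
    "median": lambda v: statistics.median(v),
    "trimmed_mean_1": lambda v: trimmed_mean(v, 1),
}

# Run every analysis on the same data and measure how much the results disagree.
results = {name: fn(data) for name, fn in analyses.items()}
spread = max(results.values()) - min(results.values())

for name, estimate in results.items():
    print(f"{name}: {estimate:.3f}")
print(f"spread across analyses: {spread:.3f}")
```

A large spread relative to the effect of interest suggests the conclusion depends on the analytic choice rather than on the data alone.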
| Study | Field | Approach | Size |
| --- | --- | --- | --- |
| | Medicine | Census | 80 studies |
| | Cancer biology | Refactor | 8 studies |
| | Biostatistics | Census | 56 studies |
| | Genetics | Reproduction | 18 studies |
| | Software engineering | Replication | 4 companies |
| | Signal processing | Census | 134 papers |
| | Biomedical sciences | Survey | 23 PIs |
| | Bioinformatics | Census | 100 studies |
| | Cancer biology | Replication | 53 studies |
| | Computer science | Census | 613 papers |
| | Psychology | Replication | 100 studies |
| | Biomedical sciences | Census | 100 papers |
| | Epidemiology | Robustness test | 417 variables |
| | Economics | Reproduction | 67 papers |
| | Biomedical sciences | Census | 441 papers |
| | Science | Survey | 1,576 researchers |
| | NLP | Replication | 3 studies |
| | Cancer biology | Replication | 9 studies |
| | Biomedical sciences | Census | 318 journals |
| | Science | Case narrative | 31 PIs |
| | Biological sciences | Survey | 704 PIs |
| | Bioinformatics | Refactor | 1 study |
| | Economics | Replication | 18 studies |
| | Machine learning | Census | 30 studies |
| | Archaeology | Case narrative | 1 survey |
| | Comparative toxicogenomics | Census | 51,292 claims in 3,363 papers |
| | Artificial intelligence | Census | 400 papers |
| | Economics | Census | 203 papers |
| | Computational science | Reproduction | 204 articles, 180 authors |
| | Genomics | Case narrative | 1 study |
| | Social sciences | Replication | 21 papers |
| | Psychology | Robustness test | One data set, 29 analyst teams |
| | Medicine and health sciences | Census | 30 papers |
| | Microbiome immuno oncology | Replication | 1 paper |
| | Bioinformatics | Refactor and test of robustness | 1 paper |
| | Biomedical Sciences | Census | 149 papers |
| | Bioinformatics | Synthetic replication & refactor | 1 paper |
| | Geosciences | Survey, Reproduction | 146 scientists, 41 papers |
| | Reinforcement Learning | Reproduction, case narrative | 1 paper |
| | Computational physics | Census | 306 papers |
| | Science & Engineering | Survey | 215 participants |
| | Nephrology | Robustness test | 1 paper |
| | Social sciences & other | Census | 810 Dataverse studies |
| | GIScience/Geoinformatics | Census, Survey | 32 papers, 22 participants |
| | Geosciences | Survey | 360 papers |
| | Deep learning | Robustness test | 1 analysis |
| | Genomics | Case narrative | 1 analysis |
| | Pharmacogenomics | Case narrative | 2 analyses |
| | Biomedical sciences and Psychology | Census | 127 registered reports |
| | All | Census | 1,159,166 Jupyter notebooks |
| | Virology | Census | 236 papers |
| | Anaesthesia | Independent discussion | 1 study |
| | Psychology | Replication | 1 paper |
| | Cell pharmacology | Robustness test | 5 labs |
| | Machine learning | Reproduction | 18 conference papers |
| | Experimental archaeology | Replication | 1 theory |
| | Neurology | Census | 202 papers |
| | Psychology | Replication | 2 experiments |
| | Ecology and Evolution | Census | 163 papers |
| | Neuroimaging | Robustness test | 1 data set, 70 teams |
| | Psychology | Replication | 1 experiment, 21 labs, 2,220 participants |
| | Psychology | Census | 62 papers |
| | Oncology | Census | 154 meta-analyses |
| | Bioinformatics | Robustness test | 1 data set |
| | Neurobiology | Census | 41 papers |
| | Genetics | Census | 1,799 papers |
| | Psychology | Reproduction | 33 meta-analyses |
These are one-off unpublished attempts to reproduce individual studies.

| Reproduction | Original study |
| --- | --- |
| https://rdoodles.rbind.io/2019/06/reanalyzing-data-from-human-gut-microbiota-from-autism-spectrum-disorder-promote-behavioral-symptoms-in-mice/ and https://notstatschat.rbind.io/2019/06/16/analysing-the-mouse-autism-data/ | Sharon, G. et al. Human Gut Microbiota from Autism Spectrum Disorder Promote Behavioral Symptoms in Mice. Cell 2019, 177 (6), 1600–1618.e17. |
| | Wei, X.; Nielsen, R. CCR5-∆32 Is Deleterious in the Homozygous State in Humans. Nat. Med. 2019 DOI: 10.1038/s41591-019-0459-6. (retracted) |
| Authors/Date | Title | Field | Type |
| --- | --- | --- | --- |
| | Why most published research findings are false | Science | Statistical reproducibility |
| | A Quick Guide to Organizing Computational Biology Projects | Bioinformatics | Best practices |
| | Ten Simple Rules for Reproducible Computational Research | Computational science | Best practices |
| | The Generalizability Crisis | Psychology | Statistical reproducibility |
| | Unreproducible Research is Reproducible | Machine Learning | Methodology |
| | Trustworthy data underpin reproducible research | Physics | Scientific philosophy |
| | Scientific discovery in a model-centric framework: Reproducibility, innovation, and epistemic diversity | Science | Statistical reproducibility |
| | A Realistic Guide to Making Data Available Alongside Code to Improve Reproducibility | Science | Best practices |
| | The importance of transparency and reproducibility in artificial intelligence research | Artificial Intelligence | Critique |
| | What is replication? | Science | Scientific philosophy |
| | Realistic and Robust Reproducible Research for Biostatistics | Biostatistics | Best practices |
| | COVID-19 pandemic reveals the peril of ignoring metadata standards | Virology | Critique |
| Authors/Date | Title | Tools |
| --- | --- | --- |
| | Out-of-the-box Reproducibility: A Survey of Machine Learning Platforms | MLflow, Polyaxon, StudioML, Kubeflow, CometML, Sagemaker, GCPML, AzureML, Floydhub, BEAT, Codalab, Kaggle |
| | A Survey on Collecting, Managing, and Analyzing Provenance from Scripts | Astro-Wise, CPL, CXXR, Datatrack, ES3, ESSW, IncPy, Lancet, Magni, noWorkflow, Provenance Curios, pypet, RDataTracker, Sacred, SisGExp, SPADE, StarFlow, Sumatra, Variolite, VCR, versuchung, WISE, YesWorkflow |
| | The Role of Metadata in Reproducible Computational Research | CellML, CIF2, DATS, DICOM, EML, FAANG, GBIF, GO, ISO/TC 276, MIAME, NetCDF, OGC, ThermoML, CRAN, Conda, pip setup.cfg, EDAM, CodeMeta, Biotoolsxsd, DOAP, ontosoft, SWO, OBCS, STATO, SDMX, DDI, MEX, MLSchema, MLFlow, Rmd, CWL, CWLProv, RO-Crate, RO, WICUS, OPM, PROV-O, ReproZip, ProvOne, WES, BagIt, BCO, ERC, BEL, DC, JATS, ONIX, MeSH, LCSH, MP, Open PHACTS, SWAN, SPAR, PWO, PAV, Manubot, ReScience, PandocScholar |
| | Publishing computational research - a review of infrastructures for reproducible and transparent scholarly communication | Authorea, Binder, CodeOcean, eLife RDS, Galaxy Project, Gigantum, Manuscript, o2r, REANA, ReproZip, Whole tale |
- MOOCs
  - Coursera Reproducible Research - Roger Peng et al., JHU. Very popular course.
  - edX Principles, Statistical and Computational Tools for Reproducible Science - John Quackenbush et al., Harvard
- Online course content
  - Tools for Reproducible Research - Karl Broman, UW; includes resources page
  - R for Reproducible Scientific Analysis - Software Carpentry workshop primer using Gapminder data
  - R-DAVIS - Student-developed computer literacy and data course in R
  - AMIA2019 - Pragmatic RR for Analysis, Dissemination and Publication
- R
  - CRAN Task View - Reproducible Research - packages relevant to RCR in R
  - liftr - persistent reproducible reporting through containerized R Markdown documents
  - repo - provenance framework package
  - orderly - R package that automates writing reproducible analyses
- Open With Binder for Chrome or Firefox - open the GitHub repository you are visiting using MyBinder.org
- DVC - DVC tracks machine learning models and data sets
- Reproducible Research with R and R Studio 2013
- Implementing Reproducible Research 2014 - Describes projects: Sumatra, Vistrails, CDE, SOLE, JUMBO, CML, knitr. Content available on OSF.
- The Practice of Reproducible Research 2017 - 31 first-person case narratives and intro chapters
- Dynamic Documents with R and knitr 2015
- The Turing Way: A Handbook for Reproducible Data Science 2020
All these repositories assign Digital Object Identifiers (DOIs) to data
- DataCite - 12M+ DOIs registered for 46 allocators. Offers APIs and a metadata schema.
- Data Dryad - curated, metadata-centric, focused on data associated with published articles, $120 submission fee (various waivers available)
- Figshare - 20 GB of free private space, unlimited public space, >2M articles, >5k projects
- OSF - Project-oriented system with access control and integration with popular tools. Unlimited storage for projects, but individual files are limited to 5 gigabytes (GB) each.
- Zenodo - Supports embargoed and restricted access, plus metadata. 50 GB limit.
- Jupyter Gallery - Gallery of interesting Jupyter notebooks
- Papers With Code - ML papers with code
- NARPS - Code related to Neuroimaging Analysis Replication and Prediction Study
- Codeocean - A gallery of cloud-based containers with reproducible analyses
- ReScience - Journal dedicated to in silico reproductions and tests of robustness, lives on GitHub.
- ReplicationWiki - Replication in the social sciences, particularly economics
- FAIRsharing - standards, databases, and policies
- BioPortal - 660 biomedical ontologies
- ResearchObject.org - RO specifications and publications
- BioCompute - BCO specs
- rOpenSci - Tools, conferences, and education
- Open Science Framework - Open source project management
- pyOpenSci - Promotes open and reproducible research through peer-review of scientific Python packages
- Replication Network - Furthering the practice of replication in economics. Econ replication database.
- repliCATS project - Estimating the replicability of research in the social sciences
- ReproHack - 1-day reproducibility hackathons held worldwide
- CODECHECK - community for checking executability of scientific preprints and papers
- Awesome Pipeline - So many pipeline frameworks
- Awesome Docker - Everything related to the Docker containerization system
- Awesome R - Section on RR tools
- Awesome Reproducible R - RRR tools
- Awesome Jupyter - Jupyter projects, libraries and resources
- Awesome Bioinformatics Benchmarks - Benchmarks are a related aspect of robustness testing
- Awesome Open Science - Resources, data, tools, and scholarship
- Awesome Public Datasets - A topic-centric list of HQ open datasets
- Awesome Semantic Web - Semantic web and linked data resources.
Contributions welcome! Read the contribution guidelines first. You may find my src/doi2md.py script useful for quickly generating entries from a DOI.
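For readers curious what generating a list entry from a DOI might involve, here is a minimal sketch. It is not the actual src/doi2md.py, whose behavior is not described here. The function name `doi_to_markdown`, the entry layout, and the CSL-JSON-like input shape are all assumptions. In practice the metadata could be fetched from https://doi.org/ using content negotiation with an `Accept: application/vnd.citationstyles.csl+json` header; the sketch skips the network call and uses a hand-written record based on a paper cited in this list.

```python
def doi_to_markdown(metadata):
    """Format a CSL-JSON-like metadata dict as a markdown list entry.

    Hypothetical helper: the entry layout is an assumption, not the
    format produced by the repository's own script.
    """
    title = metadata.get("title", "Untitled")
    doi = metadata["DOI"]
    authors = metadata.get("author", [])
    if authors:
        lead = authors[0].get("family", "Unknown")
        # Abbreviate multi-author works to "Lead et al."
        byline = f"{lead} et al." if len(authors) > 1 else lead
    else:
        byline = "Unknown"
    # CSL-JSON stores dates as nested lists, e.g. {"date-parts": [[2019]]}.
    year = metadata.get("issued", {}).get("date-parts", [[None]])[0][0]
    suffix = f" {year}" if year else ""
    return f"- [{title}](https://doi.org/{doi}) - {byline}{suffix}"


# Hand-written record for a paper cited earlier in this list.
example = {
    "DOI": "10.1038/s41591-019-0459-6",
    "title": "CCR5-∆32 Is Deleterious in the Homozygous State in Humans",
    "author": [{"family": "Wei"}, {"family": "Nielsen"}],
    "issued": {"date-parts": [[2019]]},
}

print(doi_to_markdown(example))
```

Keeping the formatting step separate from the metadata lookup makes it easy to test offline and to adjust the entry layout to match the contribution guidelines.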
To the extent possible under law, Jeremy Leipzig has waived all copyright and related or neighboring rights to this work.