DougManuel / ds-presentation

Data science presentation and resources

Home Page: https://dougmanuel.github.io/ds-presentation/

A data science tool kit has emerged to support open science

Evaluate your use of the data science tool kit. Quiz

Explanation of the quiz.

Imperative

  • Concern regarding a crisis in research reproducibility
  • Big data and more complex analytic models
  • Renewed interest in open science
  • Expanded collaborations

Goal: improve research transparency, reproducibility, quality, efficiency and implementation

Internationally, there is growing concern about research reproducibility

“Academic institutions can and must do better. We should be taking multiple approaches to make science more reliable.”

Jeffrey Flier. Dean of Medicine, Harvard University. Nature 549, 133 (2017)

“Put simply, this means that researchers should make their computational workflow and data available for others to view. They should include the code used to generate published figures and omit only data that cannot be released for privacy or legal reasons.”

Jeffrey M. Perkel. A toolkit for data transparency takes shape. Nature 560, 513-515 (2018)

"More than 70% of researchers have tried and failed to reproduce another scientist's experiments, and more than half have failed to reproduce their own experiments."

Monya Baker. 1,500 scientists lift the lid on reproducibility. Nature 533, 452-4 (2016)

References

General

  1. Donoho D. 50 years of Data Science. Sept. 18, 2015.

  2. Stukel TA, Austin PC, Azimaee M, Bronskill SE, Guttmann A, Paterson JM, Schull MJ, Sutradhar R, Victor JC. Envisioning a Data Science Strategy for ICES. Toronto, ON: Institute for Clinical Evaluative Sciences; 2017. ISBN: 978-1-926850-77-1

  3. Rumsfeld JS, Joynt KE, Maddox TM. Big data analytics to improve cardiovascular care: promise and challenges. Nature reviews Cardiology. 2016;13(6):350-9.

  4. Wilson G, Aruliah DA, Brown CT, Chue Hong NP, Davis M, Guy RT, et al. Best practices for scientific computing. PLoS Biol. 2014;12(1):e1001745.

  5. Hicks SC, Irizarry RA. A Guide to Teaching Data Science. The American Statistician. 2017;72(4):382-91. doi:10.1080/00031305.2017.1356747

Open Science

  1. Flier, J. (2017). Faculty promotion must assess reproducibility. Nature, 549(7671), 133. doi:10.1038/549133

  2. Perkel, J. M. (2018). A toolkit for data transparency takes shape. Nature, 560, 513-515.

  3. Baker, M. (2016). 1,500 scientists lift the lid on reproducibility. Nature, 533, 452-454. https://www.nature.com/news/1-500-scientists-lift-the-lid-on-reproducibility-1.19970

  4. Woelfle, M.; Olliaro, P.; Todd, M. H. (2011). Open science is a research accelerator. Nature Chemistry. 3: 745–748. doi:10.1038/nchem.1149

  5. Stodden, V., McNutt, M., Bailey, D. H., Deelman, E., Gil, Y., Hanson, B., . . . Taufer, M. (2016). Enhancing reproducibility for computational methods. Science, 354(6317), 1240-1241. doi:10.1126/science.aah6168

  6. Kopf D. This year’s Nobel Prize in economics was awarded to a Python convert. qz.com Oct 2018.

  7. Somers J. The Scientific Paper Is Obsolete: Here's what's next. The Atlantic Apr 2018.

  8. Kitzes J, Turek D, Deniz F. The practice of reproducible research: case studies and lessons from the data-intensive sciences. Univ of California Press; 2017.

  9. Pioneering ‘live-code’ article allows scientists to play with each other’s results. Nature

Git and version control

  1. Perez-Riverol Y, Gatto L, Wang R, Sachsenberg T, Uszkoreit J, Leprevost Fda V, et al. Ten Simple Rules for Taking Advantage of Git and GitHub. PLoS Comput Biol. 2016;12(7):e1004947.

  2. Git/Github guide

  3. Version control with Git

  4. Git and GitHub learning resources

  5. Integration of GitHub with SAS

  6. Gitkraken (the Git client our team uses)

Code documentation

  1. What nobody tells you about documentation. Divio Blog. Accessed Nov 2018

  2. Jupyter Notebooks

  3. Why Jupyter is data scientists’ computational notebook of choice

  4. Introduction to R Markdown

  5. R Markdown: The definitive guide

  6. R Markdown cheat sheet

  7. Advantages to using R Markdown for data analysis over Jupyter Notebooks

Programming

  1. Population Health Data Science with R. Tomás J Aragón

  2. R for Data Science. G Grolemund and H Wickham

  3. Efficient R programming. C Gillespie, R Lovelace

  4. R for Data Science, Chapter 19: Functions. G Grolemund, H Wickham

Metadata

  1. IBM developerWorks. What is PMML? Accessed 2018.

About

License: MIT License

