-
Description: This course teaches "the why and how" of reproducible and collaborative research by combining questions of good computational practice in science, open science, and statistical data analysis, in the context of today's research environment. We will interleave practical topics in software engineering and statistical computing with broader discussions on elements of reproducible data analysis.
-
Details: We will rely on the R and RStudio ecosystems, but the core ideas presented here can equally be implemented with tools in Python, Julia, or any other programming language.
-
Instructor: Gaston Sanchez
-
Lecture: 3 hours of lecture per week
-
Assignments: around 6 HW assignments
-
Exams: typically one midterm exam and a final project
-
Prerequisites: Statistics 133, 134, 135
-
Policies:
We will use several tools throughout this course, so you will need to install the programs listed below. If you run into installation problems, search online or check YouTube videos first; if that doesn't work, ask the GSI or the instructor.
-
Git (version control)
-
GitHub (Git repository hosting service)
- https://github.com/
- Open account (if you don't have one): https://github.com/join
-
R (statistical & data analysis software)
-
Posit RStudio (IDE for R)
- Desktop (free) version: https://posit.co/downloads/
-
Rtools (tools for building R and R packages on Windows)
- If you work on Windows, you will need to download Rtools
- https://cran.r-project.org/bin/windows/Rtools/
-
TeX/LaTeX (typesetting system)
- MacTeX (Mac OS X): https://tug.org/mactex/
- MiKTeX (Windows): http://miktex.org/download
-
pandoc (universal document converter)
-
Xcode (IDE for Mac OS X)
- If you work on a Mac, you may need to install Xcode
- Mac OS X: https://developer.apple.com/xcode/downloads/
ABOUT:
We begin with the usual review of the course policies, logistics, overall expectations, topics in a nutshell, etc.
Every data analysis project goes through a cycle. At the conceptual level, we'll identify the main stages of the data analysis cycle using sports data: long jump world records, which are among the oldest standing records in athletics.
READING:
- Slides
TOPICS:
-
Introduction
- The Data Analysis Cycle (DAC)
- First contact with R and RStudio
-
How not to do a Data Analysis
- Understand limitations of WYSIWYG tools
- Advantages of using WYSIWYM tools
ABOUT:
In this module we review an infamous case of irreproducibility: the Reinhart-Rogoff debacle.
READING:
Reinhart and Rogoff Reading Materials
-
Researchers Finally Replicated Reinhart-Rogoff, and There Are Serious Problems. (by Mike Konczal)
-
Reinhart, Rogoff... and Herndon: The student who caught out the profs (by Ruth Alexander)
-
The Reinhart and Rogoff Controversy: A Summing Up (by John Cassidy)
-
FAQ: Reinhart, Rogoff, and the Excel Error That Changed History
-
Holy Coding Error Batman (by Paul Krugman)
-
Reinhart and Rogoff working paper "Growth in a Time of Debt"
-
Various links for the "Special: Reinhart & Rogoff Debacle" (curated by Moreliver's)
TOPICS:
- RR Case Study
- Who are Reinhart and Rogoff (R&R)?
- What is their affiliation?
- About their working paper "Growth in a Time of Debt" (GTD)
- What is the main thesis of the paper?
- What are their main findings?
- What are their conclusions?
- Story behind the R&R fiasco:
- After the publication of GTD, who tries to reproduce their work?
- What is the story of the reproduction attempt?
- What is the cause of the irreproducibility?
ABOUT:
Markdown is a lightweight markup language, originally created by John Gruber and Aaron Swartz, that allows people "to write using an easy-to-read, easy-to-write plain text format, then convert it to structurally valid XHTML (or HTML)".
- About markup languages
- Why do we need to use a markup language (and a text editor)?
- What is the issue with word processors?
Dynamic Documents with R
- What is a dynamic document?
- Dynamic documents and markup languages
- Dynamic documents require a parser and renderer
- In R, we have the packages "knitr", "rmarkdown", and "shiny"
- Before knitr we had "Sweave" (with LaTeX)
- LaTeX is still the de facto standard for scientific typesetting
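To make the "parser and renderer" idea concrete, here is a minimal sketch in Python of how a dynamic document could be woven. It is a toy illustration, not how knitr, rmarkdown, or Sweave are actually implemented. The chunk markers (`<<>>=` to open, `@` to close) loosely follow the noweb-style syntax used by Sweave; the names `SOURCE`, `parse`, and `render`, and the sample values, are hypothetical.

```python
import contextlib
import io

# Hypothetical source document: narrative text plus one code chunk.
# A chunk opens with "<<>>=" and closes with "@" (noweb-style markers).
SOURCE = """\
The sample has three values.
<<>>=
values = [2, 4, 9]
print(sum(values) / len(values))
@
The line above was computed, not typed by hand.
"""


def parse(text):
    """Parser: split the source into ("text", str) and ("code", str) blocks."""
    blocks, buffer, in_code = [], [], False
    for line in text.splitlines():
        if not in_code and line.strip() == "<<>>=":
            if buffer:
                blocks.append(("text", "\n".join(buffer)))
            buffer, in_code = [], True
        elif in_code and line.strip() == "@":
            blocks.append(("code", "\n".join(buffer)))
            buffer, in_code = [], False
        else:
            buffer.append(line)
    if buffer:
        blocks.append(("code" if in_code else "text", "\n".join(buffer)))
    return blocks


def render(blocks):
    """Renderer: run each code block and weave its printed output into the text."""
    namespace = {}  # shared, so later chunks can see variables from earlier ones
    output = []
    for kind, body in blocks:
        if kind == "text":
            output.append(body)
            continue
        captured = io.StringIO()
        with contextlib.redirect_stdout(captured):
            exec(body, namespace)  # acceptable for a toy; never run untrusted code this way
        output.append(captured.getvalue().rstrip("\n"))
    return "\n".join(output)


woven = render(parse(SOURCE))
print(woven)
```

The point of the sketch is that the number in the final document comes from running the code, so changing `values` in the source changes the report without any manual copy and paste.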
READING:
- Slides
TOPICS:
-
Markdown
- More technical details about Markdown
- Markdown philosophy
- Small demos:
- from markdown to html
- from markdown to latex
- from markdown to pdf
- from markdown to docx
- etc
- Work with markdown online editors
- Let's check some basics with markdown live preview
- .Rmd files in R
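As a rough illustration of what "from markdown to html" involves, the following Python sketch converts a very small subset of Markdown (headings, single-line paragraphs, bullet lists, emphasis, and strong emphasis). It is a toy under those assumptions, not a full Markdown implementation and not how pandoc works; the function names `inline` and `markdown_to_html` and the `SAMPLE` text are hypothetical.

```python
import html
import re


def inline(text):
    """Convert a small subset of inline Markdown to HTML."""
    text = html.escape(text, quote=False)
    # Handle **strong** before *emphasis* so the double asterisks are consumed first.
    text = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", text)
    text = re.sub(r"\*(.+?)\*", r"<em>\1</em>", text)
    return text


def markdown_to_html(source):
    """Convert headings, bullet lists, and one-line paragraphs (toy subset only)."""
    out, in_list = [], False
    for line in source.splitlines():
        heading = re.match(r"(#{1,6}) +(.*)", line)
        bullet = re.match(r"[-*] +(.*)", line)
        if in_list and not bullet:
            out.append("</ul>")  # a non-bullet line ends the current list
            in_list = False
        if heading:
            level = len(heading.group(1))
            out.append(f"<h{level}>{inline(heading.group(2))}</h{level}>")
        elif bullet:
            if not in_list:
                out.append("<ul>")
                in_list = True
            out.append(f"<li>{inline(bullet.group(1))}</li>")
        elif line.strip():
            out.append(f"<p>{inline(line)}</p>")
    if in_list:
        out.append("</ul>")
    return "\n".join(out)


SAMPLE = """\
# Markdown demo

Plain text with *emphasis* and **strong** words.

- first item
- second item
"""

result = markdown_to_html(SAMPLE)
print(result)
```

The same plain-text source could be sent to other output formats (LaTeX, PDF, docx) by swapping the renderer, which is the idea behind a universal converter such as pandoc.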
-
R Markdown
- Working with so-called "Dynamic Documents"
- Weaving and Knitting
- Combining narrative and code