This is the GitHub repository for the second part of the Seminar Data Science for Economics. It contains folders with the lecture slides and the code for the tutorials.
Clone this repository to your laptop and pull from the master branch before each class to get the most recent version of the lecture slides and new tutorials.
The tutorials are based on R. If you like, you can still use Python (or any other programming language), but then you will have to rely on self-study.
- For each topic, first watch the video lectures, which summarise the material and cover only a simplified version of the lecture slides.
- For the full treatment of the topic, read through the full version of the lecture slides.
- Finally, go through the interactive tutorials, which let you learn by doing. Note that the correct answers for the coding exercises are provided as hints.
- Still have questions? Open a discussion on Canvas.
- Video Lecture Decision Trees (HINT: set the playback speed to 1.25 or even 1.5)
- Lecture slides Decision Trees
- Interactive Tutorial Classification Trees Compas version 2. See the instructions below and please run the following code before compiling the tutorial:
req_packages <- c("learnr", "fairness", "tree", "tidyverse")
if (!require("pacman")) install.packages("pacman")
pacman::p_load(req_packages, character.only = TRUE)
- Video Lecture Bagging, RF, Boosting (HINT: set the playback speed to 1.25 or even 1.5)
- Lecture slides Bagging, RF, Boosting
- Interactive Tutorial Bagging, RF, Boosting version 2. See the instructions below and please run the following code before compiling the tutorial:
req_packages <- c("learnr", "fairness", "tree", "randomForest", "gbm", "tidyverse")
if (!require("pacman")) install.packages("pacman")
pacman::p_load(req_packages, character.only = TRUE)
- Video Lecture Why ML Fails (HINT: set the playback speed to 1.25 or even 1.5)
- Lecture slides Why ML Fails
- Attend the live lecture on Zoom on Thursday, March 26, at 10:45. Download Zoom in advance for a smooth experience.
- Lecture slides Double ML
- NON-interactive Tutorial DML
- NON-interactive Tutorial DML with ANSWERS
- Attend the live lecture on Zoom on Thursday, March 26, at 10:45. Download Zoom in advance for a smooth experience.
- Lecture slides Causal Trees
- Interactive Tutorial Causal Tree. Please run the following code before compiling the tutorial:
req_packages <- c("learnr", "fairness", "DiagrammeR", "grf", "tidyverse")
if (!require("pacman")) install.packages("pacman")
pacman::p_load(req_packages, character.only = TRUE)
- Interactive Tutorial Maps. Please run the following code before compiling the tutorial:
req_packages <- c("learnr", "osmdata", "sf", "ggmap", "naniar", "broom", "viridis", "randomForest", "tidyverse")
if (!require("pacman")) install.packages("pacman")
pacman::p_load(req_packages, character.only = TRUE)
Follow steps 1 to 4. If the tutorial fails to run, please post a message in the Canvas discussion.
Step 1. Pull the updated version of this git repository
Step 2. Open R and install the required packages as indicated for each tutorial, e.g.:
req_packages <- c("learnr", "fairness", "tree", "tidyverse")
if (!require("pacman")) install.packages("pacman")
pacman::p_load(req_packages, character.only = TRUE)
Step 3. Open the tutorial file at tutorials/tutorial_xyz/tutorial_xyz.Rmd in RStudio and click the green "Run Document" button at the top.
Step 4. Work with the compiled tutorial. The correct answers for each quiz are shown after you submit an answer. For any coding question, you can see the correct answer by clicking the "hint" button in the top panel of the coding chunk.
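Steps 1 and 3 can also be driven from the R console. The sketch below is a minimal illustration and not part of the course material. It assumes that git is on the PATH, that the remote is called origin, that the working directory is the root of the local clone, and that the rmarkdown package is installed. The helper names pull_latest, tutorial_path and open_tutorial are hypothetical, and tutorial_xyz is the placeholder name used in Step 3.

```r
# Hypothetical helpers; assumes the working directory is the repository root.

# Step 1: pull the latest version of the repository.
# Assumes git is on the PATH and the remote is called "origin".
pull_latest <- function(branch = "master", remote = "origin") {
  if (!nzchar(Sys.which("git"))) stop("git was not found on the PATH")
  status <- system2("git", c("pull", remote, branch))  # 0 means success
  invisible(status == 0)
}

# Build the path tutorials/<name>/<name>.Rmd used in Step 3.
tutorial_path <- function(name, root = "tutorials") {
  file.path(root, name, paste0(name, ".Rmd"))
}

# Step 3: run the tutorial document from the console,
# comparable to clicking "Run Document" in RStudio.
open_tutorial <- function(name, root = "tutorials") {
  path <- tutorial_path(name, root)
  if (!file.exists(path)) stop("Tutorial file not found: ", path)
  rmarkdown::run(path)
}

# Usage (not run here, since it starts an interactive session):
# pull_latest()
# open_tutorial("tutorial_xyz")
```

Checking that the file exists before running it gives a clearer error message when the repository has not been pulled yet or the working directory is wrong.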
You have to install R and RStudio Desktop on your laptop.
If you already have them installed, consider updating R and RStudio to the latest available version.
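To see which version of R is currently installed, the following base R sketch can be run in the console. The minimum version "4.0.0" is only a hypothetical threshold chosen for illustration; the course does not state a required version.

```r
# Print the full version string of the running R session
cat(R.version.string, "\n")

# Compare against a minimum version.
# "4.0.0" is a hypothetical threshold, not a course requirement.
minimum_version <- "4.0.0"
r_is_recent <- getRversion() >= minimum_version

if (r_is_recent) {
  cat("R is at least version", minimum_version, "\n")
} else {
  cat("Consider updating R; found version", as.character(getRversion()), "\n")
}
```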
After the installation, open RStudio, copy and paste the following code into the console, and press Enter:
req_packages <- c(
  "tidyverse",     # The collection of packages used for data analysis
  "glmnet",        # Regression regularization
  "tree",          # Decision trees
  "randomForest",  # Random forests
  "gbm",           # Boosted regression models
  "broom",         # Tidies up regression output
  "furrr",         # Parallel computing for purrr functions
  "grf"            # Causal forests
)
if (!require("pacman")) install.packages("pacman")
pacman::p_load(req_packages, character.only = TRUE)
# Check whether installation worked. See the output.
loaded_packages <- pacman::p_loaded(req_packages, character.only = TRUE)
# all() avoids the pipe, which is unavailable if tidyverse failed to load
if (all(loaded_packages)) {
  cat("\n\n\n\nEvery package has been installed correctly\n")
} else {
  cat("\n\n\n\nThere is an installation problem. Trying to install the failed packages again:\n")
  not_loaded <- req_packages[!loaded_packages]
  install.packages(not_loaded)
}
If the code reports an installation problem, try running the code above again.
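If the problem persists, it can help to see exactly which packages are missing without attaching anything. The following is a minimal sketch using only base R. The helper name missing_packages is hypothetical and is not part of pacman.

```r
# Hypothetical helper: return the packages from `pkgs` that cannot be found.
# requireNamespace() checks availability without attaching the package.
missing_packages <- function(pkgs) {
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!available]
}

req_packages <- c("tidyverse", "glmnet", "tree", "randomForest",
                  "gbm", "broom", "furrr", "grf")
still_missing <- missing_packages(req_packages)

if (length(still_missing) == 0) {
  cat("All required packages are installed\n")
} else {
  cat("Still missing:", paste(still_missing, collapse = ", "), "\n")
}
```

The names printed under "Still missing" can then be included when asking for help in the Canvas discussion.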
Links for the theoretical material:
- An Introduction to Statistical Learning with Applications in R (ISLR)
- High-dimensional methods and inference on structural and treatment effects (Double Reg)
- Double/debiased machine learning for treatment and structural parameters (Double ML)
- Recursive partitioning for heterogeneous causal effects (Causal Forest)
Links for R material:
- R for Data Science
- Cookbook R
- Visualization with ggplot2
The list of keyboard shortcuts for RStudio
Among them, the most useful but not that well known are:
- cmd+shift+m (or ctrl+shift+m for Windows) to type the pipe operator %>%.
- alt+up or alt+down to move a line up or down.
- ctrl+alt+down for multiline selection.
- cmd+shift+d (or ctrl+shift+d) duplicates the line or selection.
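For readers who have not used the pipe operator before, the sketch below compares nested function calls with the equivalent piped form. It is only an illustration: the vector values is made up, and the piped part is guarded so that the snippet still runs when the magrittr package (loaded as part of tidyverse) is not installed.

```r
values <- c(4, 9, 16)

# Nested calls read inside out: sqrt first, then mean, then round
nested <- round(mean(sqrt(values)), 1)

# With the pipe, the same steps read left to right.
# Guarded so the snippet still runs if magrittr is not installed.
if (requireNamespace("magrittr", quietly = TRUE)) {
  `%>%` <- magrittr::`%>%`
  piped <- values %>% sqrt() %>% mean() %>% round(1)
  stopifnot(identical(piped, nested))
}
```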
Links for Python code: