madina-k / dse_mk

course materials for Seminar Data Science for Economics (Tilburg University)

Data Science for Economics

This is the GitHub repository for the second part of the Seminar Data Science for Economics. It contains folders with the lecture slides and the code for the tutorials.

Clone this repository to your laptop and pull from the master branch before each class to get the most recent version of the lecture slides and new tutorials.

The tutorials are based on R. If you like, you can still use Python (or any other programming language), but then you will have to rely on self-study.

Online lectures and tutorials

How to use the online lectures and tutorials:

  1. For each topic, first watch the video lectures, which summarise the material and cover only a simplified version of the lecture slides.
  2. For the full treatment of the topic, read through the full version of the lecture slides.
  3. Finally, go through the interactive tutorials, which let you learn by doing. Note that the correct answers to the coding exercises are provided as hints.
  4. Still have questions? Open a discussion on Canvas.

Decision trees

  1. Video Lecture Decision Trees (HINT: put speed to 1.25 or even 1.5)
  2. Lecture slides Decision Trees
  3. Interactive Tutorial Classification Trees Compas version 2. See the instructions below and please run the following code before compiling the tutorial:
req_packages <- c("learnr", "fairness", "tree", "tidyverse")
if (!require("pacman")) install.packages("pacman")
pacman::p_load(req_packages, character.only = TRUE)

Bagging, Random Forest, Boosting

  1. Video Lecture Bagging, RF, Boosting (HINT: put speed to 1.25 or even 1.5)
  2. Lecture slides Bagging, RF, Boosting
  3. Interactive Tutorial Bagging, RF, Boosting version 2. See the instructions below and please run the following code before compiling the tutorial:
req_packages <- c("learnr", "fairness", "tree", "randomForest", "gbm", "tidyverse")
if (!require("pacman")) install.packages("pacman")
pacman::p_load(req_packages, character.only = TRUE)

Why we cannot use Machine Learning for inference (recap)

  1. Video Lecture Why ML Fails (HINT: put speed to 1.25 or even 1.5)
  2. Lecture slides Why ML Fails

Double Machine learning

  1. Attend the live lecture on Zoom on Thursday, March 26, at 10:45. Download Zoom in advance for a smooth experience.
  2. Lecture slides Double ML
  3. NON-interactive Tutorial DML
  4. NON-interactive Tutorial DML with ANSWERS

Causal Trees

  1. Attend the live lecture on Zoom on Thursday, March 26, at 10:45. Download Zoom in advance for a smooth experience.
  2. Lecture slides Causal Trees
  3. Interactive Tutorial Causal Tree. Please run the following code before compiling the tutorial:
req_packages <- c("learnr", "fairness", "DiagrammeR", "grf", "tidyverse")
if (!require("pacman")) install.packages("pacman")
pacman::p_load(req_packages, character.only = TRUE)

Working with maps

  1. Interactive Tutorial Maps. Please run the following code before compiling the tutorial:
req_packages <- c("learnr", "osmdata", "sf", "ggmap", "naniar", "broom", "viridis", "randomForest", "tidyverse") 
if (!require("pacman")) install.packages("pacman")
pacman::p_load(req_packages, character.only = TRUE)

HOW TO run an interactive tutorial [updated March 19, 2020]

Follow steps 1 to 4. If the tutorial still fails to run, please post a message in the Canvas discussion.

Step 1. Pull the updated version of this git repository

Step 2. Open R and install the required packages as indicated for each tutorial. For example:

req_packages <- c("learnr", "fairness", "tree", "tidyverse")
if (!require("pacman")) install.packages("pacman")
pacman::p_load(req_packages, character.only = TRUE)

Step 3. Open the tutorial file tutorials/tutorial_xyz/tutorial_xyz.Rmd and click the green "Run Document" button at the top of the editor.


Step 4. Work with the compiled tutorial. The correct answer to each quiz is shown after you submit your own answer. You can see the correct answer to any coding question by clicking the "hint" button in the top panel of the coding chunk.
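Step 3 can probably also be done from the R console instead of the "Run Document" button. The sketch below assumes the folder layout described in Step 3 and that the working directory is the root of this repository; `run_tutorial` is a hypothetical helper name, not something provided by the course materials.

```r
# Hypothetical helper: launch a tutorial from the console.
# Assumes the layout tutorials/<name>/<name>.Rmd described in Step 3.
run_tutorial <- function(name, root = "tutorials") {
  rmd <- file.path(root, name, paste0(name, ".Rmd"))
  if (!file.exists(rmd)) {
    stop("Tutorial file not found: ", rmd)
  }
  # rmarkdown::run() serves an interactive R Markdown document locally
  rmarkdown::run(rmd)
}

# Usage (replace tutorial_xyz with the actual tutorial folder name):
# run_tutorial("tutorial_xyz")
```

If the helper stops with "Tutorial file not found", check that the repository is up to date (Step 1) and that the working directory is the repository root.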


Preparing R for the hard work

You have to install R and RStudio Desktop on your laptop.

If you already have them installed, you may still want to update R and RStudio to the latest available versions.

After the installation, open RStudio, copy and paste the following code into the console, and press Enter:
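To see which version of R is currently installed before deciding whether to update, something like the following can be run in the console. This is only a minimal sketch using base R; the commented-out RStudio line assumes the `rstudioapi` package is installed and that the code is run inside RStudio.

```r
# Version of R used by the current session
r_version <- getRversion()
cat("Running R", as.character(r_version), "\n")

# Full version string, including the release date
cat(R.version.string, "\n")

# Inside RStudio only (assumes the rstudioapi package is installed):
# rstudioapi::versionInfo()$version
```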

req_packages <- c(
  "tidyverse", # The collection of packages used for data analysis
  "glmnet",  # Regression regularization
  "tree", # Decision trees
  "randomForest", # Random forests
  "gbm", # Boosted regression models
  "broom", # Tidies up regression output
  "furrr", # Parallel computing for purrr functions 
  "grf" # Causal forests
)

if (!require("pacman")) install.packages("pacman")
pacman::p_load(req_packages, character.only = TRUE)

# Check whether the installation worked. See the output.
loaded_packages <- pacman::p_loaded(req_packages, character.only = TRUE)
# Base R is used here because %>% is unavailable if tidyverse itself failed to load
if (all(loaded_packages)) {
  cat("\n\n\n\nEvery package has been installed correctly\n")
} else {
  cat("\n\n\n\nThere is an installation problem. Trying to install the failed packages again:\n")
  not_loaded <- req_packages[!loaded_packages]
  install.packages(not_loaded)
}

If the code reports an installation problem, run the code above again.
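If the problem persists, it can help to see exactly which packages are missing without loading anything. The following is a minimal base-R sketch; `missing_packages` is a hypothetical helper name introduced here for illustration, and it does not depend on pacman or tidyverse.

```r
# Hypothetical helper: return the names of packages that cannot be found.
# requireNamespace() checks whether a package is available without attaching it.
missing_packages <- function(pkgs) {
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!available]
}

# Usage with the package list defined above:
# missing_packages(req_packages)
```

An empty result means every listed package is installed; otherwise the returned names can be passed to `install.packages()`.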

Useful resources

Links for the theoretical material:

Links for R material:

The list of keyboard shortcuts for RStudio

Among them, the most useful but lesser-known ones are:

  • cmd+shift+m (or ctrl+shift+m for Windows) to type the pipe operator %>%.
  • alt+up or alt+down to move a line up or down.
  • ctrl+alt+down for multiline selection.
  • cmd+shift+d (or ctrl+shift+d for Windows) to duplicate the line or selection.

Links for Python code:
