edrubin / EC524W21

Master's-level applied econometrics course, focusing on prediction, at the University of Oregon (EC 424/524, Winter quarter 2021). Taught by Ed Rubin.

EC 524, Winter 2021

Welcome to Economics 524 (424): Prediction and machine-learning in econometrics, taught by Ed Rubin and Stephen Reed.

Schedule

Lecture: Tuesday and Thursday, 2:15pm–3:45pm, Zoom and/or MCK 204A

Lab: Friday, 12:30pm–1:30pm, Zoom

Office hours

  • Ed Rubin (Zoom): TBD
  • Stephen Reed (Zoom): TBD

Syllabus

Books

Required books

Suggested books

Lecture notes

000 - Overview (Why predict?)

  1. Why do we have a class on prediction?
  2. How is prediction (and how are its tools) different from causal inference?
  3. Motivating examples

Formats .html | .pdf | .Rmd

001 - Statistical learning foundations

  1. Why do we have a class on prediction?
  2. How is prediction (and how are its tools) different from causal inference?
  3. Motivating examples

Formats .html | .pdf | .Rmd

002 - Model accuracy

  1. Model accuracy
  2. Loss for regression and classification
  3. The variance-bias tradeoff
  4. The Bayes classifier
  5. KNN

Formats .html | .pdf | .Rmd

003 - Resampling methods

  1. Review
  2. The validation-set approach
  3. Leave-one-out cross validation
  4. k-fold cross validation
  5. The bootstrap

Formats .html | .pdf | .Rmd
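
As a rough illustration of the k-fold idea listed above, here is a minimal sketch in plain Python (the course itself works in R, so this is not the lecture's code). The function names `k_fold_indices`, `fit_line`, and `k_fold_mse` are hypothetical, and the model is assumed to be a one-predictor least-squares line purely to keep the example short.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    # Fold j takes every k-th shuffled index, so fold sizes differ by at most one.
    return [idx[j::k] for j in range(k)]


def fit_line(xs, ys):
    """Least-squares intercept and slope for a single predictor."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def k_fold_mse(xs, ys, k=5, seed=0):
    """Average the held-out MSE across k folds."""
    fold_mses = []
    for fold in k_fold_indices(len(ys), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(ys)) if i not in held_out]
        # Fit only on the observations outside the current fold.
        intercept, slope = fit_line([xs[i] for i in train], [ys[i] for i in train])
        # Score only on the held-out fold.
        errors = [(ys[i] - (intercept + slope * xs[i])) ** 2 for i in fold]
        fold_mses.append(sum(errors) / len(errors))
    return sum(fold_mses) / k
```

Setting `k` equal to the number of observations turns the same loop into leave-one-out cross validation, while a single fixed split corresponds to the validation-set approach.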

004 - Linear regression strikes back

  1. Returning to linear regression
  2. Model performance and overfit
  3. Model selection—best subset and stepwise
  4. Selection criteria

Formats .html | .pdf | .Rmd

In between: tidymodels-ing

005 - Shrinkage methods

(AKA: Penalized or regularized regression)

  1. Ridge regression
  2. Lasso
  3. Elasticnet

Formats .html | .pdf | .Rmd

006 - Classification intro

  1. Introduction to classification
  2. Why not regression?
  3. But also: Logistic regression
  4. Assessment: Confusion matrix, assessment criteria, ROC, and AUC

Formats .html | .pdf | .Rmd
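
The assessment tools named above can be sketched in a few lines of plain Python (again, not the course's R code). The helper names `confusion_counts`, `rates`, and `auc` are hypothetical, the labels are assumed to be coded 0/1, and the AUC is computed by the pairwise-comparison definition, which is simple but slow on large samples.

```python
def confusion_counts(actual, predicted):
    """Count the four cells of a binary confusion matrix (labels coded 0/1)."""
    pairs = list(zip(actual, predicted))
    return {
        "tp": sum(a == 1 and p == 1 for a, p in pairs),
        "fp": sum(a == 0 and p == 1 for a, p in pairs),
        "fn": sum(a == 1 and p == 0 for a, p in pairs),
        "tn": sum(a == 0 and p == 0 for a, p in pairs),
    }


def rates(c):
    """Common assessment criteria derived from the confusion counts."""
    return {
        "accuracy": (c["tp"] + c["tn"]) / sum(c.values()),
        "sensitivity": c["tp"] / (c["tp"] + c["fn"]),  # true-positive rate
        "specificity": c["tn"] / (c["tn"] + c["fp"]),  # true-negative rate
        "precision": c["tp"] / (c["tp"] + c["fp"]),
    }


def auc(actual, scores):
    """Share of (positive, negative) pairs ranked correctly; ties count one half."""
    pos = [s for a, s in zip(actual, scores) if a == 1]
    neg = [s for a, s in zip(actual, scores) if a == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The confusion matrix depends on the threshold used to turn scores into predicted classes, whereas the AUC summarizes ranking quality across all thresholds.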

007 - Decision trees

  1. Introduction to trees
  2. Regression trees
  3. Classification trees—including the Gini index, entropy, and error rate

Formats .html | .pdf | .Rmd

008 - Ensemble methods

  1. Introduction
  2. Bagging
  3. Random forests
  4. Boosting

Formats .html | .pdf | .Rmd

009 - Support vector machines

  1. Hyperplanes and classification
  2. The maximal margin hyperplane/classifier
  3. The support vector classifier
  4. Support vector machines

Formats .html | .pdf | .Rmd

Projects

000 Predicting sales price in housing data (Kaggle)

Help:

001 Validation and out-of-sample performance

002 Cross validation, penalized regression, and tidymodels

Paper: Prediction Policy Problems

003 In class: MNIST image classification (with multiple classes!)

Class project

Outline of the project

Topic and group due by 25 February 2021.

Final project submission due by midnight on 10 March 2021.

Lab notes

000 - Workflow and cleaning

  1. General "best practices" for coding
  2. Working with RStudio
  3. The pipe (%>%)
  4. Cleaning and Kaggle follow-up

Formats .html | .pdf | .Rmd

001 - Data cleaning: Multiple mutations

  1. Finish previous lab on dplyr
  2. Extending dplyr and mutate

Formats .html | .pdf | .Rmd

002 - Validation

  1. Creating a training and validation data set from your observations dataframe in R
  2. Writing a function to iterate over multiple models to test and compare MSEs
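
The two steps of this lab can be sketched as follows. This is a plain-Python illustration under stated assumptions, not the lab's R code: the names `train_validation_split`, `fit_mean`, `fit_line`, and `compare_models` are hypothetical, and the two candidate models (a constant and a one-predictor line) are chosen only to keep the example small.

```python
import random


def train_validation_split(n, share=0.8, seed=0):
    """Randomly assign roughly `share` of the row indices to training, the rest to validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(round(share * n))
    return idx[:cut], idx[cut:]


def fit_mean(xs, ys):
    """Baseline model: always predict the training mean of y."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """One-predictor least-squares line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def compare_models(models, xs, ys, share=0.8, seed=0):
    """Fit each model on the training rows and report its validation MSE."""
    train, valid = train_validation_split(len(ys), share, seed)
    results = {}
    for name, fit in models.items():
        predict = fit([xs[i] for i in train], [ys[i] for i in train])
        results[name] = sum((ys[i] - predict(xs[i])) ** 2 for i in valid) / len(valid)
    return results
```

Calling `compare_models({"mean": fit_mean, "line": fit_line}, xs, ys)` returns one validation MSE per model, all computed on the same split so the numbers are comparable.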

003 - Practice using tidymodels

  1. Cleaning data quickly and efficiently with tidymodels
  2. R script used in the lab

004 - Ridge, Lasso and Elasticnet Regressions in tidymodels

  1. Ridge, Lasso and Elasticnet regressions in tidymodels from start to finish with a new dataset.
  2. Using the best model to predict on a test dataset.

005 - Forcing splits in tidymodels and penalized regression

  1. Combining pre-split data together and then defining a custom split
  2. Running a Ridge, Lasso or Elasticnet logistic regression in tidymodels using a fresh dataset.
  3. Using the model to predict on test data and then viewing the confusion matrix.

Additional resources

R

Data Science

Spatial data
