0xmeri / course-materials

Course Materials for Advanced Data Analytics in Economics

Nick Hagerty, Montana State University

Except where otherwise noted, this work is licensed under Creative Commons BY-NC-SA 4.0.


Skip to: Lecture slides | Supplemental labs | External resources


Lecture slides

Fall 2022

Lecture 1: R Basics (.pdf)

  • About R
  • Operators
  • Objects and functions
  • Data frames
  • Vectors
  • Indexing

Lecture 2: Programming in R

  • If/else statements
  • For-loops
  • Functions
  • Vectorization
  • Parallelization

Lecture 3: Productivity Tools

Lecture 4: Data Wrangling

  • Philosophy of tidy data
  • Wrangling data with dplyr
  • Joining data with dplyr
  • Tidying data with tidyr
  • Importing data with readr

Lecture 5: Data Cleaning

  • Join safety
  • Keys and relational data
  • String cleaning
  • Number storage
  • Data Cleaning Checklist (pdf version)

Lecture 6: Data Acquisition

  • Where data comes from
  • Webscraping
  • Using APIs

Lecture 7: Best Practices for Coding and Workflows

  • The perils of bad data cleaning
  • Reproducibility and transparency
  • Best practices (code organization, file organization, version control, abstraction, commenting, unit tests)

Lecture 8: Distinguishing Goals of Data Analysis

  • The Data Generating Process
  • Potential outcomes, counterfactuals, and causal inference
  • Descriptive, Predictive, or Causal Analysis?

Lecture 9: Exploratory Analysis

  • Part 1
    • Summaries, frequency tables and crosstabs in R
    • Characterizing distributions
    • Handling extreme values
    • Handling variable transformations
    • Handling missing data
  • Part 2
    • Characterizing relationships
    • Binscatter
    • The Conditional Expectation Function
    • Adjusting for other variables
    • Bin smoothing and local regression

Lecture 10: Spatial Analysis

  • Intro to Geospatial Data
  • Part 1
    • Spatial data and quick mapping
    • Reference systems and projections
  • Part 2
    • Spatial queries (measurement, relationships)
    • Spatial subsetting
    • Geometry operations
    • Spatial joins

Lecture 11: Data Visualization

  • Basics of ggplot2
  • Plotting examples
  • Colors and themes
  • Principles of data visualization
  • Case studies

Lecture 12: Regression Modeling

  • Basic regression in R
  • Review: Interpreting coefficients
  • Indicator and interaction terms
  • Econometrics packages in R
  • Modeling nonlinear relationships

Lecture 13: Machine Learning Fundamentals

  • Review: Prediction
  • Statistical learning
  • Model accuracy
  • Cross-validation

Lecture 14: Prediction Methods

Lecture 15: Classification Methods

  • Part 1: Methods
    • Classification
    • Logistic regression
    • k-nearest neighbors
    • Model assessment
    • Decision trees
  • Part 2: Examples
    • Logistic regression and KNN
    • Cross-validation
    • Decision trees
    • Teach your laptop to read

Lecture 16: Machine Learning in Economics

  • Predicting outcomes
  • Constructing new data
  • Selecting covariates
  • Predicting causal effects

Lecture 17: Databases and Big Data

  • Tools for big data
  • Databases in R
  • Writing SQL queries
  • Getting started with BigQuery

Supplemental labs

By Laura Sikoski


External resources

This is a list of further resources that you may find helpful during (and after!) this course. Start with the course materials above, then turn to these for alternative explanations or a deeper dive into a particular topic. If one resource isn't speaking to you, try another.

Basics of R

Programming in R

R Markdown

Git and GitHub

Data wrangling with the tidyverse

Data cleaning

Data acquisition and webscraping

Best practices for coding and workflows

Distinguishing goals of data analysis

Exploratory analysis

Spatial analysis

Data visualization

Regression modeling in R

Fundamentals of machine learning

Shrinkage methods

Classification methods

Machine learning with tidymodels

Unsupervised learning

Further methods in machine learning

  • ISLR (James, Witten, Hastie, Tibshirani).
    • Ch. 8: Tree-Based Methods
    • Ch. 9: Support Vector Machines
    • Ch. 10: Deep Learning
  • Prediction and Machine Learning Lectures (Ed Rubin).
    • Lecture 007: Decision Trees
    • Lecture 008: Ensemble Methods
    • Lecture 009: Support Vector Machines

Applications of machine learning in economics

Databases (SQL)

Distributed and cloud computing
