Course Materials for Advanced Data Analytics in Economics

Fall 2021

Nick Hagerty, Montana State University

Data Cleaning

Data Cleaning Checklist

Exploratory Analysis

Part 1

  • Summaries, frequency tables and crosstabs in R
  • Describing distributions (histograms, kernel density estimation, bandwidth choice, stratification)
  • Handling extreme values
  • Handling variable transformations
  • Handling missing data
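As a rough illustration of the kernel density and bandwidth-choice topics in Part 1, here is a minimal Python sketch (the course itself works in R, where `density()` fills this role). It assumes a Gaussian kernel and Silverman's rule-of-thumb bandwidth, which we believe matches R's default `bw.nrd0`. The function names `silverman_bandwidth` and `gaussian_kde` and the toy data are hypothetical, not part of the course materials.

```python
import math
import statistics


def silverman_bandwidth(xs):
    """Rule-of-thumb bandwidth: h = 0.9 * min(sd, IQR / 1.34) * n^(-1/5)."""
    n = len(xs)
    sd = statistics.stdev(xs)
    # 'inclusive' quartiles interpolate like R's default quantile type 7
    q1, _, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    spread = min(sd, (q3 - q1) / 1.34) or sd  # fall back to sd if the IQR is zero
    return 0.9 * spread * n ** (-1 / 5)


def gaussian_kde(xs, grid, h=None):
    """Evaluate a Gaussian kernel density estimate of xs at each point of grid."""
    if h is None:
        h = silverman_bandwidth(xs)
    norm = 1.0 / (len(xs) * h * math.sqrt(2 * math.pi))
    # Average of Gaussian bumps of width h centred on each observation
    return [norm * sum(math.exp(-0.5 * ((g - x) / h) ** 2) for x in xs) for g in grid]


# Illustrative toy data (not from the course), with two clusters
data = [1.0, 2.0, 2.5, 3.0, 4.0, 7.5, 8.0]
grid = [i / 10 for i in range(-20, 111)]
h = silverman_bandwidth(data)
smooth = gaussian_kde(data, grid)           # rule-of-thumb bandwidth
wiggly = gaussian_kde(data, grid, h=h / 4)  # undersmoothed: bumpier curve
```

Shrinking `h` makes the estimate bumpier and taller around clusters of points, while widening it smooths away detail, so the whole bandwidth trade-off lives in a single parameter.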

Part 2

  • Describing relationships (Anscombe's Quartet, scatterplots, transformations, binscatter)
  • Conditional expectations (the CEF, motivation for linear regression)
  • Adjusting for other variables (Simpson's Paradox, manual/visual partialing out of binary control variables or fixed effects)
  • Smoothing (bin smoothing/moving averages, local regression, kernels, tips about smoothing functions in R)
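For the partialing-out topic in Part 2, here is a minimal Python sketch with one binary control, assuming plain least-squares slopes. Demeaning x and y within each group and regressing the residuals on each other recovers the coefficient on x from a regression that also includes group dummies (the Frisch-Waugh-Lovell result). The helper names `ols_slope`, `demean_within`, and `partialled_slope` and the toy data, built to show Simpson's Paradox, are hypothetical.

```python
from statistics import mean


def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x with an intercept."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def demean_within(values, groups):
    """Subtract each group's mean, i.e. residualize on a full set of group dummies."""
    sums, counts = {}, {}
    for v, g in zip(values, groups):
        sums[g] = sums.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]


def partialled_slope(x, y, groups):
    """Coefficient on x after partialing out the group control from both x and y."""
    return ols_slope(demean_within(x, groups), demean_within(y, groups))


# Hypothetical toy data: within each group y falls with x, but group 1 sits higher
x = [1, 2, 3, 6, 7, 8]
y = [3, 2, 1, 9, 8, 7]
d = [0, 0, 0, 1, 1, 1]  # binary control variable

print(round(ols_slope(x, y), 3))  # pooled slope: 0.988 (misleadingly positive)
print(partialled_slope(x, y, d))  # within-group slope: -1.0
```

The same demeaning works unchanged with more than two groups, which is how fixed effects are partialed out.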

Draws some material from Rafael A. Irizarry and Ed Rubin, data from Gabor's Data Analysis, and inspiration from Nick Huntington-Klein.
