1carvercoleman / ncaam_attendance

Builds data with attendance, COVID-19 cases, and game outcomes from the NCAAM 2017-2021 seasons for use in IV analysis

ncaam_attendance

This dataset contains attendance for most games in the NCAAM 2017-21 seasons. Of the 3,784 games played at non-neutral locations during 2020-21, it contains 3,056 games, of which 1,541 have attendance information. Roughly 700 games were dropped because of discrepancies in team names, stadium city, and county. Additional variables include date, away team, home team, score, and the county where the game was played. Data was scraped from the ESPN website, and additional data was used from the following sources:

Steps to build and analyze data

The build occurs in two steps:

  1. Run "1. get_ncaam_data.R" to scrape the ESPN website.
  2. Run the Stata do-file "2. full_build.do" to combine the game data with the other datasets. Before running it, download "us-counties.csv" from the NYT GitHub repo and copy it into the data folder.
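
For orientation, the kind of join that step 2 performs can be sketched in Python. This is only an illustration under stated assumptions, not the logic of "2. full_build.do": the game field names, the join key (date, county, state), and all sample rows are hypothetical. Only the header of the county file (date, county, state, fips, cases, deaths) follows the NYT "us-counties.csv" layout.

```python
import csv
import io

# Toy stand-in for data/us-counties.csv. The header matches the NYT file;
# the rows are made up for illustration.
COUNTY_CSV = """date,county,state,fips,cases,deaths
2020-12-01,Example County,Example State,00000,100,1
2020-12-02,Example County,Example State,00000,120,1
"""

# Hypothetical game records. These field names are assumptions, not the
# actual columns produced by "1. get_ncaam_data.R".
GAMES = [
    {"date": "2020-12-02", "home": "Team A", "away": "Team B",
     "county": "Example County", "state": "Example State"},
    {"date": "2020-12-03", "home": "Team C", "away": "Team D",
     "county": "Example County", "state": "Example State"},
]


def load_cases(csv_file):
    """Index cumulative case counts by (date, county, state)."""
    cases = {}
    for row in csv.DictReader(csv_file):
        cases[(row["date"], row["county"], row["state"])] = int(row["cases"])
    return cases


def attach_cases(games, cases):
    """Left join: keep every game; cases is None when no county row matches."""
    merged = []
    for game in games:
        key = (game["date"], game["county"], game["state"])
        merged.append({**game, "cases": cases.get(key)})
    return merged


cases_by_key = load_cases(io.StringIO(COUNTY_CSV))
merged_games = attach_cases(GAMES, cases_by_key)
for game in merged_games:
    print(game["date"], game["home"], game["cases"])
```

A left join is used here so that games with no matching county row stay visible with a missing value instead of disappearing. Whether the actual build keeps or drops such games is not stated in this README.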

About

License: MIT License


Languages

Stata 65.3%, R 24.9%, Jupyter Notebook 9.7%