andrew-cui-zz / mlb-game-prediction

predicting major league baseball games using logit regression - eas 499, sp2020



mlb-game-prediction

thesis code

University of Pennsylvania

EAS 499, Senior Capstone Thesis

Andrew Cui

Advisor: Dr. Shane T. Jensen


We use logistic regression (logit) models, including a logit elastic net, in a predictive analysis of Major League Baseball games. We extract data from Retrosheet logs and perform extensive data wrangling, preprocessing, and feature engineering to identify informative covariates. The target is binary: whether or not the home team wins the game.
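As a rough illustration of the labeling and feature-engineering step, the sketch below turns game records into a binary target (1 if the home team won) plus one pre-game covariate. It is not the thesis code. It assumes the standard Retrosheet game-log layout, in which (1-indexed) field 4 is the visiting team, field 7 the home team, and fields 10 and 11 the visiting and home scores, and that records arrive in chronological order; both assumptions should be checked against Retrosheet's documentation. The win-percentage-difference feature, the .500 prior for teams that have not played yet, and the sample rows (made-up team codes, truncated to 11 fields) are all hypothetical.

```python
import csv
from collections import defaultdict

# Assumed 0-indexed positions in a Retrosheet game-log record
# (fields 4, 7, 10, 11 in 1-indexed terms); verify against Retrosheet's guide.
VISITOR, HOME, V_SCORE, H_SCORE = 3, 6, 9, 10

def build_dataset(lines):
    """Return (rows, y): one pre-game feature row and one home-win label per game.

    Hypothetical feature: home win pct minus visitor win pct, computed only
    from earlier games so the current result cannot leak into its features.
    Assumes records are in chronological order. Tied games are skipped.
    """
    wins, games = defaultdict(int), defaultdict(int)

    def pct(team):
        # hypothetical .500 prior for a team that has not played yet
        return wins[team] / games[team] if games[team] else 0.5

    rows, y = [], []
    for rec in csv.reader(lines):
        home, vis = rec[HOME], rec[VISITOR]
        h, v = int(rec[H_SCORE]), int(rec[V_SCORE])
        if h == v:
            continue
        rows.append([pct(home) - pct(vis)])
        y.append(1 if h > v else 0)  # binary target: did the home team win?
        # update team records only after this game's features are stored
        games[home] += 1
        games[vis] += 1
        wins[home if h > v else vis] += 1
    return rows, y

# Hypothetical rows with made-up team codes, truncated to the first 11 fields.
SAMPLE = """\
"20190401",0,"Mon","AAA","AL",1,"BBB","AL",1,3,5
"20190402",0,"Tue","CCC","AL",1,"AAA","AL",2,4,2
"20190403",0,"Wed","BBB","AL",2,"CCC","AL",2,1,6
"20190404",0,"Thu","AAA","AL",3,"BBB","AL",3,2,2
"""

rows, y = build_dataset(SAMPLE.splitlines())
print(rows, y)  # [[0.0], [-0.5], [0.0]] [1, 0, 1]
```

Computing each team's record only from earlier games keeps the outcome being predicted out of its own features, which is the main pitfall when engineering covariates from season-long logs.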

Overall, the logit elastic net model achieved an accuracy of 61.77%, outperforming our naive classifiers and many results reported in the literature. This repository contains the code used in the analysis, along with the relevant charts and graphics.
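The elastic net adds a mix of L1 (lasso) and L2 (ridge) penalties to the logistic log-likelihood. The sketch below is a minimal pure-Python fit by proximal gradient descent that minimizes mean log-loss plus lam * (alpha * ||b||_1 + (1 - alpha)/2 * ||b||_2^2), leaving the intercept unpenalized. This glmnet-style parameterization, the solver, the hyperparameter values, and the toy data are assumptions for illustration, not the thesis's actual setup. The naive baseline used here is the hypothetical rule "the home team always wins".

```python
import math

def sigmoid(z):
    # numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def soft_threshold(v, t):
    # proximal operator of t*|v|: shrink v toward 0, zeroing it if |v| <= t
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def linear(b0, b, xi):
    return b0 + sum(bj * xj for bj, xj in zip(b, xi))

def objective(b0, b, X, y, lam, alpha):
    # mean log-loss + elastic-net penalty (intercept b0 is not penalized)
    loss = 0.0
    for xi, yi in zip(X, y):
        z = linear(b0, b, xi)
        # log(1 + e^z) - y*z is the negative log-likelihood of a 0/1 label
        loss += max(z, 0.0) + math.log1p(math.exp(-abs(z))) - yi * z
    l1 = sum(abs(bj) for bj in b)
    l2 = sum(bj * bj for bj in b)
    return loss / len(y) + lam * (alpha * l1 + 0.5 * (1 - alpha) * l2)

def fit_logit_elastic_net(X, y, lam=0.05, alpha=0.5, step=0.1, n_iter=2000):
    n, p = len(X), len(X[0])
    b0, b = 0.0, [0.0] * p
    for _ in range(n_iter):
        r = [sigmoid(linear(b0, b, xi)) - yi for xi, yi in zip(X, y)]
        # gradient of the smooth part: log-loss plus the ridge (L2) term
        g0 = sum(r) / n
        g = [sum(ri * xi[j] for ri, xi in zip(r, X)) / n + lam * (1 - alpha) * b[j]
             for j in range(p)]
        b0 -= step * g0
        # proximal step handles the non-smooth lasso (L1) term
        b = [soft_threshold(b[j] - step * g[j], step * lam * alpha) for j in range(p)]
    return b0, b

def predict(b0, b, X):
    # 1 = home team predicted to win
    return [1 if linear(b0, b, xi) > 0 else 0 for xi in X]

def accuracy(pred, y):
    return sum(int(p == t) for p, t in zip(pred, y)) / len(y)

# Hypothetical toy data: two standardized pre-game covariates per game
# (e.g. win-pct difference, starter-ERA difference); y = 1 if home team won.
X = [[0.8, -0.5], [0.4, -0.2], [-0.6, 0.4], [0.9, 0.1],
     [-0.2, 0.3], [0.5, -0.4], [-0.9, 0.6], [0.1, 0.2]]
y = [1, 1, 0, 1, 0, 1, 0, 1]

b0, b = fit_logit_elastic_net(X, y)
# in-sample accuracies, for illustration only
print("model accuracy:", accuracy(predict(b0, b, X), y))
print("always-home baseline:", accuracy([1] * len(y), y))
```

With a very large penalty every coefficient is soft-thresholded to exactly zero and the model collapses to an intercept-only prediction, which on this toy data coincides with the always-home baseline. In practice lam and alpha would be tuned by cross-validation; scikit-learn offers a comparable estimator via `LogisticRegression(penalty="elasticnet", solver="saga", l1_ratio=...)`, where overall strength is set through the inverse parameter `C`.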

Further details on the analytical approach can be found in the paper itself. Please direct questions to Andrew Cui (andrewc@seas.upenn.edu).



Languages: Jupyter Notebook 99.6%, Python 0.3%, TeX 0.1%