joshorenstein / pitching-analysis

GAM-based model that predicts FIP based on expected whiff rate, command and expected contact from Statcast data

Expected FIP

The model:

This is a Generalized Additive Model (GAM), fit with random effects and restricted maximum likelihood (REML), that rates MLB pitchers' stuff and translates it into a predicted FIP. It is a series of models, grouped by pitch type and batter side, that estimate expected swinging-strike and home run rates from the pitch characteristics listed under "The scripts" below. Expected swinging-strike rate is then fit to strikeout percentage. Walk rates come from actual game data, and Expected FIP is fit from expected swinging-strike rate, expected home run rate and actual walk rate. The 2020 MLB season is the training data. The 2021 season will be the test set, and the GAM will be tuned once I've got some test data.
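
The last stage described above could be sketched roughly as follows. This is a minimal illustration on synthetic data, not the repo's code: the data frame `pitchers`, its columns (`x_swstr`, `x_hr`, `bb_pct`, `fip`), the simulated relationship and the use of smooth terms for each input are all assumptions made for the example.

```r
library(mgcv)  # gam() with REML smoothing-parameter selection

set.seed(1)
n <- 200

# Synthetic pitcher-level data, for illustration only.
# All column names here are hypothetical.
pitchers <- data.frame(
  x_swstr = runif(n, 0.06, 0.18),  # expected swinging-strike rate
  x_hr    = runif(n, 0.01, 0.06),  # expected home run rate
  bb_pct  = runif(n, 0.04, 0.14)   # actual walk rate
)

# Made-up relationship so the example has something to fit.
pitchers$fip <- 4.2 - 12 * pitchers$x_swstr + 25 * pitchers$x_hr +
  10 * pitchers$bb_pct + rnorm(n, sd = 0.3)

# One smooth per input; smoothing parameters are chosen by REML.
xfip_fit <- gam(fip ~ s(x_swstr) + s(x_hr) + s(bb_pct),
                data = pitchers, method = "REML")

# Expected FIP is the model's prediction for each pitcher.
pitchers$x_fip <- as.numeric(predict(xfip_fit, newdata = pitchers))
head(pitchers[order(pitchers$x_fip), ])
```

With real data, `pitchers` would hold one row per pitcher, built by aggregating the pitch-level model outputs and joining actual walk rates.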

The point:

Determine how good a pitcher's stuff is and how it should translate to performance. See who has the best pitches in baseball. See which pitchers are optimizing their arsenal. Is a pitcher throwing his best pitches most often?

The fun stuff:

MLB Leaders
Blue Jays
Giants
Mariners
Pirates

The scripts:
  • Download and clean Statcast data (h/t to Ethan Moore and Bill Petti. This is mostly their code.)
  • Feature selection: PCA was tested, and VIF is ultimately used to check the predictors for multicollinearity
  • Fastball/Sinker models include pitch velo, release point, batter side, spin rate, spin direction, break and plate location
  • Breaking ball models include pitch velo, release point, batter side, spin rate, spin direction, break, plate location, and relative horizontal break and velo compared to fastball
  • Changeup models include pitch velo, release point, batter side, spin rate, spin direction, break, plate location, and horizontal break, velo, vertical break and spin direction relative to the fastball
  • Home run models are currently based on the above factors but with home run as the dependent variable instead of swinging strikes. This model will eventually be rebuilt with different variables.
  • There is some calculation of FIP and some summary analysis.
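
For reference, the standard FIP formula is (13*HR + 3*(BB + HBP) - 2*K) / IP plus a league constant. The sketch below is one way to write it in R. The function name `calc_fip` and the sample stat line are hypothetical, and the default `fip_constant` is only a placeholder, since the real constant changes each season.

```r
# FIP = (13*HR + 3*(BB + HBP) - 2*K) / IP + league constant
# `ip` should be decimal innings (180 1/3 innings is 180.333, not 180.1).
calc_fip <- function(hr, bb, hbp, k, ip, fip_constant = 3.1) {
  # fip_constant is a placeholder; substitute the season's actual value.
  (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + fip_constant
}

# Hypothetical season line, for illustration only.
calc_fip(hr = 20, bb = 50, hbp = 5, k = 200, ip = 180)
```

The function is vectorized, so it can be applied directly to the columns of a pitcher-level data frame.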
Code Example - Fastball Whiff Rate Model
library(dplyr)  # %>%, filter(), group_by(), do()
library(mgcv)   # gam()

# Fit one binomial whiff model per pitch type, pitcher hand and batter side
train_select %>%
  filter(pitch_type %in% c("FF","SI")) %>%
  group_by(pitch_type,p_throws,stand) %>%
  do(fit = gam(whiff ~ release_speed+release_pos_x+release_pos_z+release_extension+
                 release_spin_rate+release_spin_direction+pfx_x+pfx_z
                 +plate_x+plate_z, data = .,family=binomial,method="REML",bs="re"))
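
Once the per-group models are fit, each pitch can be scored with its own group's model and the predictions averaged into an expected whiff rate. The sketch below shows that step on synthetic data. It uses base R `split()` and `lapply()` in place of `do()` to keep dependencies light, and a reduced two-predictor formula. The data frame `pitches`, the simulated whiff process and the column `x_whiff` are assumptions for illustration, not the repo's code.

```r
library(mgcv)

set.seed(42)
n <- 2000

# Synthetic pitch-level data, for illustration only.
pitches <- data.frame(
  pitch_type    = sample(c("FF", "SI"), n, replace = TRUE),
  stand         = sample(c("L", "R"), n, replace = TRUE),
  release_speed = rnorm(n, 93, 2),
  pfx_z         = rnorm(n, 1.2, 0.4)
)

# Made-up whiff process: more velo and more vertical movement miss more bats.
p_true <- plogis(-2.5 + 0.15 * (pitches$release_speed - 93) +
                   0.8 * (pitches$pfx_z - 1.2))
pitches$whiff <- rbinom(n, 1, p_true)

# One binomial model per pitch type and batter side.
groups <- split(pitches, list(pitches$pitch_type, pitches$stand), drop = TRUE)
fits <- lapply(groups, function(d) {
  gam(whiff ~ release_speed + pfx_z, data = d,
      family = binomial, method = "REML")
})

# Score every pitch with the model for its own group.
scored <- do.call(rbind, lapply(names(groups), function(g) {
  d <- groups[[g]]
  d$x_whiff <- as.numeric(predict(fits[[g]], newdata = d, type = "response"))
  d
}))

# Average the per-pitch probabilities into an expected whiff rate per group.
aggregate(x_whiff ~ pitch_type + stand, data = scored, FUN = mean)
```

Adding a pitcher identifier to the `aggregate()` formula would give an expected whiff rate per pitcher and pitch type.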

Languages

Language:R 100.0%