Moneyball? More like Funnyball amirite?

Funnyball is a binary classifier for predicting post-season college basketball victories based on regular season data and seeds.

Potential Data sets

### Other potential data sets to incorporate

Team Stats - transform %

Vegas Odds

Regular season head-to-head results - Unfortunately, I found that regular season matchups between teams that meet in the postseason are too sparse to use, and not predictive.

Regular season win records vs common teams - i.e. top 4/8/16 seeds that season - i.e. all other seeded teams in the tournament

Aggregated Team Ratings - Sagarin

News - Injuries - Coaching Changes

Player based - biometric - Individual statistics

For every matchup in previously known postseasons, build one observation (team1_team2) with these fields:

- DID_WIN_IN_POSTSEASON (0 or 1)
- SEED_DIFFERENTIAL (lseed - wseed; higher means the winner was favored)
- REGULAR_SEASON_WIN_LOSS (when the teams met in the regular season, the ratio of team 1's wins to team 2's wins)
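
As a rough illustration of how such observation rows could be assembled, here is a minimal sketch in base R. The `postseason` data frame, its column names (`wteam`, `lteam`, `wseed`, `lseed`, `w_regular_wins`, `l_regular_wins`), the sample values, and the `win_ratio` helper are all assumptions made for this example, not the project's actual schema. Mirroring each game from the loser's point of view (with the sign of the differential flipped) is just one way to get both outcome classes.

```r
# Hypothetical postseason results: one row per game, with both seeds and the
# regular season head-to-head win counts for the winner and the loser.
postseason <- data.frame(
  wteam = c("A", "C", "A"),
  lteam = c("B", "D", "C"),
  wseed = c(1, 5, 1),
  lseed = c(16, 4, 5),
  w_regular_wins = c(0, 1, 2),
  l_regular_wins = c(0, 1, 0),
  stringsAsFactors = FALSE
)

# Ratio of team 1's to team 2's regular season wins.
# NA when the teams never met; Inf when team 2 never beat team 1.
win_ratio <- function(wins1, wins2) {
  ifelse(wins1 + wins2 == 0, NA_real_, wins1 / wins2)
}

# Rows from the winner's point of view (team 1 = winner).
winner_rows <- data.frame(
  observation = paste(postseason$wteam, postseason$lteam, sep = "_"),
  DID_WIN_IN_POSTSEASON = 1,
  SEED_DIFFERENTIAL = postseason$lseed - postseason$wseed,
  REGULAR_SEASON_WIN_LOSS = win_ratio(postseason$w_regular_wins,
                                      postseason$l_regular_wins)
)

# Mirrored rows from the loser's point of view (team 1 = loser).
loser_rows <- data.frame(
  observation = paste(postseason$lteam, postseason$wteam, sep = "_"),
  DID_WIN_IN_POSTSEASON = 0,
  SEED_DIFFERENTIAL = postseason$wseed - postseason$lseed,
  REGULAR_SEASON_WIN_LOSS = win_ratio(postseason$l_regular_wins,
                                      postseason$w_regular_wins)
)

observations <- rbind(winner_rows, loser_rows)
print(observations)
```

Pairs that never met in the regular season come out as `NA` here, which reflects how sparse that feature is; a real pipeline would need to decide how to fill or drop those rows before modelling.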

This can be done with

Visualize the data in a scatter plot
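
A minimal sketch of such a scatter plot with ggplot2 (one of the packages this README lists). The column names are borrowed from the model code in this README; the small `matchups` data frame is made up so the block runs on its own, and would presumably be replaced by the real training data.

```r
library(ggplot2)

# Made-up stand-in for the real training data (assumption, illustration only).
set.seed(1)
matchups <- data.frame(
  seed.advantage = sample(-15:15, 40, replace = TRUE),
  seed.win.loss.advantage.64 = round(runif(40, -1, 1), 2),
  did.win = sample(c(0, 1), 40, replace = TRUE)
)

# One point per matchup, colored by the postseason outcome.
# geom_jitter nudges points sideways so matchups that share the same
# integer seed advantage do not sit exactly on top of each other.
p <- ggplot(matchups, aes(x = seed.advantage,
                          y = seed.win.loss.advantage.64,
                          colour = factor(did.win))) +
  geom_jitter(width = 0.2, height = 0) +
  labs(colour = "did.win")

print(p)
```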

You should see a graph like this:

Run a random forest classifier on this data to see how significant the features are in predicting the response variable:

Run the steps in R_model.R in RStudio. Note that you may need to install the following packages:
- e1071
- ggplot2
- randomForest
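
One possible way to check for the packages listed above from the R console is sketched below. The `missing_packages` helper is a hypothetical name introduced for this example; the actual install call is left commented out so the snippet has no side effects.

```r
# Return the subset of `pkgs` that is not installed in the current R library.
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

required <- c("e1071", "ggplot2", "randomForest")
to_install <- missing_packages(required)

if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
  # Uncomment to install them from CRAN:
  # install.packages(to_install)
} else {
  message("All required packages are already installed.")
}
```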

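```r
library(randomForest)
# as.factor() makes randomForest fit a classifier rather than a regression
data <- read.table("output/input-r.csv", header=TRUE, sep=",")
rf <- randomForest(x=data[,c("seed.advantage","seed.win.loss.advantage.64")], y=as.factor(data[,c("did.win")]), importance=TRUE, proximity=TRUE)
```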
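
To actually read off how much each feature contributes, the fitted forest can be inspected with `importance()` and `varImpPlot()` from the randomForest package. The sketch below uses synthetic data (an assumption, so that it runs without the project's CSV) in which the outcome depends on `seed.advantage` and the second feature is pure noise; `train` and `rf_demo` are illustrative names.

```r
library(randomForest)

# Synthetic stand-in for the training data (assumption, illustration only).
set.seed(42)
n <- 200
train <- data.frame(
  seed.advantage = sample(-15:15, n, replace = TRUE),
  seed.win.loss.advantage.64 = runif(n, -1, 1)
)
# Outcome driven by seed.advantage plus some noise.
train$did.win <- as.integer(train$seed.advantage + rnorm(n, sd = 4) > 0)

rf_demo <- randomForest(
  x = train[, c("seed.advantage", "seed.win.loss.advantage.64")],
  y = as.factor(train$did.win),
  importance = TRUE
)

# One row per feature. Larger MeanDecreaseAccuracy and MeanDecreaseGini
# values mean the forest relied more on that feature.
imp <- importance(rf_demo)
print(imp)

# Dot chart of the same importance measures.
varImpPlot(rf_demo)
```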

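Run the random forest model created in the previous step against the current year's regular season results and tournament seeds:

```r
# The current season file needs the same feature columns as the training data
data <- read.table("kaggle_data/current_season-r.csv", header=TRUE, sep=",")
# Predict the class (win or loss) for each current season matchup
data$predictWillWin <- predict(rf,data[,c("seed.advantage","seed.win.loss.advantage.64")])
```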
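
If a win probability is more useful than a hard 0/1 label, `predict()` on a randomForest classifier also accepts `type = "prob"`. The following is a sketch under stated assumptions: the training data is synthetic, and `rf_demo`, `current`, and `winProbability` are names invented for this example.

```r
library(randomForest)

# Synthetic training data (assumption, illustration only).
set.seed(7)
n <- 200
train <- data.frame(
  seed.advantage = sample(-15:15, n, replace = TRUE),
  seed.win.loss.advantage.64 = runif(n, -1, 1)
)
train$did.win <- as.integer(train$seed.advantage + rnorm(n, sd = 4) > 0)

rf_demo <- randomForest(
  x = train[, c("seed.advantage", "seed.win.loss.advantage.64")],
  y = as.factor(train$did.win)
)

# Hypothetical current season matchups with the same feature columns.
current <- data.frame(
  seed.advantage = c(15, 1, -7),
  seed.win.loss.advantage.64 = c(0.5, 0, -0.25)
)

# type = "prob" returns one column per class, holding the fraction of
# trees that voted for that class.
probs <- predict(rf_demo, current, type = "prob")
current$winProbability <- probs[, "1"]
print(current)
```

The probabilities are vote fractions across trees, so they are coarse with small forests; they are a convenience for ranking matchups rather than calibrated odds.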