A repository to keep my NBA data and a few projects.
- This project is active, but the documentation for it isn't up-to-date. It's on my to-do list.
- A scraper for Basketball Reference stats
- A fun ranking system for players. It's pretty obviously not accurate, so don't @me about it.
- Russell Westbrook's MVP season is pretty clearly the 🐐.
- A new FG% that weights FG% using shots attempted. This allows a FG% to be a better measure of rather any given shot will go in.
- Consolidates raw scraped team statistics into a single csv for easier access.
- Adds team stats to player stats. So now Wins, Losses, etc can be taken into consideration when evaluating a player.
- Machine learning projects involving only team stats.
- Uses KNeighborsClassifier to classify players into categories.
- Machine Learning projects involving player stats.
- An experiment in processing data to be more useful to the ML algorithms.
- Determines how far a player's statistics are from the average. This helps account for stat inflation by assuming the average skill level remains the same year over year.
- For example, if league average for the year is 10 PPG, and I score 25, I would be a +15 PPG player.
- The next year if league average is 5 PPG and I score 20, I'm still a +15 PPG Player.
- Use normalization in comparing players across years. Maybe add it in to the MVP predictions as well to prevent stat
inflation bias.
+
Using a "distance from the average" or "Off Average" metricNormalize for pace- Combine the two into a single dataset.
- Update Documentation
Scrape Team Per Game, Advanced,and Shooting stats and consolidate them.- The Team Linear Regression Notebook is unfinished.
- Decide on/implement a new way to store the scraped data. The current method is going to get messy incredibly quickly.
- Rewrite Scrapers using pd.read_html() because it's just a better way to do things.
- Develop a single all-stat scraper that consolidates all stats when first scraped.
- Add in team stats before consolidating traded players. Use weighted average to give a rough estimate of how good the player's various teams are.
- This is on hold due to issues in the player stat scraper and the player and team stat combiner.
- Rework the player stat + team stat combiner so that traded players are accounted for properly.
- The issue is that traded players don't have team statistics.
- The fix I'm planning to implement is to add team stats before consolidating the individual rows of a single player. Thus when combined, the player will now be on a "hybrid team" that has the statistics of a weighted average of each team they played for that year. This is an imperfect system, but it's the best I can do with the data avaliable.
- The issue is that traded players don't have team statistics.
- Rework the player stat scraper so that traded players don't have 100+ games in a season.
- This is occurring because I misinterpreted the meaning of the "team name" TOT.
- I now understand that this means Total, but before I interpreted it as Toronto. I consolidated all player stats correctly, except I included the TOT row as well. This led to player's games being counted twice.
- The fix will involve removing the TOT column from the player's stats before consolidation.
- This is occurring because I misinterpreted the meaning of the "team name" TOT.
- Test various regression algorithms to attempt to predict Award Winners.
- Implement minimum/maximum requirements for SMOTY voting. The dataset currently contains all players, not just those that actually come off the bench.
- Generate a dataset to predict MIP. In real life MIP is based on previous years' stats.
- Maybe subtract a player's previous year's stats from their present year's stats.
- For example (2020 PTS - 2019 PTS) = DIFF PTS, which could show generally, how much a player improved from 1 year to the next.
- Will most likely need to do away with team stats for the MIP datasets.
- Maybe subtract a player's previous year's stats from their present year's stats.
- A bs4 based team stat scraper. It will be 10,000x faster than the selenium approach.
- Or just use pd.read_html because that's a thing I know about now.
Scrape award voting/winners.Add relevant team stats and award voting data to player data.Cluster Players using a ML algorithm and interpret the results