ARCHETYPES OF NATIONAL BASKETBALL ASSOCIATION (NBA) PLAYERS
Motivation
For years, players have been identified by their defensive position on court, and informally by names such as Rim Runner and Spot Up Big, or by their ball movement. These are the results Wikipedia returns for a search on types of players in basketball:
But these are defensive positions, not player types, and the official NBA website gives similar results.
Thus there is an urgent need to analyse players on the basis of their on-court performance rather than their position, in order to understand players and teams better and to take decisive action towards improvement.
Scraped Data
We scrape the data from the Basketball Reference website and refer to the official NBA website for further help.
To scrape the data from Basketball Reference, we used the urlopen function and the BeautifulSoup library to access the data available on the site.
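The scraping step described above might look roughly like the sketch below. The URL, the helper names `parse_stats_table` and `fetch_player_stats`, and the sample HTML are assumptions made for illustration; they are not taken from the project notebook. The parser is run on a small hand-written table so the example works offline.

```python
from urllib.request import urlopen

from bs4 import BeautifulSoup

# Assumed URL of the 2018-19 per-game table; adjust it if the site layout changes.
URL = "https://www.basketball-reference.com/leagues/NBA_2019_per_game.html"


def parse_stats_table(html):
    """Return (headers, rows) from the first <table> in the given HTML."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    # Column names are the <th> cells of the header section.
    headers = [th.get_text(strip=True) for th in table.find("thead").find_all("th")]
    rows = []
    for tr in table.find("tbody").find_all("tr"):
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        # Skip copies of the header row repeated inside the table body.
        if cells and cells != headers:
            rows.append(cells)
    return headers, rows


def fetch_player_stats(url=URL):
    """Download the page and parse its stats table (needs network access)."""
    with urlopen(url) as response:
        html = response.read().decode("utf-8")
    return parse_stats_table(html)


# Offline demonstration on a tiny hand-written table (hypothetical markup).
SAMPLE_HTML = """
<table>
  <thead><tr><th>Rk</th><th>Player</th><th>Pos</th><th>PTS</th></tr></thead>
  <tbody>
    <tr><th>1</th><td>Steven Adams</td><td>C</td><td>13.9</td></tr>
    <tr><th>Rk</th><th>Player</th><th>Pos</th><th>PTS</th></tr>
    <tr><th>2</th><td>Bam Adebayo</td><td>C</td><td>8.9</td></tr>
  </tbody>
</table>
"""
headers, rows = parse_stats_table(SAMPLE_HTML)
print(headers)
print(rows)
```

Keeping the download and the parsing in separate functions lets the parsing be checked without hitting the website on every run.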
| | Player | Pos | Age | Tm | G | GS | MP | FG | FGA | FG% | 3P | 3PA | 3P% | 2P | 2PA | 2P% | eFG% | FT | FTA | FT% | ORB | DRB | TRB | AST | STL | BLK | TOV | PF | PTS |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Álex Abrines | SG | 25 | OKC | 31 | 2 | 19.0 | 1.8 | 5.1 | .357 | 1.3 | 4.1 | .323 | 0.5 | 1.0 | .500 | .487 | 0.4 | 0.4 | .923 | 0.2 | 1.4 | 1.5 | 0.6 | 0.5 | 0.2 | 0.5 | 1.7 | 5.3 |
1 | Quincy Acy | PF | 28 | PHO | 10 | 0 | 12.3 | 0.4 | 1.8 | .222 | 0.2 | 1.5 | .133 | 0.2 | 0.3 | .667 | .278 | 0.7 | 1.0 | .700 | 0.3 | 2.2 | 2.5 | 0.8 | 0.1 | 0.4 | 0.4 | 2.4 | 1.7 |
2 | Jaylen Adams | PG | 22 | ATL | 34 | 1 | 12.6 | 1.1 | 3.2 | .345 | 0.7 | 2.2 | .338 | 0.4 | 1.1 | .361 | .459 | 0.2 | 0.3 | .778 | 0.3 | 1.4 | 1.8 | 1.9 | 0.4 | 0.1 | 0.8 | 1.3 | 3.2 |
3 | Steven Adams | C | 25 | OKC | 80 | 80 | 33.4 | 6.0 | 10.1 | .595 | 0.0 | 0.0 | .000 | 6.0 | 10.1 | .596 | .595 | 1.8 | 3.7 | .500 | 4.9 | 4.6 | 9.5 | 1.6 | 1.5 | 1.0 | 1.7 | 2.6 | 13.9 |
4 | Bam Adebayo | C | 21 | MIA | 82 | 28 | 23.3 | 3.4 | 5.9 | .576 | 0.0 | 0.2 | .200 | 3.4 | 5.7 | .588 | .579 | 2.0 | 2.8 | .735 | 2.0 | 5.3 | 7.3 | 2.2 | 0.9 | 0.8 | 1.5 | 2.5 | 8.9 |
5 | Deng Adel | SF | 21 | CLE | 19 | 3 | 10.2 | 0.6 | 1.9 | .306 | 0.3 | 1.2 | .261 | 0.3 | 0.7 | .385 | .389 | 0.2 | 0.2 | 1.000 | 0.2 | 0.8 | 1.0 | 0.3 | 0.1 | 0.2 | 0.3 | 0.7 | 1.7 |
6 | DeVaughn Akoon-Purcell | SG | 25 | DEN | 7 | 0 | 3.1 | 0.4 | 1.4 | .300 | 0.0 | 0.6 | .000 | 0.4 | 0.9 | .500 | .300 | 0.1 | 0.3 | .500 | 0.1 | 0.4 | 0.6 | 0.9 | 0.3 | 0.0 | 0.3 | 0.6 | 1.0 |
7 | LaMarcus Aldridge | C | 33 | SAS | 81 | 81 | 33.2 | 8.4 | 16.3 | .519 | 0.1 | 0.5 | .238 | 8.3 | 15.8 | .528 | .522 | 4.3 | 5.1 | .847 | 3.1 | 6.1 | 9.2 | 2.4 | 0.5 | 1.3 | 1.8 | 2.2 | 21.3 |
8 | Rawle Alkins | SG | 21 | CHI | 10 | 1 | 12.0 | 1.3 | 3.9 | .333 | 0.3 | 1.2 | .250 | 1.0 | 2.7 | .370 | .372 | 0.8 | 1.2 | .667 | 1.1 | 1.5 | 2.6 | 1.3 | 0.1 | 0.0 | 0.8 | 0.7 | 3.7 |
9 | Grayson Allen | SG | 23 | UTA | 38 | 2 | 10.9 | 1.8 | 4.7 | .376 | 0.8 | 2.6 | .323 | 0.9 | 2.1 | .443 | .466 | 1.2 | 1.6 | .750 | 0.1 | 0.5 | 0.6 | 0.7 | 0.2 | 0.2 | 0.9 | 1.2 | 5.6 |
Data Pre-processing, Feature Engineering and EDA
A detailed analysis can be found in the Jupyter notebook attached above as Archtype of NBA Players; here are some findings from this section.
This correlogram shows the correlation between all the features in the data.
I tried to model the linear relationships between all the scoring variables in the processed data.
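As a rough illustration of what a correlogram is built from, the sketch below computes pairwise Pearson correlations for four columns of the ten players listed in the table above. The choice of columns and the `pearson` helper are assumptions made for this example; the notebook may use different features or a library routine such as pandas `DataFrame.corr()`.

```python
from math import sqrt

# Per-game values for the ten players shown in the table above.
stats = {
    "MP": [19.0, 12.3, 12.6, 33.4, 23.3, 10.2, 3.1, 33.2, 12.0, 10.9],
    "FGA": [5.1, 1.8, 3.2, 10.1, 5.9, 1.9, 1.4, 16.3, 3.9, 4.7],
    "TRB": [1.5, 2.5, 1.8, 9.5, 7.3, 1.0, 0.6, 9.2, 2.6, 0.6],
    "PTS": [5.3, 1.7, 3.2, 13.9, 8.9, 1.7, 1.0, 21.3, 3.7, 5.6],
}


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


names = list(stats)
# Nested dict: corr[a][b] is the correlation between features a and b.
corr = {a: {b: pearson(stats[a], stats[b]) for b in names} for a in names}

for a in names:
    print(a, [round(corr[a][b], 2) for b in names])
```

Values close to 1 or -1 mark pairs of features with a strong linear relationship, which is what the coloured cells of a correlogram encode.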
Clustering Players on the basis of their Similarities and Dissimilarities.
Clustering can be largely classified into the following 4 types, each of which uses its own technique to measure differences between data points:
- Exclusive Clustering
- Overlapping Clustering
- Hierarchical Clustering
- Probabilistic Clustering

We will use all of the above methods except Overlapping Clustering.
- K-Means Clustering
We use the elbow plot and the Silhouette Coefficient to decide the number of clusters.
Elbow Plot
Silhouette Plot & Coefficient for different values of clusters
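The idea behind both diagnostics can be sketched without any external library. The code below is a simplified stand-in, not the notebook's implementation: it uses a plain K-Means with a deterministic farthest-point start (an assumption made so the run is repeatable), and a made-up toy data set of three well-separated groups. A library such as scikit-learn (`KMeans`, `silhouette_score`) would normally do this work on the real player features.

```python
from math import dist


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm with a deterministic farthest-point start."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next start: the point farthest from all centroids picked so far.
        centroids.append(
            max(points, key=lambda p: min(dist(p, c) for c in centroids))
        )
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an emptied cluster keeps its old centroid
                centroids[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids


def inertia(points, labels, centroids):
    """Within-cluster sum of squared distances (the y-axis of an elbow plot)."""
    return sum(dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))


def silhouette(points, labels):
    """Mean silhouette coefficient over all points."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for single-member clusters
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster.
        b = min(
            sum(dist(p, q) for q in other) / len(other)
            for key, other in clusters.items()
            if key != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Made-up toy data: three well-separated groups of four points each.
points = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 10), (10, 11), (11, 10), (11, 11),
    (20, 0), (20, 1), (21, 0), (21, 1),
]

results = {}
for k in range(2, 6):
    labels, centroids = kmeans(points, k)
    results[k] = (inertia(points, labels, centroids), silhouette(points, labels))
    print(k, round(results[k][0], 2), round(results[k][1], 3))

# Pick the k with the highest mean silhouette coefficient.
best_k = max(results, key=lambda k: results[k][1])
print("best k:", best_k)
```

In an elbow plot the inertia values are drawn against k and the bend marks the candidate; the silhouette coefficient gives a second opinion that does not rely on reading a bend by eye.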
- Hierarchical Agglomerative Clustering
This is a "bottom-up" approach: each observation starts in its own cluster, and pairs of clusters are merged as one moves up the hierarchy.
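The bottom-up merging can be illustrated with a short, dependency-free sketch. Average linkage, the function name `agglomerative`, and the toy points are assumptions chosen for this example; the notebook may use a different linkage criterion or a library implementation.

```python
from math import dist


def agglomerative(points, n_clusters):
    """Merge the two closest clusters until n_clusters remain (average linkage)."""
    # Every observation starts in its own cluster, stored as a list of indices.
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Average pairwise distance between the members of two clusters.
        total = sum(dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        pairs = [
            (x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        ]
        # Find the pair of clusters with the smallest linkage distance.
        x, y = min(pairs, key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]))
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


# Made-up toy points: two tight pairs and one isolated point.
toy_points = [(1, 1), (1.5, 1), (5, 5), (5, 5.5), (9, 1)]
merged = agglomerative(toy_points, 3)
print(merged)
```

Recording the order and distance of each merge, rather than stopping at a fixed count, is what produces the dendrogram usually drawn for this method.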
- Gaussian Mixture Models
Comparison of these Methods
We will move forward in our analysis with the clustering technique that has the highest Silhouette Coefficient.
Results:
Parallel Plot of all players in NBA season 2018-19
The parallel plot gives a very good visualisation of the similarities and dissimilarities between the clusters. Here every line describes a player and every colour describes a cluster. This is an interactive plot, so you can drag along any feature axis, fix its limits and analyse the cluster you wish.
Analysis of Results
Conclusion
- The above boxplot gives valuable information that NBA team coaches can use to review their team and strengthen it further by removing players and drafting new players of a particular type.
- If we take the mean as the optimum value, then from the above boxplot we can conclude that a strong team should have the following proportion of each type of player:
TYPE 1 : 25%
TYPE 2 : 46%-49%
TYPE 3 : 15%
TYPE 4 : less than 10%
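Proportions like these can be derived by computing, for every team, the share of its players in each cluster and then averaging those shares across teams. The sketch below shows that arithmetic on a handful of made-up (team, type) pairs; the data, the type numbering, and the helper names are hypothetical and do not reproduce the figures above.

```python
from collections import Counter, defaultdict

# Hypothetical (team, player type) pairs; real input would hold one pair
# per clustered player.
assignments = [
    ("OKC", 1), ("OKC", 2), ("OKC", 2), ("OKC", 3),
    ("MIA", 1), ("MIA", 2), ("MIA", 2), ("MIA", 2),
]


def type_shares_by_team(pairs):
    """For each team, the fraction of its players that falls in each type."""
    counts = defaultdict(Counter)
    for team, player_type in pairs:
        counts[team][player_type] += 1
    return {
        team: {t: n / sum(c.values()) for t, n in c.items()}
        for team, c in counts.items()
    }


def mean_share(shares, all_types):
    """Average each type's share across teams (a missing type counts as 0)."""
    return {
        t: sum(team_shares.get(t, 0.0) for team_shares in shares.values()) / len(shares)
        for t in all_types
    }


shares = type_shares_by_team(assignments)
league_mean = mean_share(shares, [1, 2, 3, 4])
print(shares)
print(league_mean)
```

The per-team shares are the values a boxplot per player type would summarise; the mean across teams is the single figure quoted for each type.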