elbow-plot silhouette-coefficient exploratory-data-analysis gaussian-mixture-models kmeans-clustering heirarchical-clustering unsupervised-machine-learning nba-players-archtype plotly-library scraping-data

ARCHETYPES OF NATIONAL BASKETBALL ASSOCIATION (NBA) PLAYERS

Motivation

For years, players have been recognised by their defensive position on court and known informally by names such as Rim Runner or Spot Up Big, or by their ball movement. These are the results Wikipedia gives if we search for types of players in basketball:

1 – Point guard
2 – Shooting guard
3 – Small forward
4 – Power forward
5 – Center

But these are defensive positions, not player types, and similar results can be seen on the official NBA website.

Thus there is a need to analyse players on the basis of their performance on court rather than their position, in order to understand players and teams better and to take decisive action towards improvement.

Scraped Data

We scrape the data from the Basketball Reference website and refer to the official NBA website for further help.
To scrape the data from Basketball Reference, we used `urlopen` (from `urllib.request`) and BeautifulSoup to access the tables available on the site.
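The scraping step could look roughly like the sketch below. It is not the notebook's code: the URL is an assumed location for the 2018-19 per-game page, and `fetch_html` and `parse_stats_table` are hypothetical helper names. The parser simply reads the first table on the page and skips header rows repeated inside the table body.

```python
from urllib.request import urlopen

from bs4 import BeautifulSoup

# Assumed URL for the 2018-19 per-game statistics; it is not given in this README.
URL = "https://www.basketball-reference.com/leagues/NBA_2019_per_game.html"


def fetch_html(url=URL):
    """Download the raw page (needs network access)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


def parse_stats_table(html):
    """Return (headers, rows) for the first <table> found in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    headers = [th.get_text(strip=True) for th in table.find("thead").find_all("th")]
    rows = []
    for tr in table.find("tbody").find_all("tr"):
        if "thead" in (tr.get("class") or []):
            continue  # skip header rows repeated inside the table body
        rows.append([cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])])
    return headers, rows
```

Calling `parse_stats_table(fetch_html())` would then give a header list and one list of strings per player, which can be loaded into a pandas DataFrame and converted to numeric types.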

	Player	Pos	Age	Tm	G	GS	MP	FG	FGA	FG%	3P	3PA	3P%	2P	2PA	2P%	eFG%	FT	FTA	FT%	ORB	DRB	TRB	AST	STL	BLK	TOV	PF	PTS
0	Álex Abrines	SG	25	OKC	31	2	19.0	1.8	5.1	.357	1.3	4.1	.323	0.5	1.0	.500	.487	0.4	0.4	.923	0.2	1.4	1.5	0.6	0.5	0.2	0.5	1.7	5.3
1	Quincy Acy	PF	28	PHO	10	0	12.3	0.4	1.8	.222	0.2	1.5	.133	0.2	0.3	.667	.278	0.7	1.0	.700	0.3	2.2	2.5	0.8	0.1	0.4	0.4	2.4	1.7
2	Jaylen Adams	PG	22	ATL	34	1	12.6	1.1	3.2	.345	0.7	2.2	.338	0.4	1.1	.361	.459	0.2	0.3	.778	0.3	1.4	1.8	1.9	0.4	0.1	0.8	1.3	3.2
3	Steven Adams	C	25	OKC	80	80	33.4	6.0	10.1	.595	0.0	0.0	.000	6.0	10.1	.596	.595	1.8	3.7	.500	4.9	4.6	9.5	1.6	1.5	1.0	1.7	2.6	13.9
4	Bam Adebayo	C	21	MIA	82	28	23.3	3.4	5.9	.576	0.0	0.2	.200	3.4	5.7	.588	.579	2.0	2.8	.735	2.0	5.3	7.3	2.2	0.9	0.8	1.5	2.5	8.9
5	Deng Adel	SF	21	CLE	19	3	10.2	0.6	1.9	.306	0.3	1.2	.261	0.3	0.7	.385	.389	0.2	0.2	1.000	0.2	0.8	1.0	0.3	0.1	0.2	0.3	0.7	1.7
6	DeVaughn Akoon-Purcell	SG	25	DEN	7	0	3.1	0.4	1.4	.300	0.0	0.6	.000	0.4	0.9	.500	.300	0.1	0.3	.500	0.1	0.4	0.6	0.9	0.3	0.0	0.3	0.6	1.0
7	LaMarcus Aldridge	C	33	SAS	81	81	33.2	8.4	16.3	.519	0.1	0.5	.238	8.3	15.8	.528	.522	4.3	5.1	.847	3.1	6.1	9.2	2.4	0.5	1.3	1.8	2.2	21.3
8	Rawle Alkins	SG	21	CHI	10	1	12.0	1.3	3.9	.333	0.3	1.2	.250	1.0	2.7	.370	.372	0.8	1.2	.667	1.1	1.5	2.6	1.3	0.1	0.0	0.8	0.7	3.7
9	Grayson Allen	SG	23	UTA	38	2	10.9	1.8	4.7	.376	0.8	2.6	.323	0.9	2.1	.443	.466	1.2	1.6	.750	0.1	0.5	0.6	0.7	0.2	0.2	0.9	1.2	5.6

Data Pre-processing, Feature Engineering and EDA

Detailed analysis can be found in the Jupyter notebook attached above as Archtype of NBA Players; here are some findings from this section.
The correlogram shows the correlation between all the features in the data. I tried to model the linear relationship between all the scoring variables in the processed data.
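As a minimal sketch of how such a correlation matrix can be computed, the snippet below uses pandas on four scoring columns (FG, 3P, FT, PTS) taken from the ten sample rows shown in the scraped table. The choice of columns is an assumption made for illustration; the notebook works on the full processed data.

```python
import pandas as pd

# Scoring columns for the ten sample players listed in the scraped table.
scoring = pd.DataFrame(
    {
        "FG": [1.8, 0.4, 1.1, 6.0, 3.4, 0.6, 0.4, 8.4, 1.3, 1.8],
        "3P": [1.3, 0.2, 0.7, 0.0, 0.0, 0.3, 0.0, 0.1, 0.3, 0.8],
        "FT": [0.4, 0.7, 0.2, 1.8, 2.0, 0.2, 0.1, 4.3, 0.8, 1.2],
        "PTS": [5.3, 1.7, 3.2, 13.9, 8.9, 1.7, 1.0, 21.3, 3.7, 5.6],
    }
)

# Pairwise Pearson correlation, which measures linear association.
corr = scoring.corr()
print(corr.round(2))
```

A heatmap of `corr` (for example with Plotly or seaborn) gives the correlogram; strongly correlated features are candidates for dropping or combining before clustering.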

Clustering Players on the basis of their Similarities and Dissimilarities

Clustering can be largely classified into the following 4 types, each of which uses its own technique to measure differences between data points:

Exclusive Clustering
Overlapping Clustering
Hierarchical Clustering
Probabilistic Clustering

We will be using all of the above methods except Overlapping Clustering.

- K-Means Clustering

Using the elbow plot and the silhouette coefficient, we decide the number of clusters.

Elbow Plot

Silhouette Plot & Coefficient for different values of clusters
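The selection of the number of clusters can be sketched as follows with scikit-learn. The data here is a synthetic stand-in generated with `make_blobs`, not the NBA features, and the range of k values is an assumption. The inertia values are what an elbow plot displays, and the silhouette scores are compared across k.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in for the scaled player features (not the real NBA data).
X, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [10, 10], [-10, 10], [10, -10]],
    cluster_std=1.0,
    random_state=0,
)

inertias = {}     # within-cluster sum of squares, the y-axis of the elbow plot
silhouettes = {}  # mean silhouette coefficient for each k
for k in range(2, 8):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_
    silhouettes[k] = silhouette_score(X, model.labels_)

# The elbow is read off the inertia curve by eye; the silhouette
# coefficient gives a direct number to maximise.
best_k = max(silhouettes, key=silhouettes.get)
print("k with the highest silhouette coefficient:", best_k)
```

On real data the two criteria do not always agree, so it helps to look at both before fixing the number of clusters.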

- Hierarchical Agglomerative Clustering

This is a "bottom-up" approach: each observation starts in its own cluster, and pairs of clusters are merged as one moves up the hierarchy.
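A minimal sketch of the bottom-up procedure with scikit-learn's `AgglomerativeClustering` is shown below. The six toy points and the choice of Ward linkage are assumptions for illustration, not the notebook's settings.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Six toy points forming two obvious groups (illustrative only).
X = np.array(
    [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [5.0, 5.1], [5.2, 4.9], [4.9, 5.3]]
)

# Every point starts as its own cluster; the closest pair of clusters is
# merged repeatedly until only n_clusters remain.
model = AgglomerativeClustering(n_clusters=2, linkage="ward")
labels = model.fit_predict(X)

# children_ records the merge order: row i lists the two nodes joined at
# step i, where indices >= len(X) refer to clusters formed in earlier steps.
for step, (a, b) in enumerate(model.children_):
    print(f"step {step}: merge node {a} with node {b}")
print("labels:", labels)
```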

- Gaussian Mixture Models

Comparison between these Methods

We move forward in our analysis with the clustering technique that has the highest Silhouette Coefficient.
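One way to run this comparison is sketched below: fit each of the three methods with the same number of clusters and keep the one with the highest silhouette coefficient. The synthetic data, the value of `n_clusters`, and the default model settings are all assumptions; the notebook's actual configuration may differ.

```python
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.mixture import GaussianMixture

# Synthetic stand-in for the scaled player features (not the real NBA data).
X, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [8, 8], [-8, 8], [8, -8]],
    cluster_std=1.5,
    random_state=1,
)
n_clusters = 4  # assumed; in practice taken from the elbow/silhouette analysis

# Cluster labels produced by each method.
labelings = {
    "kmeans": KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(X),
    "agglomerative": AgglomerativeClustering(n_clusters=n_clusters).fit_predict(X),
    "gmm": GaussianMixture(n_components=n_clusters, random_state=0).fit_predict(X),
}

# Score every labeling with the same metric and pick the highest.
scores = {name: silhouette_score(X, labels) for name, labels in labelings.items()}
best_method = max(scores, key=scores.get)
print(scores)
print("selected method:", best_method)
```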

Results:

Parallel Plot of all players in NBA season 2018-19

The parallel plot gives a very good visualisation of the similarities and dissimilarities between different clusters. Here every line describes a player and every colour describes a cluster. This is an interactive plot, so you can drag along any feature axis to fix its limits and analyse the cluster you wish.

Analysis of Results

Conclusion

- From the above boxplot we obtain valuable information that NBA team coaches can use to review their team and strengthen it further by releasing players and drafting new players of a particular type.

- If we take the mean as the optimum value, then from the above boxplot we can conclude that a strong team should have the following proportion of each type of player:

TYPE 1 : 25%
TYPE 2 : 46%-49%
TYPE 3 : 15%
TYPE 4 : LESS than 10%
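The per-team proportions behind such a boxplot can be computed along the lines of the sketch below. The team names and type assignments are hypothetical toy values, not the project's clustering output.

```python
import pandas as pd

# Hypothetical cluster assignments for two made-up teams.
roster = pd.DataFrame(
    {
        "team": ["Team A"] * 4 + ["Team B"] * 4,
        "player_type": [1, 2, 2, 3, 1, 2, 2, 2],
    }
)

# Share of each player type within every team (each row sums to 1).
shares = pd.crosstab(roster["team"], roster["player_type"], normalize="index")

# Mean share of each type across teams, the value read off the boxplot.
mean_share = shares.mean()
print(shares)
print(mean_share)
```

Restricting `roster` to successful teams before averaging would give proportions comparable to the ones listed above.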

- Every team consists of roughly 16 to 18 players, of whom only 5 are on court at a time. Given the proportions above, it is very important that a team has or drafts elite players of every type in order to form a strong team. Many other factors can also decide the fate of a team in a season, for example how the coaches make decisions on court according to the situation, the team's defence, the understanding between the players, and many more.

- What I concluded is just one aspect of improvement; it will not always guarantee success for a team.

About

This project aims to cluster players on the basis of their performance on the offensive side of the court and to give the winning proportion of these groups in a team.

Languages

Language: Jupyter Notebook 100.0%