Basic K-Means clustering model to identify the optimal number of clusters for categorising shot locations from Lionel Messi's career.
This project contains a modular Python codebase and a set of open-source football event data published by StatsBomb. The code performs the following functions:
- Iterates through StatsBomb's event data, identifies all shots by Lionel Messi and extracts the xy coordinates of shot locations
- Performs the K-Means elbow test on the Messi shot data across a range of n_clusters values to find the optimal value
- Runs the K-Means clustering algorithm on the Messi data using the optimal n_clusters value, then plots the clusters and their centres on a half-pitch
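The three steps above can be sketched roughly as follows. This is a minimal illustration, not the code in this repository: the function names (`extract_shot_locations`, `elbow_inertias`, `fit_clusters`) are hypothetical, the event layout is assumed to follow StatsBomb's open-data JSON (a `type` and `player` object with a `name`, plus a `location` list), and it assumes NumPy and scikit-learn are installed. Plotting is left out.

```python
import numpy as np
from sklearn.cluster import KMeans


def extract_shot_locations(events, player_name):
    """Return an (n_shots, 2) array of [x, y] shot locations for one player.

    Assumption: each event is a dict shaped like StatsBomb open-data events,
    with "type" and "player" objects carrying a "name", and a "location" list.
    """
    locations = [
        event["location"][:2]
        for event in events
        if event.get("type", {}).get("name") == "Shot"
        and event.get("player", {}).get("name") == player_name
        and "location" in event
    ]
    return np.array(locations, dtype=float).reshape(-1, 2)


def elbow_inertias(points, k_values, random_state=0):
    """Fit K-Means once per k and return the inertia for each fit.

    Inertia is the within-cluster sum of squared distances; the "elbow" is
    the k after which adding clusters stops reducing it by much.
    """
    inertias = []
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        model.fit(points)
        inertias.append(model.inertia_)
    return inertias


def fit_clusters(points, n_clusters, random_state=0):
    """Fit K-Means with the chosen n_clusters; return labels and centres."""
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    model.fit(points)
    return model.labels_, model.cluster_centers_


if __name__ == "__main__":
    # Synthetic stand-in data (not real Messi shots): three blobs of points
    # placed on a 120 x 80 pitch coordinate system.
    rng = np.random.default_rng(0)
    centres = np.array([[108.0, 40.0], [100.0, 30.0], [95.0, 50.0]])
    points = np.vstack([rng.normal(c, 2.0, size=(50, 2)) for c in centres])

    k_values = range(1, 8)
    for k, inertia in zip(k_values, elbow_inertias(points, k_values)):
        print(f"k={k}: inertia={inertia:.1f}")

    labels, fitted_centres = fit_clusters(points, n_clusters=3)
    print(fitted_centres.round(1))
```

In practice the elbow is usually read off a plot of inertia against k, so the chosen n_clusters is a judgment call rather than a value the algorithm returns.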
It is recommended to create a virtual environment and install the dependencies listed in the requirements file. This can be done from the command line with:
python3 -m venv my_venv
source my_venv/bin/activate
pip3 install -r requirements.txt
The codebase can be accessed locally by cloning the repository. After navigating to your local directory of choice, run the following in the command line:
git clone https://github.com/calvindoesdata/messi_shots_kmeans.git
Alternatively, the project can be downloaded as a .zip from the repository home page by selecting 'Code' > 'Download ZIP'.
This project can be run from the command line using the following commands:
cd .../messi_shots_kmeans/
python3 main.py