In this post, I will go over how to write a K-means clustering algorithm from scratch using NumPy. The algorithm is explained in the next section and, while seemingly simple, it can be tricky to implement efficiently! As an added bonus, I will go over how to implement a Scikit-Learn compatible clustering algorithm so that we can use Scikit-Learn's framework, including Pipelines and GridSearchCV.
You can install the dependencies and access the notebook using Docker by building the Docker image with the following:
docker build -t kmeans .
Then run the container with the command:
docker run -ip 8888:8888 -v `pwd`:/home/jovyan -t kmeans
See here for more info.
Otherwise, without Docker, make sure to use Python 3.9 and install the libraries listed in requirements.txt. These can be installed with the command:
pip install -r requirements.txt
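If you went the non-Docker route, a quick sanity check of the environment can save some debugging later. The snippet below is only a minimal sketch: the helper name `check_environment` is hypothetical, and the module list is my assumption based on the libraries named in this post (NumPy and Scikit-Learn); requirements.txt may well list more.

```python
import sys
from importlib.util import find_spec

# Assumed module list: this post only names NumPy and Scikit-Learn,
# so adjust it to match what requirements.txt actually contains.
REQUIRED_MODULES = ("numpy", "sklearn")


def check_environment(required=REQUIRED_MODULES, python=(3, 9)):
    """Return a dict mapping each check name to True (ok) or False."""
    # Compare only the major and minor version, e.g. (3, 9).
    report = {"python": sys.version_info[:2] == tuple(python)}
    for name in required:
        # find_spec returns None when a top-level module cannot be found.
        report[name] = find_spec(name) is not None
    return report


if __name__ == "__main__":
    for check, ok in check_environment().items():
        print(f"{check}: {'ok' if ok else 'missing or mismatched'}")
```

Running the file prints one line per check; anything reported as missing or mismatched points back at the Python version or the pip install step above.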