K Nearest Neighbor (KNN) is a simple and effective algorithm. In this homework, we will implement a simple KNN classifier and use it to classify Iris flower data.
Upon receiving a new input sample, KNN searches the training data for the K samples with the shortest distance to it (the nearest neighbors). The majority class among these K samples is used as the prediction. The steps of KNN are listed below:
- After receiving a new input matrix, calculate a distance matrix. Given an input matrix $X \in \mathbb{R}^{m \times d}$ and a training data matrix $Y \in \mathbb{R}^{n \times d}$, the squared Euclidean distances between training and input samples can be calculated with the expansion $\|Y_i - X_j\|^2 = \|Y_i\|^2 - 2\, Y_i \cdot X_j + \|X_j\|^2$, i.e. in matrix form the row norms of $Y$, the cross term $-2\, Y X^T$, and the row norms of $X$ combined by broadcasting. This yields an $n \times m$ distance matrix.
- Sort the distance matrix along each column to find the K nearest neighbors of each input sample.
- Count the majority class among the K nearest neighbors and report it as the prediction.
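The steps above can be sketched in NumPy as follows. This is a minimal illustration of the vectorized distance trick and the majority vote, not the required interface expected by `knn_test.py`; the function name and signature are assumptions.

```python
import numpy as np

def knn_predict(X, Y, labels, k=3):
    """Predict a class for each row of X using the K nearest rows of Y.

    X: (m, d) input samples, Y: (n, d) training samples,
    labels: (n,) integer training labels.
    """
    # Step 1: squared Euclidean distances via the expansion
    # ||y - x||^2 = ||y||^2 - 2 y.x + ||x||^2, giving an (n, m) matrix.
    d2 = (np.sum(Y**2, axis=1)[:, None]
          - 2 * Y @ X.T
          + np.sum(X**2, axis=1)[None, :])
    # Step 2: sort each column; keep the indices of the K nearest neighbors.
    nearest = np.argsort(d2, axis=0)[:k]            # (k, m)
    # Step 3: majority vote among the K neighbors' labels.
    votes = labels[nearest]                          # (k, m)
    return np.array([np.bincount(col).argmax() for col in votes.T])
```

Sorting the full matrix is the simplest approach; `np.argpartition` would avoid a full sort when only the K smallest distances are needed.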
Reference: kNN Classifier from Scratch (numpy only)
The data are stored in the data folder. The labels are saved in the last column.
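Since the labels occupy the last column, a loaded array can be split into features and labels by slicing. A brief sketch (the file path is an assumption; a tiny in-memory array stands in for the real data here):

```python
import numpy as np

# Hypothetical path; replace with the actual file in the data folder, e.g.:
# data = np.loadtxt("data/iris.csv", delimiter=",")
# Stand-in array with the class label stored in the last column:
data = np.array([[5.1, 3.5, 1.4, 0.2, 0],
                 [6.7, 3.0, 5.2, 2.3, 2]])

X = data[:, :-1]                 # feature columns
y = data[:, -1].astype(int)      # last column holds the labels
```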
`knn_test.py` will be run with pytest to evaluate your model. Please do not modify this file.