Colinasda / Kmeans

K-means clustering of a set of points in the 2D plane; after several training episodes the cluster centers stabilize.

Kmeans

1 Problem

Apply K-means to the dataset and plot the clustering result.

2 Specific Process

Step1. Read the csv file

We use pandas to read the CSV file "Points.csv" and set the column headers for the dataset. We also call a few functions to check basic information about the dataset; for example, it contains 90 records in total. This summary information helps when drawing the scatter plot.
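A minimal sketch of this loading step, assuming "Points.csv" is a headerless file with two numeric columns. The column names `x` and `y` and the in-memory sample data are hypothetical stand-ins so that the snippet runs on its own; with the real file you would pass its path to `pd.read_csv` instead.

```python
import io
import pandas as pd

# Stand-in for "Points.csv": a headerless file with two numeric columns (assumed layout).
sample_csv = io.StringIO("10,12\n11,14\n50,52\n90,20\n")

# names=... supplies the header; "x" and "y" are hypothetical column names.
points = pd.read_csv(sample_csv, header=None, names=["x", "y"])

print(points.shape)       # (number of records, number of columns)
print(points.describe())  # count, mean, min, max, ... per column
points.info()             # dtypes and non-null counts
```

`describe()` reports the minimum and maximum of each column, which is useful for choosing the axis limits of the scatter plot.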

To get a better view of the dataset, we use matplotlib to draw a scatter plot of all records. The plot is shown below. (The green '+' markers represent the records and the red triangles represent the initial centers.)
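A plot in this style could be produced roughly as follows. The point and center coordinates here are hypothetical placeholders, not the values from "Points.csv", and the `Agg` backend is selected only so that the sketch runs without a display; in a notebook that line can be dropped.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a Jupyter notebook
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical sample points and initial centers (placeholders for the real data).
pts = np.array([[10, 12], [11, 14], [50, 52], [90, 20]])
centers = np.array([[20, 20], [60, 60], [100, 10]])

fig, ax = plt.subplots()
# Records as green '+' markers, initial centers as red triangles
ax.scatter(pts[:, 0], pts[:, 1], c="green", marker="+", label="records")
ax.scatter(centers[:, 0], centers[:, 1], c="red", marker="^", label="initial centers")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()
```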

Step2. Calculate the distance

Iterate over all points and assign each one to its nearest center.

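import numpy as np

# Map each cluster index to the list of points assigned to it
points_set = {key: [] for key in range(K)}
for p in p_list:
    # Euclidean distance from p to every center; argmin picks the nearest one
    nearest_index = np.argmin(np.sum((centeroid - p) ** 2, axis=1) ** 0.5)
    points_set[nearest_index].append(p)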

Then we can draw the result of the first clustering iteration. The scatter plot is shown below.
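The assignment step can also be written without an explicit Python loop by using NumPy broadcasting. This is an alternative sketch, not the notebook's code; the arrays `p_arr` and `centers` are hypothetical toy data.

```python
import numpy as np

# Hypothetical data: four points and K = 2 centers.
p_arr = np.array([[0.0, 0.0], [1.0, 0.0], [9.0, 9.0], [10.0, 10.0]])
centers = np.array([[0.0, 0.0], [10.0, 10.0]])

# diff[i, k] = p_arr[i] - centers[k]; broadcasting gives shape (n_points, K, 2)
diff = p_arr[:, None, :] - centers[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=2))  # Euclidean distances, shape (n_points, K)
labels = dist.argmin(axis=1)             # index of the nearest center for each point

print(labels)  # [0 0 1 1]
```

Because the square root is monotonic, taking `argmin` over the squared distances would select the same centers.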

Step3. Train for 10 episodes

We use a for loop to run 10 training episodes; each episode repeats the process above with the updated central points. The scatter plots show the clustering result becoming more and more stable.
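The training loop can be sketched as below. It assumes the standard K-means update, in which each center moves to the mean of the points assigned to it; the README does not spell out the update rule, so treat this as an illustration rather than the notebook's exact code. The function name `kmeans` and the toy data are hypothetical.

```python
import numpy as np

def kmeans(points, centers, episodes=10):
    """Run a fixed number of assign/update rounds.

    Returns the final centers and the labels from the last assignment.
    """
    centers = centers.astype(float).copy()
    for _ in range(episodes):
        # Assignment: nearest center per point (squared distance is enough for argmin)
        d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update: move each center to the mean of its assigned points
        for k in range(len(centers)):
            members = points[labels == k]
            if len(members) > 0:  # keep the old center if a cluster is empty
                centers[k] = members.mean(axis=0)
    return centers, labels

# Hypothetical toy data: two well-separated groups.
pts = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 10.0], [12.0, 10.0]])
init = np.array([[1.0, 1.0], [9.0, 9.0]])
final_centers, final_labels = kmeans(pts, init)
print(final_centers)
```

On this toy input the centers settle at (1, 0) and (11, 10) after the first episode and do not move afterwards, which is the kind of stabilization the scatter plots illustrate.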

The following picture shows the clustering result after 10 episodes.

All 10 scatter plots are in the Jupyter notebook file.

Step4. Print the new central points

After 10 episodes, we get three new central points:

[[ 58  60]
 [ 20 121]
 [120  21]]
