peterwittek / somoclu

Massively parallel self-organizing maps: accelerate training on multicore CPUs, GPUs, and clusters

Home Page:https://peterwittek.github.io/somoclu/

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

clusterID of the original samples

isaac-you opened this issue · comments

SOM clustering is a good customer segment method, and your somoclu make the method strong enough to deal with big data. Thank you so much.
But when I have done the train process, I can only find clusterID for nodes or neurons, but there is no clusterID for the original samples. Besides your default cluster number is 8 for kmeans, so how can I set another cluster number? Thank you so much for your help.

best matching units array do not tell me the ClusterID directly. when I do the experiment from https://somoclu.readthedocs.io/en/stable/example.html for the 150 random samples, the best matching units array just give me the a matrix of shape (150,2) , but no ClusterID, it is more like a coordinate for 150 samples in 2-D space.
So how can I find the ClusterID for original 150 samples, thank you.

请问我要怎么知道样本聚类后所属的具体种类呢

Hi, @isaac-you, you can use best matching units as suggested in documentation.

bmus = som.get_bmus(som.get_surface_state(X))
cluster_labels = [som.clusters[bmu[0]][bmu[1]] for bmu in bmus]

However, I am still wondering why there is no such method in the library itself given that it already have clustering support.