clusterID of the original samples

Question

isaac-you opened this issue 6 years ago

SOM clustering is a good method for customer segmentation, and somoclu makes it practical for large datasets. Thank you so much.
After training, however, I can only find cluster IDs for the nodes (neurons), not for the original samples. Also, the default number of clusters for k-means is 8; how can I set a different number? Thank you for your help.

bigdatamath · Answer 1 · Wed Oct 17 2018 09:51:25 GMT+0800 (China Standard Time)

The best matching units (BMU) array does not give me the cluster ID directly. When I run the example from https://somoclu.readthedocs.io/en/stable/example.html on the 150 random samples, the BMU array is a matrix of shape (150, 2) with no cluster ID; it looks more like a 2-D coordinate for each of the 150 samples.
So how can I find the cluster ID for the original 150 samples? Thank you.

deepwindlee · Answer 2 · Thu Jul 04 2019 20:47:46 GMT+0800 (China Standard Time)

How can I find out which specific cluster each sample belongs to after clustering?

Mykhailo Ziatin · Answer 3 · Mon Jan 18 2021 03:04:59 GMT+0800 (China Standard Time)

Hi @isaac-you, you can use the best matching units, as suggested in the documentation.

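# Assumes som.cluster() has already been called so that som.clusters is populated.
# Each BMU is returned as (column, row), while som.clusters is indexed [row, column].
bmus = som.get_bmus(som.get_surface_state(X))
cluster_labels = [som.clusters[bmu[1], bmu[0]] for bmu in bmus]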

However, I am still wondering why there is no such method in the library itself, given that it already has clustering support.
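
Mapping samples to cluster IDs through their BMUs can be tried out without a trained map. The following is a minimal sketch, assuming each BMU is a (column, row) pair and the cluster grid is indexed [row][column], which is how somoclu's `bmus` and `clusters` attributes appear to be laid out. The helper name `labels_for_samples` and the toy data are hypothetical and not part of the library.

```python
def labels_for_samples(bmus, clusters):
    """Return one cluster ID per sample.

    bmus: sequence of (column, row) pairs, one per sample (assumed order).
    clusters: 2-D grid of cluster IDs indexed as clusters[row][column].
    """
    return [int(clusters[row][col]) for col, row in bmus]


# Toy map with 2 rows and 3 columns: each cell holds the cluster ID of that neuron.
toy_clusters = [
    [0, 0, 1],
    [2, 1, 1],
]
# Three samples whose BMUs sit at (column, row) = (2, 0), (0, 1), (1, 1).
toy_bmus = [(2, 0), (0, 1), (1, 1)]

sample_labels = labels_for_samples(toy_bmus, toy_clusters)
print(sample_labels)  # [1, 2, 1]
```

With a trained map, the same helper should accept the NumPy arrays directly, e.g. `labels_for_samples(som.bmus, som.clusters)` after `som.cluster()` has run. If the axis order differs in your version, swap `col` and `row`. As for the default of 8 clusters, `cluster()` takes a scikit-learn clustering object through its `algorithm` parameter, so a call such as `som.cluster(algorithm=KMeans(n_clusters=3))` should set a different number.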