kno10 / python-kmedoids

Fast K-Medoids clustering in Python with FasterPAM

full examples?

heshpdx opened this issue

Hi, do you have a full example available with a sample data set and expected output?

I followed the directions in the README.md and got to this issue:
AttributeError: module 'kmedoids' has no attribute 'fasterpam'

I used help("kmedoids") to determine that the function is not fasterpam, but rather fasterpam_{i32, i64, f32, f64}. Am I missing a step to get the API cited in the docs?

I am running Ubuntu 20 on a newer aarch64 machine, with rustc 1.57.0 and Python 3.8.10.

    _fasterpam_f64 = fasterpam_f64(...)
        Run $variant k-medoids clustering function for $type precision
        
        :param dist: distance matrix
        :type dist: ndarray
        :param meds: initial medoids
        :type meds: ndarray
        :param max_iter: maximum number of iterations
        :type max_iter: int
        :return: k-medoids clustering result
        :rtype: KMedoidsResult

Thanks!

You appear to have wrongly used

import kmedoids.kmedoids

rather than the correct

import kmedoids

(This may be due to an imprecision in the latest installation instructions, which I am about to fix: the compiled kmedoids.so is supposed to be in the kmedoids folder, not the root folder. Improvements to the installation instructions are tracked in #3.)

The former is the raw Rust module (maybe we should have named it differently), not the easy-to-use Python wrapper.

The example in the README works fine, e.g., on colab:

!pip install kmedoids
import kmedoids
help(kmedoids.fasterpam)

Please try the updated instructions to compile and install:

pip install maturin
maturin develop --release

Maturin takes care of installing the Rust library into the correct location along with the wrapper.

I built libkmedoids.so, renamed it, and moved it into a directory: kmedoids/kmedoids.so. Then I used import kmedoids. This is when I get AttributeError: module 'kmedoids' has no attribute 'fasterpam'.

I tried again with the new directions. I'm having a tough time getting maturin to work. After pip installing successfully, the maturin command is not found. I tried a bunch of things but no dice. I looked up the maturin documentation and I cannot figure out why I am stuck.

Also, my request above was for a sample script that uses sample data and provides a known output, akin to the unit test that was requested in issue #4. It would be great to see an example that works in practice. For example, pick 10 random points in 3-dimensional space, put them in an ndarray, and use this API to find the medoids and print them out. Is this possible? Thanks.

If you use pip without a virtualenv, it will usually install scripts to $HOME/.local/bin (on Linux), which was not on my path either. That is standard pip behavior; check the pip documentation for the location on your OS.
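To check whether pip's per-user script directory is on your PATH, a small standard-library sketch like the following may help. It assumes the usual Linux layout, where scripts land in the `bin` subdirectory of the user base reported by `site.getuserbase()`; the helper names `user_scripts_dir` and `is_on_path` are made up for this illustration, and other operating systems may use a different subdirectory.

```python
import os
import site


def user_scripts_dir():
    """Directory where `pip install --user` usually puts scripts on Linux.

    Assumption: the scripts folder is <user base>/bin (typically ~/.local/bin).
    """
    return os.path.join(site.getuserbase(), "bin")


def is_on_path(directory, path_value=None):
    """Return True if `directory` is one of the entries of PATH."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    target = os.path.realpath(directory)
    entries = [p for p in path_value.split(os.pathsep) if p]
    return any(os.path.realpath(p) == target for p in entries)


if __name__ == "__main__":
    scripts = user_scripts_dir()
    print("user scripts directory:", scripts)
    print("on PATH:", is_on_path(scripts))
    print("maturin present there:", os.path.exists(os.path.join(scripts, "maturin")))
```

If the directory is not on PATH, adding it in your shell profile, or working inside a virtualenv, should make the `maturin` command visible.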
As mentioned, you seem to have the nested module on your path (the naming comes from maturin; I could not get it to use kmedoids._kmedoids instead). Any chance that you are inside the kmedoids folder rather than in the root of the working tree? Where is kmedoids/__init__.py on your Python search path? It appears that your search path finds the .so first.
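One way to see which file Python would pick up for `import kmedoids` is to ask the import machinery directly. The sketch below uses only the standard library; `describe_module` is a hypothetical helper name, and the interpretation in the printed messages (a non-package hit probably being the raw compiled library) is an assumption based on the layout described in this thread.

```python
import importlib.util


def describe_module(name):
    """Return (origin, is_package) for what `import name` would load.

    For a top-level name this only searches the path; it does not import the module.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None, False
    # Packages (folders with __init__.py) have submodule search locations;
    # a bare compiled .so or a single .py file does not.
    return spec.origin, spec.submodule_search_locations is not None


if __name__ == "__main__":
    origin, is_package = describe_module("kmedoids")
    if origin is None and not is_package:
        print("kmedoids was not found on the search path")
    elif is_package:
        print("package (Python wrapper) found at:", origin)
    else:
        print("plain module found at:", origin, "(likely the raw compiled library)")
```

Running this from the directory where the import fails should show whether the wrapper's `__init__.py` or the compiled `.so` is found first.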

The distance matrix could be random values, yes. It's just not very meaningful to cluster random values, not even for the sake of an example. I don't like adding dependencies just for examples, but you trivially get an example distance matrix via

from sklearn.datasets import load_iris
from sklearn.metrics.pairwise import euclidean_distances
distmatrix = euclidean_distances(load_iris().data)

However, I don't think this adds value to the example; it distracts and may give the false impression that there is a sklearn dependency.
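For readers who still want the small end-to-end script requested earlier in the thread (10 random points in 3-dimensional space), here is a minimal sketch that needs only NumPy besides kmedoids. The helper names `pairwise_euclidean` and `cluster_points`, the choice of k=2, and the seeds are illustrative assumptions; the `random_state` argument is assumed to be available in the installed version of the wrapper. As noted above, clustering random points is not very meaningful, so this only demonstrates the call sequence.

```python
import numpy as np


def pairwise_euclidean(points):
    """Full Euclidean distance matrix for an (n, d) array of points."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def cluster_points(points, k, seed=0):
    """Run FasterPAM on the pairwise distances of `points`."""
    # Imported here so the distance helper stays usable without the package.
    # This must be the Python wrapper (import kmedoids), not kmedoids.kmedoids.
    import kmedoids

    dist = pairwise_euclidean(points)
    return kmedoids.fasterpam(dist, k, random_state=seed)


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    points = rng.random((10, 3))  # 10 random points in 3-D
    result = cluster_points(points, k=2)
    print("medoid indices:", result.medoids)
    print("medoid coordinates:\n", points[result.medoids])
    print("labels:", result.labels)
    print("loss:", result.loss)
```

The medoids are reported as row indices into the distance matrix, so indexing `points` with them recovers the actual coordinates.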

You can find a full example comparing kmedoids to BanditPAM here, but it's also more boilerplate than relevant example code:
https://colab.research.google.com/drive/1-8fMll3QpsdNV5widn-PrPHa5SGXdAIW?usp=sharing

The documentation now also contains an example using the well-known MNIST data set.