considering using `pynndescent` for approx NN search?

Question

giovp opened this issue 3 years ago · comments

This would remove the annoy dependency. pynndescent is also the default for UMAP, and it performs much better.

Krzysztof Polanski · Answer 1 · Sat May 22 2021 05:48:12 GMT+0800 (China Standard Time)

This is a pretty good idea actually, thanks for the suggestion - I've been annoyed by annoy for a while as the most recent version throws segfaults all over the place. This also resolves the whole angular metric discrepancy between annoy and UMAP. Need to see how easy this is to install and how efficient index construction is (I remember some of the advertised fast query approaches spending forever crafting theirs, negating the point), but I'm hopeful!

What do you mean by this being a UMAP default by the way?
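For readers unfamiliar with the discrepancy mentioned above, here is a minimal sketch of what it presumably refers to. annoy documents its `angular` distance as the Euclidean distance between normalised vectors, i.e. `sqrt(2 * (1 - cos))`, whereas the usual cosine distance is `1 - cos`. The function names below are illustrative only and come from neither library.

```python
import math


def cosine_similarity(u, v):
    """Plain cosine similarity between two equal-length sequences."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cosine_distance(u, v):
    """Conventional cosine distance: 1 - cos(u, v)."""
    return 1.0 - cosine_similarity(u, v)


def annoy_angular_distance(u, v):
    """annoy-style 'angular' distance: sqrt(2 * (1 - cos(u, v))).

    The max() guards against tiny negative values caused by rounding.
    """
    return math.sqrt(max(0.0, 2.0 * (1.0 - cosine_similarity(u, v))))


def angular_to_cosine(d):
    """Convert an annoy-style angular distance back to a cosine distance."""
    return d * d / 2.0


u, v = [1.0, 0.0], [0.0, 1.0]
print(cosine_distance(u, v), annoy_angular_distance(u, v))
```

The two distances rank neighbours identically, since one is a monotonic function of the other, but the reported values differ, which matters if a downstream step interprets them on a cosine scale.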

Giovanni Palla · Answer 2 · Sat May 22 2021 21:55:09 GMT+0800 (China Standard Time)

Right, I ran into those segfault issues just recently 😅

> What do you mean by this being a UMAP default by the way?

Since umap 0.5, pynndescent is a hard dependency:
https://umap-learn.readthedocs.io/en/latest/release_notes.html#what-s-new-in-0-5

Krzysztof Polanski · Answer 3 · Thu May 27 2021 21:18:04 GMT+0800 (China Standard Time)

Just dropping a line that I haven't forgotten about this, but I've had other stuff on my plate recently. I've finally done the long-overdue BBKNN refactor in preparation for this, creating a matrix division like in RBCDE and MultiMAP.

I'll give pynndescent a shot at some point soon and very likely switch over.

Krzysztof Polanski · Answer 4 · Thu Jun 03 2021 01:45:09 GMT+0800 (China Standard Time)

I've finished this up, and it's live in 1.5.0.

Personally, I've found pynndescent's run times weirdly slow. On the pancreas data it is 4x slower than annoy, and even 2x slower than cKDTree, which is an exact neighbour search algorithm. However, pynndescent natively supports a lot of metrics and accepts custom functions too, so supporting it increases the package's functionality.