Considering using `pynndescent` for approx NN search?
giovp opened this issue
No more annoy dependency; pynndescent is also the default for UMAP, and it performs much better.
This is a pretty good idea actually, thanks for the suggestion. I've been annoyed by annoy for a while, as the most recent version throws segfaults all over the place. This also resolves the whole angular metric discrepancy between annoy and UMAP. I need to see how easy this is to install and how efficient index construction is (I remember some of the advertised fast-query approaches spending forever building their indices, negating the point), but I'm hopeful!
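For context on the angular discrepancy, assuming it refers to annoy's `angular` metric versus UMAP/pynndescent's `cosine` metric: annoy documents its angular distance as the Euclidean distance between normalised vectors, sqrt(2(1 - cos(u, v))), whereas `cosine` is 1 - cos(u, v). A minimal NumPy sketch of the two (the function names are hypothetical helpers, not either library's API):

```python
import numpy as np


def cosine_distance(u, v):
    # UMAP / pynndescent style "cosine": 1 - cos(u, v)
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))


def annoy_angular_distance(u, v):
    # annoy style "angular": Euclidean distance between the unit-normalised
    # vectors, which works out to sqrt(2 * (1 - cos(u, v)))
    u_hat = u / np.linalg.norm(u)
    v_hat = v / np.linalg.norm(v)
    return np.linalg.norm(u_hat - v_hat)


u = np.array([1.0, 0.0])
v = np.array([1.0, 1.0])
cos_d = cosine_distance(u, v)         # 1 - 1/sqrt(2), about 0.2929
ang_d = annoy_angular_distance(u, v)  # sqrt(2 - sqrt(2)), about 0.7654
```

Both are monotonic in cos(u, v), so the neighbour rankings agree, but the distance values themselves differ, which presumably matters anywhere the distances feed into downstream weights.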
What do you mean by this being a UMAP default by the way?
Right, had those segfault issues just recently 😅
> What do you mean by this being a UMAP default by the way?
Since UMAP 0.5, pynndescent is a hard dependency: https://umap-learn.readthedocs.io/en/latest/release_notes.html#what-s-new-in-0-5
Just dropping a line to say I haven't forgotten about this, but I've had other stuff on my plate recently. I've finally done the long-overdue BBKNN refactor in preparation for this, creating a matrix division like in RBCDE and MultiMAP.
I'll give pynndescent a shot at some point soon and very likely switch over.
I've finished this up, and it's live in 1.5.0.
Personally, I've found pynndescent run times weirdly slow: on the pancreas data it's 4x slower than annoy, and even 2x slower than cKDTree, which does exact neighbour search. However, pynndescent natively supports a lot of metrics and accepts custom metric functions too, so supporting it increases the package's functionality.
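A rough way to run this kind of comparison yourself, as a sketch: time an exact `scipy.spatial.cKDTree` query against a pynndescent `NNDescent` index build on the same matrix. The synthetic random data, the sizes and the helper names are assumptions for illustration (this is not the pancreas benchmark), annoy is left out, and pynndescent is imported lazily so the script still runs without it.

```python
import time

import numpy as np
from scipy.spatial import cKDTree


def exact_knn(data, k):
    """Exact k nearest neighbours via scipy's KD-tree; each point's first hit is itself."""
    tree = cKDTree(data)
    distances, indices = tree.query(data, k=k)
    return indices, distances


def approx_knn_pynndescent(data, k):
    """Approximate kNN graph via pynndescent (imported lazily, optional dependency)."""
    from pynndescent import NNDescent

    index = NNDescent(data, n_neighbors=k, random_state=0)
    indices, distances = index.neighbor_graph  # tuple of (indices, distances)
    return indices, distances


def time_call(fn, *args):
    """Return (result, wall-clock seconds) for a single call."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.normal(size=(2000, 30))  # synthetic stand-in, NOT the pancreas data
    _, t_exact = time_call(exact_knn, data, 15)
    print(f"cKDTree (exact): {t_exact:.3f}s")
    try:
        _, t_approx = time_call(approx_knn_pynndescent, data, 15)
        print(f"pynndescent (approx): {t_approx:.3f}s")
    except ImportError:
        print("pynndescent not installed; skipping")
```

Since pynndescent compiles its routines with numba, the first call likely includes JIT compilation time, so timing a second call on the same data may give a fairer picture of steady-state cost.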