VpTree does not accept float in knncolle

Question

shrit opened this issue 7 months ago

Omar Shrit commented 7 months ago

Hi Aaron,

It seems that VpTree does not accept float values, as the std::tuple is hard-coded to double.

Here is the code:

https://github.com/LTLA/knncolle/blob/3ad6b8cdbd281d78c77390d5a6ded4513bdf3860/include/knncolle/VpTree/VpTree.hpp#L76

Here is the error when trying to use float for Umap:

/./knncolle/VpTree/VpTree.hpp:150:29: error: no matching function for call to ‘std::tuple<int, const double*, double>::tuple(int&, const float*, int)’
  150 |             items.push_back(DataPoint(i, vals + i * num_dim, 0));

It seems that const INTERNAL_t* is not being propagated correctly from umap, so the default type (double) is used, which causes the error above.
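The mismatch can be reproduced without knncolle. Below is a minimal sketch, not the library's actual code: FixedDataPoint, DataPoint and make_items are hypothetical names chosen for illustration. It shows why a tuple with a hard-coded const double* rejects a const float* argument, and how making the pointer type a template parameter avoids the problem.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for an item type whose pointer is fixed to double.
using FixedDataPoint = std::tuple<int, const double*, double>;

// Hypothetical templated alternative: the pointer type follows the caller's data type.
template<typename Data_, typename Distance_ = double>
using DataPoint = std::tuple<int, const Data_*, Distance_>;

// const float* does not convert to const double*, so the fixed tuple cannot be built
// from the arguments in the error message, while the templated one can.
static_assert(!std::is_constructible<FixedDataPoint, int&, const float*, int>::value,
              "fixed tuple should reject float pointers");
static_assert(std::is_constructible<DataPoint<float>, int&, const float*, int>::value,
              "templated tuple should accept float pointers");

// Builds one item per observation, each pointing at the start of that observation's
// coordinates in a contiguous array (num_dim values per observation).
template<typename Data_>
std::vector<DataPoint<Data_> > make_items(int num_dim, int num_obs, const Data_* vals) {
    std::vector<DataPoint<Data_> > items;
    items.reserve(num_obs);
    for (int i = 0; i < num_obs; ++i) {
        items.push_back(DataPoint<Data_>(i, vals + static_cast<std::size_t>(i) * num_dim, 0));
    }
    return items;
}
```

With this layout, make_items<float>(ndim, nobs, data.data()) compiles for a std::vector<float>, whereas the fixed tuple only works when the input is already double.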

I know this belongs to the knn repo, but this is where the error occurs.

Is there an easy solution?

Many thanks

Aaron Lun · Answer 1 · Tue Dec 19 2023 15:36:56 GMT+0800 (China Standard Time)

Hopefully fixed by ff42321. Note that VpTree is a decent default for small datasets but you'll probably want to use one of the approximate methods (Annoy or HNSW) for anything larger.

Omar Shrit · Answer 2 · Tue Dec 19 2023 22:49:45 GMT+0800 (China Standard Time)

@LTLA Thank you very much for the quick fix. Do you know how I can specify Annoy as a template parameter for Umap? Is there an example that shows how this is done?
Many thanks

Aaron Lun · Answer 3 · Wed Dec 20 2023 14:37:24 GMT+0800 (China Standard Time)

The README has an example:

umappp::Umap x;
knncolle::AnnoyEuclidean<> searcher(ndim, nobs, data.data());
x.run(&searcher, 2, embedding.data());

You might need to add <float> in various places above.

TBH I don't usually use this method signature for run(). Instead, I compute the neighbor list manually and provide it to Umap::run(). This is because I might end up re-using the neighbor list with different UMAP parameters, so I compute the set of neighbors once and use it across different UMAP runs.
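The compute-once, reuse-many pattern can be sketched as follows. This is only an illustration under stated assumptions: find_neighbors is a brute-force stand-in for whichever searcher is used, and LayoutParams, run_layout and sweep are hypothetical placeholders, not part of umappp or knncolle.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <utility>
#include <vector>

// For each observation, (neighbor index, distance) pairs, closest first.
typedef std::vector<std::vector<std::pair<int, float> > > NeighborList;

// Brute-force exact k-nearest-neighbor search over row-wise contiguous data.
// Stand-in for a real searcher; O(nobs^2), so only suitable for small inputs.
NeighborList find_neighbors(int ndim, int nobs, const float* data, int k) {
    NeighborList output(nobs);
    for (int i = 0; i < nobs; ++i) {
        std::vector<std::pair<float, int> > candidates; // (distance, index)
        for (int j = 0; j < nobs; ++j) {
            if (i == j) continue;
            float d2 = 0;
            for (int d = 0; d < ndim; ++d) {
                float delta = data[i * ndim + d] - data[j * ndim + d];
                d2 += delta * delta;
            }
            candidates.push_back(std::make_pair(std::sqrt(d2), j));
        }
        std::sort(candidates.begin(), candidates.end());
        int keep = std::min(k, static_cast<int>(candidates.size()));
        for (int n = 0; n < keep; ++n) {
            output[i].push_back(std::make_pair(candidates[n].second, candidates[n].first));
        }
    }
    return output;
}

// Hypothetical parameter set for one layout run.
struct LayoutParams { int num_epochs; float min_dist; };

// Hypothetical placeholder for a UMAP run that consumes precomputed neighbors.
// It only allocates a zeroed 2-D embedding; a real run would optimize it.
std::vector<float> run_layout(const NeighborList& neighbors, const LayoutParams&) {
    return std::vector<float>(neighbors.size() * 2, 0.0f);
}

// Computes the neighbors once, then reuses them for every parameter set.
std::vector<std::vector<float> > sweep(int ndim, int nobs, const float* data, int k,
                                       const std::vector<LayoutParams>& settings) {
    const NeighborList neighbors = find_neighbors(ndim, nobs, data, k);
    std::vector<std::vector<float> > results;
    for (const LayoutParams& p : settings) {
        results.push_back(run_layout(neighbors, p));
    }
    return results;
}
```

The neighbor search is usually the expensive, parameter-independent step, so hoisting it out of the loop means each additional parameter set only pays for the layout itself.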

If you want some more inspiration, check out these bindings:

R: https://github.com/LTLA/scran.chan/blob/master/src/run_umap.cpp#L12
Javascript (via Wasm): https://github.com/kanaverse/scran.js/blob/master/src/run_umap.cpp

Omar Shrit · Answer 4 · Tue Jan 09 2024 06:29:18 GMT+0800 (China Standard Time)

Perfect, thank you very much. I will keep this open for a while until I have the chance to run it again, and I will let you know if I have any questions.