LTLA / umappp

UMAP C++ implementation

VpTree does not accept float in knncolle

shrit opened this issue

Hi Aaron,

It seems that VpTree does not accept float values, because the std::tuple is hard-coded to double.

Here is the code:

https://github.com/LTLA/knncolle/blob/3ad6b8cdbd281d78c77390d5a6ded4513bdf3860/include/knncolle/VpTree/VpTree.hpp#L76

Here is the error I get when trying to use float with Umap:

/./knncolle/VpTree/VpTree.hpp:150:29: error: no matching function for call to ‘std::tuple<int, const double*, double>::tuple(int&, const float*, int)’
  150 |             items.push_back(DataPoint(i, vals + i * num_dim, 0));

It seems that const INTERNAL_t* is not being propagated correctly from umappp, so the default type (double) is used instead, which causes the error above.

I know this belongs in the knncolle repo, but this is where the error shows up.

Is there an easy solution?
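To illustrate the mismatch in the error message, here is a minimal sketch that uses only the standard library. The names `HardCodedPoint`, `DataPoint`, `Internal_t`, `Distance_t` and `make_items` are stand-ins chosen for this example and are not knncolle's actual definitions. It shows that a tuple whose pointer element is fixed to `const double*` cannot be built from a `const float*`, while a tuple templated on the data type can.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <type_traits>
#include <vector>

// Stand-in for the problematic item type: the pointer element is fixed to double.
using HardCodedPoint = std::tuple<int, const double*, double>;

// Templated alternative (assumed names): the pointer type follows the caller's data type.
template<typename Internal_t = double, typename Distance_t = double>
using DataPoint = std::tuple<int, const Internal_t*, Distance_t>;

// const float* does not convert to const double*, so this construction is rejected,
// matching the "no matching function" error in the report.
static_assert(!std::is_constructible<HardCodedPoint, int&, const float*, int>::value,
              "hard-coded double tuple should reject a float pointer");

// With the data type propagated, the same arguments are accepted.
static_assert(std::is_constructible<DataPoint<float, float>, int&, const float*, int>::value,
              "templated tuple should accept a float pointer");

// Builds one item per observation, mirroring the push_back call from the error message.
// Each observation is assumed to occupy num_dim contiguous values in vals.
template<typename Internal_t, typename Distance_t = double>
std::vector<DataPoint<Internal_t, Distance_t> > make_items(int num_obs, int num_dim, const Internal_t* vals) {
    assert(num_obs >= 0 && num_dim >= 0);
    std::vector<DataPoint<Internal_t, Distance_t> > items;
    items.reserve(static_cast<std::size_t>(num_obs));
    for (int i = 0; i < num_obs; ++i) {
        items.push_back(DataPoint<Internal_t, Distance_t>(
            i, vals + static_cast<std::size_t>(i) * num_dim, 0));
    }
    return items;
}
```

Calling `make_items<float, float>(nobs, ndim, data.data())` on a `std::vector<float>` compiles, whereas pushing the same arguments into a `std::vector<HardCodedPoint>` would reproduce the compile error.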

Many thanks

Hopefully fixed by ff42321. Note that VpTree is a decent default for small datasets but you'll probably want to use one of the approximate methods (Annoy or HNSW) for anything larger.

@LTLA Thank you very much for the quick fix. Do you know how I can specify Annoy as a template parameter for Umap? Is there an example that shows how this is done?
Many thanks

The README has an example:

umappp::Umap x;
knncolle::AnnoyEuclidean<> searcher(ndim, nobs, data.data());
x.run(&searcher, 2, embedding.data());

You might need to add <float> in various places above.

TBH I don't usually use this method signature for run(), as I compute the neighbor list manually and provide it to Umap::run(). This is because I might end up re-using the neighbor list with different UMAP parameters so I just compute the same set of neighbors once and use it across different UMAP runs.
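The reuse pattern described above can be sketched with the standard library alone. This is not umappp's or knncolle's API: `NeighborList`, `find_neighbors` and `run_all` are hypothetical names, the search is a plain brute-force one rather than VpTree or Annoy, and each observation is assumed to occupy `ndim` contiguous values. The point is only that the neighbor search happens once and its result is passed unchanged to every run.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// For each observation: k (neighbor index, distance) pairs, closest first.
template<typename Float>
using NeighborList = std::vector<std::vector<std::pair<int, Float> > >;

// Exact brute-force k-nearest-neighbor search using Euclidean distance.
template<typename Float>
NeighborList<Float> find_neighbors(int ndim, int nobs, const Float* data, int k) {
    assert(k >= 0 && k < nobs);
    NeighborList<Float> output(static_cast<std::size_t>(nobs));
    std::vector<std::pair<Float, int> > dists;
    for (int i = 0; i < nobs; ++i) {
        dists.clear();
        const Float* self = data + static_cast<std::size_t>(i) * ndim;
        for (int j = 0; j < nobs; ++j) {
            if (j == i) continue; // an observation is not its own neighbor
            const Float* other = data + static_cast<std::size_t>(j) * ndim;
            Float d2 = 0;
            for (int d = 0; d < ndim; ++d) {
                Float delta = self[d] - other[d];
                d2 += delta * delta;
            }
            dists.emplace_back(d2, j);
        }
        // Only the k smallest squared distances need to be ordered.
        std::partial_sort(dists.begin(), dists.begin() + k, dists.end());
        for (int m = 0; m < k; ++m) {
            output[i].emplace_back(dists[m].second, std::sqrt(dists[m].first));
        }
    }
    return output;
}

// Hypothetical driver: invokes `run` once per parameter set,
// always with the same precomputed neighbor list.
template<typename Float, typename Params, typename Run>
void run_all(const NeighborList<Float>& neighbors, const std::vector<Params>& settings, Run run) {
    for (const auto& p : settings) {
        run(neighbors, p);
    }
}
```

In real use, the callback given to `run_all` would be the place where the neighbor list is handed to `Umap::run()` with a particular set of UMAP parameters; the exact signature of that overload is not shown in this thread, so check the README before relying on it.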

If you want some more inspiration, check out these bindings:

Perfect, thank you very much. I will keep this open for a while until I have the chance to run it again, and I will let you know if I have any questions.