Performant way to estimate distances for bulk `inrange` search?
Datseris opened this issue
Hi there, I'm doing standard bulk in-range searches via the syntax `vec_of_idxs = NearestNeighbors.inrange(tree, queries, r)`. My data are `SVector{D, Float64}` (but I'm not sure whether this matters).
At the moment I need the distances of the found neighbors from the queries, even more than the indices. I've noticed that while the above call is spectacularly fast, the code I wrote to get the distances is veeeery slow:
    function _NN_get_ds(tree::KDTree, query, idxs)
        if tree.reordered
            ds = [
                evaluate(tree.metric, query, tree.data[
                    findfirst(isequal(i), tree.indices)
                ]) for i in idxs]
        else
            ds = [evaluate(tree.metric, query, tree.data[i]) for i in idxs]
        end
    end
My original code then becomes:

    vec_of_idxs = NearestNeighbors.inrange(tree, queries, r)
    vec_of_ds = [_NN_get_ds(tree, queries[j], vec_of_idxs[j]) for j in eachindex(queries)]
I wonder whether something already exists in this library that provides these distances in a faster way?
PS: After doing some testing, the bottleneck is in fact the weird `if tree.reordered` clause I've written. I obviously shouldn't be using `findfirst` in such an inner loop... What is the correct way to translate `tree.indices` to data indices when the tree is `reordered`?
I guess you could create the inverse lookup (for all points) a single time and then reuse that.
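A minimal sketch of that idea, using plain arrays rather than an actual tree so that it runs with Base Julia alone. Everything named here is illustrative: `indices` stands in for `tree.indices`, `reordered_data` for the `tree.data` of a reordered tree, and `euclid` for `evaluate(tree.metric, a, b)`. It assumes, as the `findfirst` code above does, that position `p` of the reordered data holds the point whose original index is `indices[p]`.

```julia
# Plain Euclidean distance, standing in for `evaluate(tree.metric, a, b)`.
euclid(a, b) = sqrt(sum(abs2, a .- b))

# Distances from `query` to the points whose original indices are `idxs`.
# `inv_indices[i]` is the position of original index `i` in `reordered_data`,
# so each lookup is O(1) instead of a linear `findfirst` scan.
function distances_via_inverse(reordered_data, inv_indices, query, idxs)
    return [euclid(query, reordered_data[inv_indices[i]]) for i in idxs]
end

# Toy stand-ins (hypothetical data, not taken from a real tree).
original_data = [(0.0, 0.0), (3.0, 4.0), (1.0, 0.0), (0.0, 2.0)]
indices = [3, 1, 4, 2]                    # position p holds original index indices[p]
reordered_data = original_data[indices]

# Build the inverse lookup a single time, then reuse it for every query.
inv_indices = invperm(indices)

query = (0.0, 0.0)
ds = distances_via_inverse(reordered_data, inv_indices, query, [2, 4])
```

With a real tree, the analogous step would presumably be `inv_indices = invperm(tree.indices)` computed once before the loop over queries. If the original data vector is still around, indexing it directly with the returned indices may avoid the lookup altogether, under the same assumption that those indices refer to the original ordering.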
I guess that makes sense. But how does the tree itself know the correct indices? I mean, for the "skip" predicate version of e.g. `knn`, the tree needs to somehow know the correct indices as well, no? Can't I obtain them the same way the tree obtains them when it checks whether to `skip` or not?
It just does this:
NearestNeighbors.jl/src/tree_ops.jl, line 97 in 2efd998