KristofferC / NearestNeighbors.jl

High performance nearest neighbor data structures (KDTree and BallTree) and algorithms for Julia.

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Performant way to estimate distances for bulk `inrange` search?

Datseris opened this issue · comments

Hi there, I'm doing standard bulk in-range searches via the syntax vec_of_idxs = NearestNeighbors.inrange(tree, queries, r). My data are SVector{D, Float64} (but I'm not sure whether this matters.

At the moment I need the distances of the found neighbors from the queries, even more than the indices. I've noticed that while the above call is spectacularly fast, the code I wrote to get the distances is veeeery slow:

function _NN_get_ds(tree::KDTree, query, idxs)
    if tree.reordered
        ds = [
            evaluate(tree.metric, query, tree.data[
                findfirst(isequal(i), tree.indices)
            ]) for i in idxs]
    else
        ds = [evaluate(tree.metric, query, tree.data[i]) for i in idxs]
    end
end

and now I transform my original code as

    vec_of_idxs = NearestNeighbors.inrange(tree, queries, r)
    vec_of_ds = [ _NN_get_ds(tree, queries[j], vec_of_idxs[j]) for j in 1:length(queries)]

I'm wonder, whether something already exists in this library, that provides these distances in a faster way?

PS: After doing some testing, the bottleneck is in fact the weird clause I've written if tree.reordered. I obviously shouldn't be using findfirst in such an inner loop... What is the correct way to translate tree.indices to data indices when the tree is reordered?

I guess you could create the inverse lookup (for all points) a single time and then reuse that.

I guess that makes sense. But how does the tree itself know the correct indices? I mean, for the "skip" predicate version of e.g. knn, the tree needs to somehow know the correct indices as well, no? Can't I obtain them the same way the tree obtains them when it checks to skip or not?

It just does this

idx = tree.reordered ? z : tree.indices[z]