mhahsler / dbscan

Density Based Clustering of Applications with Noise (DBSCAN) and Related Algorithms - R package

LOF edge case

sverchkov opened this issue

I noticed an edge case in LOF:

> dbscan::lof(dist(c(1,2,3,4,5,6,7)), k=3)
[1] 1.0555556 1.0555556 1.0555556 0.9047619 0.9047619 1.1111111 1.1111111

By symmetry, points 1, 2, 3 should respectively have the same LOFs as points 7, 6, 5. According to my own calculation the answer should be:

[1] 1.0679012 1.0679012 1.0133929 0.8730159 1.0133929 1.0679012 1.0679012

I think this happens because the code finds the nearest neighbors with kNN, which returns exactly k points. The LOF neighborhood, however, is supposed to include every point that is at least as close as the k-th nearest neighbor, so with ties (as here) it can contain more than k points.
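For reference, here is a minimal base-R sketch of LOF with a tie-inclusive neighborhood, following the definition described above. The function name `lof_with_ties` is a hypothetical helper of mine, not part of dbscan, and it uses a brute-force O(n²) distance matrix rather than the package's neighbor search, so it is only meant for checking small cases like this one.

```r
# Brute-force LOF that keeps every point tied at the k-th nearest distance.
# lof_with_ties() is a hypothetical helper for illustration, not a dbscan function.
lof_with_ties <- function(x, k) {
  d <- as.matrix(dist(x))
  n <- nrow(d)
  diag(d) <- Inf  # a point is never its own neighbor

  # k-distance: distance from each point to its k-th nearest other point
  k_dist <- apply(d, 1, function(row) sort(row)[k])

  # Neighborhood: all points within the k-distance (more than k on ties)
  nbrs <- lapply(seq_len(n), function(i) which(d[i, ] <= k_dist[i]))

  # Local reachability density: inverse of the mean reachability distance,
  # where reach-dist(p, o) = max(k-distance(o), d(p, o))
  lrd <- sapply(seq_len(n), function(i) {
    o <- nbrs[[i]]
    reach <- pmax(k_dist[o], d[i, o])
    length(o) / sum(reach)
  })

  # LOF: mean density of the neighbors relative to the point's own density
  sapply(seq_len(n), function(i) mean(lrd[nbrs[[i]]]) / lrd[i])
}

round(lof_with_ties(c(1, 2, 3, 4, 5, 6, 7), k = 3), 7)
```

On this input, points 3, 4, and 5 each get four neighbors instead of three, and the result is symmetric around point 4.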

Thank you for the bug report. Including ties is a problem with the ANN library that the package uses. I need to check whether it can be solved without a big hit on performance.

This bug is now fixed in the latest version on GitHub. Please check, and reopen this issue if the problem persists.