mhahsler / dbscan

Density Based Clustering of Applications with Noise (DBSCAN) and Related Algorithms - R package



Using dbscan to cull one spatial dataset based on another?

LovellHAGSC opened this issue.

Hi all,
Thank you for your work on dbscan. It is a great resource. This is not an issue, but a request for advice.

Here's my situation:
I have two sets of xy coordinates on the same scale. One is very noisy; the other contains known, very high-confidence points. I want to cull the first (noisy) dataset, keeping only the points that lie within an epsilon radius of at least one point in the second (high-confidence) dataset.

For example, see the graphic at the bottom of this post. Here, I have applied a buffer (blue polygon) around the high-confidence points (the high-confidence points are not shown). All black points come from the low-confidence dataset. In this case, I would want to retain any points within the blue buffer.
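The culling rule described above can be stated as a brute-force check. Below is a minimal, language-neutral sketch in Python (not R, and not code from this thread). It keeps a noisy point if at least one high-confidence point lies within `eps` of it. The function name `cull_brute_force` and the sample coordinates are hypothetical. Because every noisy point is compared with every high-confidence point, the work grows with the product of the two dataset sizes. That is one plausible reason a straightforward script becomes slow on large inputs.

```python
def cull_brute_force(noisy, trusted, eps):
    """Keep each noisy (x, y) point that lies within eps of at least one trusted point.

    Every noisy point is compared with every trusted point, so this performs
    up to len(noisy) * len(trusted) distance checks.
    """
    eps_sq = eps * eps  # compare squared distances to avoid a square root
    kept = []
    for px, py in noisy:
        # Inclusive boundary (<=) is a choice made for this sketch.
        if any((px - qx) ** 2 + (py - qy) ** 2 <= eps_sq for qx, qy in trusted):
            kept.append((px, py))
    return kept


trusted = [(0.0, 0.0), (10.0, 10.0)]
noisy = [(0.5, 0.5), (5.0, 5.0), (10.0, 10.5), (20.0, 0.0)]
print(cull_brute_force(noisy, trusted, eps=1.0))  # prints [(0.5, 0.5), (10.0, 10.5)]
```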

I have an R script that does this, but it is SLOW. There are also some GIS packages that can do something similar (e.g. rgeos::gBuffer), but these require a bunch of dependencies that I would prefer to avoid. I was thinking that frNN and dbscan could be coerced to accomplish this task, but I wasn't sure.

Any advice is much appreciated.
Thanks,
John

[Image: low-confidence points (black) and the buffer (blue polygon) around the high-confidence points]

I think frNN can do this. Use rbind to combine the high-confidence data and the low-confidence data, and then use frNN to find the points that are neighbors of the high-confidence points. The k-d tree used by frNN should speed things up for you.
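The suggestion above works because a spatial index avoids comparing every pair of points. Below is a minimal Python sketch of that idea under stated assumptions. It is not dbscan's implementation, which uses a k-d tree from the ANN C++ library. The hypothetical `build_kdtree`, `has_neighbor_within`, and `cull_kdtree` functions differ from the reply in two ways:

- They index only the high-confidence points and query each noisy point, instead of rbind-ing both sets.
- They stop at the first neighbor found, since culling only needs to know whether one exists.

In R, `frNN()` also accepts a `query` argument. Something like `frNN(trusted, eps = r, query = noisy)`, followed by keeping the query points whose `id` entry is non-empty, may avoid the rbind step. Check `?frNN` for the behavior of your installed version.

```python
def build_kdtree(points, depth=0):
    """Build a 2-d tree over (x, y) tuples as nested (point, axis, left, right) tuples."""
    if not points:
        return None
    axis = depth % 2  # alternate splitting on x (0) and y (1)
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2  # the median point becomes this node
    return (pts[mid], axis,
            build_kdtree(pts[:mid], depth + 1),
            build_kdtree(pts[mid + 1:], depth + 1))


def has_neighbor_within(node, query, eps):
    """Return True if any point stored under node lies within eps of query (inclusive)."""
    if node is None:
        return False
    point, axis, left, right = node
    if (point[0] - query[0]) ** 2 + (point[1] - query[1]) ** 2 <= eps * eps:
        return True
    diff = query[axis] - point[axis]
    # Search the side of the splitting line that contains the query first.
    near, far = (left, right) if diff < 0 else (right, left)
    if has_neighbor_within(near, query, eps):
        return True
    # The other side can only hold a match if the eps-ball crosses the splitting line.
    return abs(diff) <= eps and has_neighbor_within(far, query, eps)


def cull_kdtree(noisy, trusted, eps):
    """Keep noisy points that have at least one trusted point within eps."""
    tree = build_kdtree(list(trusted))
    return [p for p in noisy if has_neighbor_within(tree, p, eps)]


trusted = [(0.0, 0.0), (10.0, 10.0)]
noisy = [(0.5, 0.5), (5.0, 5.0), (10.0, 10.5), (20.0, 0.0)]
print(cull_kdtree(noisy, trusted, eps=1.0))  # prints [(0.5, 0.5), (10.0, 10.5)]
```

The pruning test `abs(diff) <= eps` is what makes the tree faster than checking every pair. A whole subtree is skipped whenever the epsilon disk around the query point cannot reach that side of the splitting line.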

Yes! This worked perfectly. So easy.
Thanks!
John

I am glad it worked for you.

In case someone comes across this post: the frNN approach is more than 10x faster than my implementation using rgeos::gBuffer.
Thanks again.