dbscan to cull one spatial dataset based on another?

Question

LovellHAGSC opened this issue 6 years ago

Hi all,
Thank you for your work on dbscan. It is a great resource. This is not an issue, but a request for advice.

Here's my situation:
I have two sets of xy coordinates, both on the same scale. One is very noisy; the other contains known, very high-confidence data. I want to cull the first (noisy) dataset so that it keeps only the points within an epsilon radius of at least one point in the second (high-confidence) dataset.

For example, see the graphic at the bottom of this post. Here, I have applied a buffer (blue polygon) around the high-confidence points (the high-confidence points are not shown). All black points come from the low-confidence dataset. In this case, I would want to retain any points within the blue buffer.

I have an R script that does this, but it is SLOW. There are also some GIS packages that can do something similar (e.g. rgeos::gBuffer), but these require a bunch of dependencies that I would prefer to avoid. I was thinking that frNN and dbscan could be coerced to accomplish this task, but I wasn't sure.

Any advice is much appreciated.
Thanks,
John

Michael Hahsler · Answer 1 · Thu Feb 14 2019 07:31:40 GMT+0800 (China Standard Time)

I think frNN can do this. Combine the high-confidence data and the low-confidence data with rbind, and then use frNN to find the points that are neighbors of the high-confidence points. The k-d tree used for frNN should speed things up for you.
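For readers who land here, a minimal sketch of this suggestion might look like the following. The matrices `high` and `low`, the radius `eps`, and the simulated coordinates are all made up for illustration and do not come from the original post. The only dbscan call used is `frNN(x, eps)`, whose result holds a list `id` of neighbor indices for each row.

```r
library(dbscan)

set.seed(42)
# Hypothetical data: 'high' = high-confidence points, 'low' = noisy points
high <- cbind(x = c(0, 5), y = c(0, 5))
low  <- cbind(x = runif(200, -2, 7), y = runif(200, -2, 7))
eps  <- 1  # assumed buffer radius, in the same units as the coordinates

# Stack both sets; rows 1..n_high are the high-confidence points
n_high  <- nrow(high)
all_pts <- rbind(high, low)

# Fixed-radius nearest-neighbor search over the combined matrix (k-d tree)
nn <- frNN(all_pts, eps = eps)

# Collect the neighbors of the high-confidence rows only, then keep the
# indices that refer to rows of 'low' (i.e. indices greater than n_high)
hits     <- unique(unlist(nn$id[seq_len(n_high)]))
keep_idx <- sort(hits[hits > n_high]) - n_high

# Noisy points that fall within eps of at least one high-confidence point
low_kept <- low[keep_idx, , drop = FALSE]
```

Because the search runs over the combined matrix, it also computes neighborhoods for every low-confidence point, which this sketch simply ignores. That is wasted work but keeps the code short.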

LovellHAGSC · Answer 2 · Thu Feb 14 2019 11:10:23 GMT+0800 (China Standard Time)

Yes! This worked perfectly. So easy.
Thanks!
John

Michael Hahsler · Answer 3 · Thu Feb 14 2019 23:48:06 GMT+0800 (China Standard Time)

I am glad it worked for you.

LovellHAGSC · Answer 4 · Fri Feb 15 2019 00:57:19 GMT+0800 (China Standard Time)

In case someone comes across this post, the frNN approach is >10x faster than what I had implemented in rgeos::gBuffer.
Thanks again.