codeandcats / KdTree

A fast, generic, multi-dimensional Binary Search Tree written in C#

Item not found

SebastianStehle opened this issue

Hi,
I have a dataset of 7600 items and I make a query to get the 40 nearest members but one of the expected results is missing. Is there any known bug? If you have time to have a look I can provide the dataset.

Sebastian

commented

Hi Sebastian,

If you shoot me your dataset I'll check it out. I'll email you shortly.

Cheers,
Ben

@SebastianStehle Not to be a meddler: I would start with trimming the dataset to, say, 41 or 45 items (all 40 expected ones and then one or a few more) and check to see if it still doesn't return the expected items. If the problem is still present the dataset would at least be small enough to be posted here so others could have a crack at it too. You could even probably make a testcase with maybe 3 or 10 items that reproduces and demonstrates the same problem and then post it here.

Don't get me wrong; I think it's awesome @codeandcats is willing to have a look at your data. All I'm trying to point out is that if you'd be able to post the data here as a 'testcase' (unittest if you will), as opposed to exchanging stuff via e-mail, others (like me) could also have a look at it. That way not all the weight would be on @codeandcats' shoulders 😉 Yay Open Source! 😛 Also, other people possibly experiencing the same problem might pitch in when they see your data/testcase and may contribute more similar examples or maybe even a solution / PR.

commented

Hehe thanks @RobThree, I appreciate and agree with your sentiments though I had some free time tonight so didn't mind having a quick look. :)

I wrote a short LinqPad query simply ordering the dataset by distance and it seems that the item in question isn't within the nearest 50 of the specified point. Just getting @SebastianStehle to review my code and confirm now..
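For anyone who wants to run the same kind of sanity check, here is a minimal sketch of a brute-force "order by distance" query. It is not the actual LinqPad script, which was not posted. The `Item` class, its `Id` and `Point` members, and the `Nearest` helper are hypothetical names chosen for illustration, and squared Euclidean distance is an assumption about the metric.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical record shape: an id plus its coordinates (any dimensionality).
public sealed class Item
{
    public int Id { get; }
    public float[] Point { get; }

    public Item(int id, params float[] point)
    {
        Id = id;
        Point = point;
    }
}

public static class BruteForceNearest
{
    // Returns the `count` items closest to `query`, nearest first.
    // This is a linear scan plus a sort, so it is slow but easy to trust
    // as a reference result to compare a tree's answer against.
    public static List<Item> Nearest(IEnumerable<Item> items, float[] query, int count)
    {
        return items
            .OrderBy(item => SquaredDistance(item.Point, query))
            .Take(count)
            .ToList();
    }

    // Squared Euclidean distance; the square root is skipped because
    // it does not change the ordering.
    private static double SquaredDistance(float[] a, float[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Points must have the same number of dimensions.");

        double sum = 0;
        for (int i = 0; i < a.Length; i++)
        {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }
}
```

Comparing the ids returned by this helper with the ids returned by the tree for the same query point and count should show whether an expected item is really missing.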

commented

Update:

I wrote a small script to check, using simple math, whether the id was in the nearest 50, and it was not. However, Sebastian was getting different results running the same script. It seems float.Parse was interpreting commas differently on our machines (as thousands separators on one and as decimal points on the other). Once I fixed that bug in the script, it does appear that the item is within the nearest 50 using simple math, but isn't according to the tree.

I've asked Sebastian to add a unit test and send me a pull request so we can get more eyes across this. :)
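The float.Parse discrepancy described in this comment is the usual culture-sensitivity pitfall: without an explicit culture, `float.Parse` uses the current thread culture, so the same text can parse to different values on different machines. The sketch below shows one way to make parsing deterministic. It assumes the data file uses `.` as the decimal separator, and `ParseInvariant` and `ParseWithCulture` are hypothetical helper names, not part of the original script.

```csharp
using System;
using System.Globalization;

public static class CoordinateParsing
{
    // Parses with the invariant culture, so the result does not depend on
    // the machine's regional settings. NumberStyles.Float does not include
    // AllowThousands, so a stray comma throws instead of being silently
    // reinterpreted.
    public static float ParseInvariant(string text)
    {
        return float.Parse(text, NumberStyles.Float, CultureInfo.InvariantCulture);
    }

    // For comparison only: parses using the rules of a named culture,
    // which is what float.Parse(text) effectively does with whatever the
    // current culture happens to be. Which named cultures are available
    // depends on the runtime's globalization data.
    public static float ParseWithCulture(string text, string cultureName)
    {
        return float.Parse(text, CultureInfo.GetCultureInfo(cultureName));
    }
}
```

Calling `ParseWithCulture` on a string such as "1,5" with two different culture names (for example "en-US" and "de-DE", chosen here purely for illustration) is a quick way to see the two readings diverge.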

commented

Closing this issue. It turned out that the data contained duplicate coordinates, so the point in question simply didn't get added to the tree.
I've created a new enhancement allowing duplicates within the tree.
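Since the root cause was duplicate coordinates, a quick pre-check on the dataset can flag this before anything is added to a tree. The following is a rough sketch under stated assumptions: the `(Id, Point)` tuple shape and the `FindDuplicateIds` helper are hypothetical, and "duplicate" here means every coordinate matches exactly once formatted as a round-trip string.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class DuplicateCoordinates
{
    // Groups items by their exact coordinates and returns the ids of every
    // group that contains more than one item.
    public static List<List<int>> FindDuplicateIds(IEnumerable<(int Id, float[] Point)> items)
    {
        return items
            .GroupBy(item => Key(item.Point))
            .Where(group => group.Count() > 1)
            .Select(group => group.Select(item => item.Id).ToList())
            .ToList();
    }

    // Builds a culture-independent string key from the coordinates.
    // "R" is the round-trip format, so distinct float values get distinct keys.
    private static string Key(float[] point)
    {
        return string.Join("|", point.Select(c => c.ToString("R", CultureInfo.InvariantCulture)));
    }
}
```

Any group this returns is a set of ids sharing a single point, which is the situation that left the expected item out of the tree in this issue.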