Better partitioning in the bulk loading algorithm
mourner opened this issue · comments
Currently, the bulk loading algorithm partitions each node into approximately sqrt(N) x sqrt(N) child nodes. This becomes a problem if a node's bounding box is not square: child nodes get narrower the deeper you go. I noticed this problem when looking at the viz for a rectangular data space:
Notice the very narrow rectangles at the bottom. We could fix this by designing an algorithm that picks a K x M partitioning that takes the aspect ratio of a node into account, so that child nodes approach a square shape no matter how narrow the parent node is. This should improve query performance on bulk-loaded trees.
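To make the K x M idea concrete, here is a minimal sketch of one way to pick the grid from a node's aspect ratio. It is only an illustration under stated assumptions, not code from the library: the name `choose_grid` and its parameters are hypothetical, and it assumes we already know how many children the node needs and the width and height of its bounding box.

```python
import math


def choose_grid(num_children, width, height):
    """Pick (cols, rows) with cols * rows >= num_children and roughly square cells.

    Hypothetical helper: num_children is how many child nodes are needed,
    width and height describe the node's bounding box.
    """
    if num_children < 1:
        raise ValueError("num_children must be at least 1")

    if width <= 0 or height <= 0:
        # Degenerate box (a point or a line): fall back to the square split.
        cols = math.ceil(math.sqrt(num_children))
    else:
        # Square cells need cols / rows ~= width / height, and we want
        # cols * rows ~= num_children, so cols ~= sqrt(num_children * width / height).
        cols = round(math.sqrt(num_children * width / height))

    # Keep cols in a valid range, then add enough rows to fit every child.
    cols = max(1, min(num_children, cols))
    rows = math.ceil(num_children / cols)
    return cols, rows
```

For a square node this reduces to the current sqrt(N) x sqrt(N) split, while a node nine times wider than it is tall and needing 9 children gets a single row of 9 columns. Note that this needs the node's bounding box before each split, which has a cost of its own.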
cc @danpat
Were your improvements here merged into master?
No. The approach above was flawed (it made bulk-load performance worse) and I never figured out how to get around that. Maybe I'll try again some time.
Pushed the work-in-progress code I had to a7047e9; feel free to poke around. As far as I recall, there were two issues:
- Despite the tree looking much better visually, I couldn't get a meaningful search query improvement in benchmarks. Maybe I measured wrong though.
- I didn't like having to recalculate the bounding box for all items on each iteration. This didn't feel right, although I never found an alternative.
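For readers wondering what the second point costs, here is a rough sketch of the kind of pass it refers to. This is not the code from the commit: the function name `bounding_box` and the `(min_x, min_y, max_x, max_y)` tuple layout are assumptions made for illustration.

```python
def bounding_box(items):
    """Return (min_x, min_y, max_x, max_y) covering every item.

    Each item is assumed to be a (min_x, min_y, max_x, max_y) tuple.
    The loop visits every item once, so the cost is linear in len(items).
    """
    if not items:
        raise ValueError("items must not be empty")

    min_x, min_y, max_x, max_y = items[0]
    for x0, y0, x1, y1 in items[1:]:
        min_x = min(min_x, x0)
        min_y = min(min_y, y0)
        max_x = max(max_x, x1)
        max_y = max(max_y, y1)
    return min_x, min_y, max_x, max_y
```

An aspect-ratio-aware split needs this box before it can choose a grid, so a pass like this has to run for every node being partitioned, on top of the sorting that bulk loading already does.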