Reuse BSP tree info between CSG operations for improved performance

Question


andygeers opened this issue 4 years ago

It seems like quite a bottleneck for complex meshes to keep regenerating brand new BSP trees from scratch for every CSG operation.
What if the “storage” on each Mesh were in fact a BSP, and each operation used the existing BSPs to generate a new one for the resulting Mesh?

If you think there’s any mileage in this approach and can’t see any obvious drawbacks, then I might have a go at implementing this. I am having severe performance issues, but it feels like I needn’t be, since I’m just tiling many copies of an otherwise identical mesh, each of which has an identically structured BSP that is just translated in space, and then doing a few intersect or clip-by-plane operations.

PS: Thank you so much for your work on this awesome library. Such a blessing that this exists!

Nick Lockwood · Answer 1 · Fri Feb 07 2020 02:29:16 GMT+0800 (China Standard Time)

I've considered this, and it wouldn't be hard to cache the BSP, but how much reuse would you actually get?

Suppose I subtract object A from object B and get object C. Now I want to subtract object D from the result.

I have BSPs for A and B cached, but that doesn't really help me. Under what circumstances would I want to reuse them?

Edit: ah, I should've read your comment more carefully. So the problem is that to get the benefit in your scenario we would probably need to generate the BSP up-front when the Mesh is first created, otherwise the translated copies wouldn't be able to share the BSP because it wouldn't exist yet.

Nick Lockwood · Answer 2 · Fri Feb 07 2020 02:33:23 GMT+0800 (China Standard Time)

Generating a BSP up-front is tricky because then you are making users pay the cost when they might not even be planning to use CSG.

Nick Lockwood · Answer 3 · Fri Feb 07 2020 02:36:57 GMT+0800 (China Standard Time)

The original JS library used BSPs as the Mesh, in much the way you suggest, and it's possible that in your specific scenario that would be an overall performance win.

The reason I moved away from this approach is that creating a BSP for a non-convex mesh involves splitting a bunch of polygons that don't actually intersect anything; they just happen to span the plane of one of the other polys.

That means the resultant shape after CSG has significantly more polygons than actually needed.
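To make the splitting cost concrete, here is a minimal sketch of the step a BSP build performs when another polygon's plane is chosen as a splitter. `Vec3`, `Plane` and `split` are simplified stand-ins written for illustration, not Euclid's actual types or methods, and polygons are assumed to be convex vertex loops.

```swift
// Minimal stand-in types for illustration; not Euclid's real Vector/Plane.
struct Vec3: Equatable {
    var x, y, z: Double
    static func + (a: Vec3, b: Vec3) -> Vec3 { Vec3(x: a.x + b.x, y: a.y + b.y, z: a.z + b.z) }
    static func - (a: Vec3, b: Vec3) -> Vec3 { Vec3(x: a.x - b.x, y: a.y - b.y, z: a.z - b.z) }
    static func * (v: Vec3, s: Double) -> Vec3 { Vec3(x: v.x * s, y: v.y * s, z: v.z * s) }
    func dot(_ o: Vec3) -> Double { x * o.x + y * o.y + z * o.z }
}

// The plane n·p = w; `distance` is positive in front and negative behind.
struct Plane {
    var normal: Vec3
    var w: Double
    func distance(to p: Vec3) -> Double { normal.dot(p) - w }
}

// Cuts a convex polygon by a plane. A polygon entirely on one side comes back
// whole on that side; a spanning polygon becomes two pieces. Coplanar polygons,
// which a real BSP keeps at the node itself, are not handled here (they would
// be returned on both sides).
func split(_ poly: [Vec3], by plane: Plane, epsilon: Double = 1e-8) -> (front: [Vec3], back: [Vec3]) {
    var front: [Vec3] = [], back: [Vec3] = []
    for i in poly.indices {
        let a = poly[i], b = poly[(i + 1) % poly.count]
        let da = plane.distance(to: a), db = plane.distance(to: b)
        if da >= -epsilon { front.append(a) }
        if da <= epsilon { back.append(a) }
        // The edge strictly crosses the plane: the crossing point belongs to both pieces.
        if (da > epsilon && db < -epsilon) || (da < -epsilon && db > epsilon) {
            let p = a + (b - a) * (da / (da - db))
            front.append(p)
            back.append(p)
        }
    }
    // Fewer than three vertices means no area of the polygon is on that side.
    return (front.count >= 3 ? front : [], back.count >= 3 ? back : [])
}

// A square spanning the plane x = 0.5 is cut into two quads, even though it
// does not intersect any other geometry.
let square = [Vec3(x: 0, y: 0, z: 0), Vec3(x: 1, y: 0, z: 0),
              Vec3(x: 1, y: 1, z: 0), Vec3(x: 0, y: 1, z: 0)]
let cut = split(square, by: Plane(normal: Vec3(x: 1, y: 0, z: 0), w: 0.5))
print(cut.front.count, cut.back.count)  // 4 4
```

One polygon became two, which is the growth in polygon count described above when the BSP itself is used as the mesh storage.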

Nick Lockwood · Answer 4 · Fri Feb 07 2020 02:38:51 GMT+0800 (China Standard Time)

This isn't really a problem for convex shapes since their polygon planes don't intersect any of their own polygons, but for that reason building the BSP for those shapes is already fairly cheap.
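The convex case can be checked directly: if every vertex lies on or behind every face plane (with outward-facing normals), then no face plane can cut another face, so building a BSP from the mesh's own faces never splits anything. A small sketch with hypothetical stand-in types:

```swift
// Minimal stand-in types for illustration only.
struct Vec3 {
    var x, y, z: Double
    func dot(_ o: Vec3) -> Double { x * o.x + y * o.y + z * o.z }
}

struct Plane {
    var normal: Vec3
    var w: Double
    func distance(to p: Vec3) -> Double { normal.dot(p) - w }
}

// True when no vertex lies strictly in front of any face plane (outward normals
// assumed). For such a mesh, a BSP built from its own faces needs no splits.
func isConvex(facePlanes: [Plane], vertices: [Vec3], epsilon: Double = 1e-8) -> Bool {
    facePlanes.allSatisfy { plane in
        vertices.allSatisfy { plane.distance(to: $0) <= epsilon }
    }
}

// Axis-aligned box from -1 to 1 on each axis.
let boxPlanes = [
    Plane(normal: Vec3(x: 1, y: 0, z: 0), w: 1), Plane(normal: Vec3(x: -1, y: 0, z: 0), w: 1),
    Plane(normal: Vec3(x: 0, y: 1, z: 0), w: 1), Plane(normal: Vec3(x: 0, y: -1, z: 0), w: 1),
    Plane(normal: Vec3(x: 0, y: 0, z: 1), w: 1), Plane(normal: Vec3(x: 0, y: 0, z: -1), w: 1),
]
var corners: [Vec3] = []
for x in [-1.0, 1.0] {
    for y in [-1.0, 1.0] {
        for z in [-1.0, 1.0] { corners.append(Vec3(x: x, y: y, z: z)) }
    }
}
print(isConvex(facePlanes: boxPlanes, vertices: corners))  // true
```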

Nick Lockwood · Answer 5 · Fri Feb 07 2020 02:48:37 GMT+0800 (China Standard Time)

Storing the BSP as a lazy property of the Storage (rather than as the Storage itself) might potentially work. That way all the copies could share it once generated, even if that happened after they had been copied and transformed.

I think this would require storing the transform inside the mesh and applying it lazily rather than immediately, though; otherwise transformed meshes couldn't actually share the same BSP.
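A minimal sketch of that idea, assuming a hypothetical design rather than Euclid's actual `Mesh`/`Storage`: the storage is a shared reference that builds its BSP on first use, and each mesh value only records a translation, so every translated copy still points at the same cached BSP.

```swift
// Stand-in for an expensive-to-build BSP.
struct FakeBSP { let polygonCount: Int }

// Reference-typed storage shared by every copy of a mesh. The BSP is built on
// first use and cached. Hypothetical design; not thread-safe as written.
final class Storage {
    let polygonCount: Int
    private(set) var buildCount = 0
    private var cached: FakeBSP?

    init(polygonCount: Int) { self.polygonCount = polygonCount }

    var bsp: FakeBSP {
        if let bsp = cached { return bsp }
        buildCount += 1  // the expensive build would happen here
        let bsp = FakeBSP(polygonCount: polygonCount)
        cached = bsp
        return bsp
    }
}

struct Offset { var x, y, z: Double }

// A mesh value that records its translation instead of moving every vertex,
// so all translated copies keep pointing at the same Storage (and BSP).
struct LazyMesh {
    let storage: Storage
    var offset = Offset(x: 0, y: 0, z: 0)

    func translated(by d: Offset) -> LazyMesh {
        var copy = self
        copy.offset = Offset(x: offset.x + d.x, y: offset.y + d.y, z: offset.z + d.z)
        return copy
    }

    // The untransformed BSP plus the offset a CSG routine would apply on the fly.
    var bspAndOffset: (bsp: FakeBSP, offset: Offset) { (storage.bsp, offset) }
}

let wall = LazyMesh(storage: Storage(polygonCount: 120))
let tiles = (0..<10).map { wall.translated(by: Offset(x: Double($0), y: 0, z: 0)) }
for tile in tiles { _ = tile.bspAndOffset }
print(wall.storage.buildCount)  // 1
```

The trade-off is that every consumer of the BSP must apply the stored offset itself; general transforms (rotation, scale) would need the same treatment.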

There are potential advantages to doing that anyway (e.g. improved performance and reducing loss of precision when concatenating transforms), so it's worth exploring.

Nick Lockwood · Answer 6 · Fri Feb 07 2020 02:54:18 GMT+0800 (China Standard Time)

Another thought: If we could find a better solution for stitching polygons back together after splitting then that would mitigate some of the disadvantages of storing the mesh as a BSP.

Right now the stitching only works if the polygons share two vertices, but there are often cases where they only share (part of) an edge without sharing both of its vertices, which is more expensive to detect and more complex to merge.
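For reference, the "share two vertices" case can be sketched as follows. This is an illustrative stand-in, not Euclid's implementation: it assumes coplanar polygons with the same winding and exact vertex equality, and it does not remove collinear vertices, check that the result is still convex, or detect partially overlapping edges.

```swift
struct Vec3: Equatable { var x, y, z: Double }

// Merges two coplanar polygons (vertex loops with the same winding) that share
// a full edge, i.e. both of its vertices. Returns nil if there is no such edge.
func merge(_ a: [Vec3], _ b: [Vec3]) -> [Vec3]? {
    let n = a.count, m = b.count
    for i in 0..<n {
        let start = a[i], end = a[(i + 1) % n]
        // With matching winding, the shared edge runs in the opposite direction in b.
        for j in 0..<m where b[j] == end && b[(j + 1) % m] == start {
            // All of a, beginning at the edge's end vertex and finishing at its start...
            var merged = (0..<n).map { a[(i + 1 + $0) % n] }
            // ...followed by the rest of b, skipping the two shared vertices.
            merged += (0..<(m - 2)).map { b[(j + 2 + $0) % m] }
            return merged
        }
    }
    return nil
}

// Two triangles sharing the diagonal (0,0)-(1,1) stitch back into a square.
let t1 = [Vec3(x: 0, y: 0, z: 0), Vec3(x: 1, y: 0, z: 0), Vec3(x: 1, y: 1, z: 0)]
let t2 = [Vec3(x: 0, y: 0, z: 0), Vec3(x: 1, y: 1, z: 0), Vec3(x: 0, y: 1, z: 0)]
let quad = merge(t1, t2)
print(quad?.count ?? 0)  // 4
```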

Andy Geers · Answer 7 · Fri Feb 07 2020 03:04:28 GMT+0800 (China Standard Time)

Thanks for the feedback. Re: your first question, I guess I was assuming that there might be some speed benefit in creating the resulting BSP if I had the originals to help me, but maybe that is a false assumption!

Nick Lockwood · Answer 8 · Fri Feb 07 2020 03:15:47 GMT+0800 (China Standard Time)

@andygeers I could be wrong, but my instinct is that merging one BSP into another is the same as merging in the equivalent number of polygons that aren't already in a BSP.

That said, I'm far from an expert on these matters.

Andy Geers · Answer 9 · Fri Feb 07 2020 16:03:43 GMT+0800 (China Standard Time)

Actually... sorry! I just recompiled my app in Release mode rather than Debug, and instead of taking minutes it now takes seconds. Shall I close this?

Nick Lockwood · Answer 10 · Fri Feb 07 2020 22:01:31 GMT+0800 (China Standard Time)

@andygeers leave it open as a reminder - I think there could still be some opportunities for improvements in this area.

Andy Geers · Answer 11 · Fri Feb 07 2020 22:33:28 GMT+0800 (China Standard Time)

Re: generating the BSP up front, I'm not even sure it needs to be done as soon as you initialise the mesh. I load a particular "texture" (e.g. my wall mesh that you have seen) and then use that over and over again on each face of the building. For each face I just tile it a few times, then clip it by four planes (left and right bisectors with the neighbouring walls, plus the top and the bottom). So I could easily call a function at the point of loading the mesh to generate the BSP, if that then allowed all of the other operations to be more efficient. But at the moment it has to rebuild the BSP from scratch each time I tile the mesh, then again for each of the four clip planes, and then again when I export the resulting mesh to an STL file / USDZ / etc.

It turns out it's actually fine in Release mode (I know you said this is partly just down to how Swift optimises in Release mode, but is it possible that one of the assert calls is checking something super-expensive?), but even so, it feels intuitively like there must be a lot of redundant work going on here.

Andy Geers · Answer 12 · Sat Mar 14 2020 01:28:18 GMT+0800 (China Standard Time)

So I've been thinking pretty hard about this... (I'm getting to the stage of my project where improving the performance of all the CSG stuff is increasingly my main priority).

I think for my use case there could be some pretty massive wins here. I'm doing a lot of union operations, which essentially does this:

```swift
ap = BSP(mesh).clip(ap, .greaterThan)
bp = BSP(self).clip(bp, .greaterThanEqual)
```

Now, in my case, `mesh` is lots of copies of an identical mesh just translated in space, so the BSP could be precalculated once, and it would then be pretty trivial to implement a `BSP.translate` method that reuses the same structure. And `self` is just the result of the previous union operation. I'm not 100% sure, but I feel like this could reuse the existing BSP: obviously it needs to remove some polygons and then add some more, but it feels like rebuilding the BSP from scratch here surely isn't necessary.
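Why a translate method can reuse the tree: if a plane is n·p = w, moving every point by t gives the plane n·p = w + n·t, and a point p is in front of the old plane exactly when p + t is in front of the new one. So every front/back decision made while building the tree still holds. A minimal sketch with hypothetical stand-in types (not Euclid's internal `BSP`):

```swift
struct Vec3 {
    var x, y, z: Double
    static func + (a: Vec3, b: Vec3) -> Vec3 { Vec3(x: a.x + b.x, y: a.y + b.y, z: a.z + b.z) }
    func dot(_ o: Vec3) -> Double { x * o.x + y * o.y + z * o.z }
}

// The plane n·p = w. Moving every point by t turns it into n·p = w + n·t.
struct Plane {
    var normal: Vec3
    var w: Double
    func translated(by t: Vec3) -> Plane { Plane(normal: normal, w: w + normal.dot(t)) }
}

struct Poly {
    var vertices: [Vec3]
    var plane: Plane
    func translated(by t: Vec3) -> Poly {
        Poly(vertices: vertices.map { $0 + t }, plane: plane.translated(by: t))
    }
}

// Hypothetical BSP node: splitting plane, polygons lying in that plane, and subtrees.
final class Node {
    let plane: Plane
    let polygons: [Poly]
    let front, back: Node?
    init(plane: Plane, polygons: [Poly], front: Node? = nil, back: Node? = nil) {
        self.plane = plane; self.polygons = polygons; self.front = front; self.back = back
    }

    // A translation keeps every point on the same side of every plane, so the
    // tree's shape is reused unchanged: no classification or splitting is redone.
    func translated(by t: Vec3) -> Node {
        Node(plane: plane.translated(by: t),
             polygons: polygons.map { $0.translated(by: t) },
             front: front?.translated(by: t),
             back: back?.translated(by: t))
    }
}

let xPlane = Plane(normal: Vec3(x: 1, y: 0, z: 0), w: 1)  // the plane x = 1
let face = Poly(vertices: [Vec3(x: 1, y: 0, z: 0), Vec3(x: 1, y: 1, z: 0), Vec3(x: 1, y: 0, z: 1)],
                plane: xPlane)
let tree = Node(plane: xPlane, polygons: [face])
let moved = tree.translated(by: Vec3(x: 2, y: 0, z: 0))
print(moved.plane.w, moved.polygons[0].vertices[0].x)  // 3.0 3.0
```

Rotations and uniform scales preserve the structure in the same way, but they also transform the plane normals, so they need a little more than the offset update shown here.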

I'm guessing that CSG operations other than union might not see quite the same benefits, since the result shares less of its BSP structure with the mesh that you start with.

Thoughts?

Andy Geers · Answer 13 · Sun Mar 15 2020 01:15:15 GMT+0800 (China Standard Time)

A quick exploratory test (https://github.com/andygeers/Euclid/tree/cached-bsps) suggests that translating/scaling/rotating an existing BSP can be up to 10 times quicker than building a new one from scratch. For a merge operation it is easy to reuse the BSP and then merge in the additional polygons, and again, quick tests suggest this can be considerably quicker than building a new one from scratch (exactly how much seems to vary a lot from case to case).

The challenge here is that every CSG operation needs to be handled separately in terms of how to modify the existing BSP. merge was the easy one. I now need to figure out how to do it for subtract, union, intersect and clip... assuming it's even possible in the first place.

Nick Lockwood · Answer 14 · Wed Mar 18 2020 05:39:02 GMT+0800 (China Standard Time)

Since the various CSG split operations can be performed independently per-polygon, it should be possible to apply them directly to a BSP rather than a flat array of polygons. Not sure about the performance implications.

My concern about this approach (storing meshes in BSP form) is mainly the problem of generating many more polygons than needed. I'm still missing a good way to stitch polygons back together after multiple splits, which means that storing them in a BSP all the time results in way more polys than is desirable.

I realize now though that the flaw in my previous approaches may have been that I was ignoring the BSP structure. If I use the BSP itself to guide the merge (by walking backwards from the leaf nodes and looking only at polygons that touch the plane on either side of a split) then there is potential to improve both the speed and completeness of the merging process when converting back to a mesh.

Andy Geers · Answer 15 · Wed Mar 18 2020 19:25:24 GMT+0800 (China Standard Time)

In some ways I don't much mind about the "way more polys than is desirable" issue - since I was just focussed on speeding up the CSG process, and that always involves creating a BSP anyway. So far the approach I've been testing still maintains the original array of polygons AND a cached BSP. Admittedly it would be more memory efficient if the BSP could become the "storage" - at which point you do want to make sure you're not storing lots and lots of unnecessarily split polygons - but if raw speed is the main concern then it's not strictly necessary to solve that problem. I think?

Nick Lockwood · Answer 16 · Thu Apr 09 2020 20:14:56 GMT+0800 (China Standard Time)

@andygeers I think when I (eventually) get around to this refactor, I'd like to try and solve it properly, and maintaining two sets of polygons doesn't seem like the right direction to go, but I'll consider it if merging the BSP proves untenable.

PS: Thanks for the generous donation! And sorry again for the lack of updates, I hope to get back to this once work quiets down a bit.

Andy Geers · Answer 17 · Tue Apr 21 2020 00:33:56 GMT+0800 (China Standard Time)

As a simpler scenario than implementing all of this, here's my current performance bottleneck: I take a wall mesh and clip it by four or more planes. Currently all my walls are rectangular, so it's the top, bottom, left and right planes (where left and right will not be parallel to each other, because they bisect the angles with the neighbouring walls), but in the near future my walls will not be rectangular, so it might be at least five planes, yielding a "pointy" top.

I use the mesh.clip(by plane) method with fill, and constructing the BSP for each clip takes an absolute age. Then it throws the BSP away, and to clip by the next plane it constructs a totally new BSP from scratch, which takes another age... Any tips on how I could either:

1. Change the implementation of the fill part of "clip by plane" so that it could avoid generating the full BSP,

and/or

2. Create a BSP for the result of the "clip and fill" more efficiently, using the fact that we know all of the resulting polygons are on one side of a specific plane? (In my https://github.com/andygeers/Euclid/tree/cached-bsps branch I've currently only figured out how to maintain previous BSPs across merge operations, but I feel like there must be a relatively easy way to do it for these clip ones too, if I were smarter.)
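On the cutting side, splitting polygons by a plane is a per-polygon operation that does not itself need a BSP. The sketch below clips one convex polygon by several planes in turn (Sutherland–Hodgman style), keeping the part on or behind each plane. The types are hypothetical stand-ins, polygons are assumed convex, and generating the "fill" (cap) polygons is not covered.

```swift
struct Vec3 {
    var x, y, z: Double
    static func + (a: Vec3, b: Vec3) -> Vec3 { Vec3(x: a.x + b.x, y: a.y + b.y, z: a.z + b.z) }
    static func - (a: Vec3, b: Vec3) -> Vec3 { Vec3(x: a.x - b.x, y: a.y - b.y, z: a.z - b.z) }
    static func * (v: Vec3, s: Double) -> Vec3 { Vec3(x: v.x * s, y: v.y * s, z: v.z * s) }
    func dot(_ o: Vec3) -> Double { x * o.x + y * o.y + z * o.z }
}

struct Plane {
    var normal: Vec3
    var w: Double
    func distance(to p: Vec3) -> Double { normal.dot(p) - w }
}

// Keeps the part of a convex polygon lying on or behind the plane (n·p <= w).
func clip(_ poly: [Vec3], keepingBehind plane: Plane) -> [Vec3] {
    var out: [Vec3] = []
    for i in poly.indices {
        let a = poly[i], b = poly[(i + 1) % poly.count]
        let da = plane.distance(to: a), db = plane.distance(to: b)
        if da <= 0 { out.append(a) }
        // Edge strictly crosses the plane: add the crossing point.
        if (da < 0 && db > 0) || (da > 0 && db < 0) {
            out.append(a + (b - a) * (da / (da - db)))
        }
    }
    return out.count >= 3 ? out : []
}

// Applies the planes one after another; only the surviving piece is carried forward.
func clip(_ poly: [Vec3], by planes: [Plane]) -> [Vec3] {
    planes.reduce(poly) { current, plane in
        current.isEmpty ? [] : clip(current, keepingBehind: plane)
    }
}

// Shoelace area of a polygon lying in the XY plane.
func areaXY(_ poly: [Vec3]) -> Double {
    var twiceArea = 0.0
    for i in poly.indices {
        let a = poly[i], b = poly[(i + 1) % poly.count]
        twiceArea += a.x * b.y - b.x * a.y
    }
    return abs(twiceArea) / 2
}

let panel = [Vec3(x: 0, y: 0, z: 0), Vec3(x: 2, y: 0, z: 0),
             Vec3(x: 2, y: 2, z: 0), Vec3(x: 0, y: 2, z: 0)]
let planes = [
    Plane(normal: Vec3(x: 1, y: 0, z: 0), w: 1.5),    // x <= 1.5
    Plane(normal: Vec3(x: 0, y: 1, z: 0), w: 1),      // y <= 1
    Plane(normal: Vec3(x: -1, y: 0, z: 0), w: -0.5),  // x >= 0.5
]
let clipped = clip(panel, by: planes)
print(clipped.count, areaXY(clipped))
```

Each polygon of a mesh could be clipped this way independently, which is why only the fill step is a candidate for needing a BSP.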

Potentially, it feels like I could stop clipping by individual planes and instead somehow generate a single 3D volume as the intersection of all of these planes, and then do one intersect operation. Perhaps with my cached-bsps branch that would be extremely efficient. I'm not entirely sure how to construct that Mesh directly from the planes, however...
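Since each clip plane bounds a half-space, their intersection can be written directly as a BSP shaped like a chain: being in front of any plane means outside, and getting behind every plane means inside. The sketch below uses a hypothetical solid-BSP enum (not Euclid's API) and only classifies points; converting the region into a `Mesh` for an intersect call is not shown. Normals need not be unit length here, because only the sign of the distance is used.

```swift
struct Vec3 {
    var x, y, z: Double
    func dot(_ o: Vec3) -> Double { x * o.x + y * o.y + z * o.z }
}

struct Plane {
    var normal: Vec3
    var w: Double
    func distance(to p: Vec3) -> Double { normal.dot(p) - w }
}

// A minimal solid BSP: leaves say inside/outside, nodes route a point by plane side.
indirect enum Solid {
    case inside, outside
    case node(Plane, front: Solid, back: Solid)
}

// The intersection of the half-spaces n·p <= w as a chain-shaped BSP.
func convexRegion(_ planes: [Plane]) -> Solid {
    planes.reversed().reduce(Solid.inside) { inner, plane in
        .node(plane, front: .outside, back: inner)
    }
}

// Points on a plane count as behind it, so the boundary is included.
func contains(_ solid: Solid, _ p: Vec3) -> Bool {
    switch solid {
    case .inside: return true
    case .outside: return false
    case let .node(plane, front, back):
        return contains(plane.distance(to: p) > 0 ? front : back, p)
    }
}

// A wall outline with a "pointy" top: 0 <= x <= 4, y >= 0, y - x <= 3, y + x <= 7
// (unbounded in z to keep the example short).
let wedge = convexRegion([
    Plane(normal: Vec3(x: -1, y: 0, z: 0), w: 0),
    Plane(normal: Vec3(x: 1, y: 0, z: 0), w: 4),
    Plane(normal: Vec3(x: 0, y: -1, z: 0), w: 0),
    Plane(normal: Vec3(x: -1, y: 1, z: 0), w: 3),
    Plane(normal: Vec3(x: 1, y: 1, z: 0), w: 7),
])
print(contains(wedge, Vec3(x: 2, y: 4, z: 0)), contains(wedge, Vec3(x: 0.5, y: 4.5, z: 0)))  // true false
```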

Andy Geers · Answer 18 · Tue Apr 21 2020 16:51:05 GMT+0800 (China Standard Time)

OK, so I did get it working using an intersect rather than clip by plane... Now I just need to progressively build up my BSP during the tiling process. Currently I merge each tile of the wall mesh, and at that point bounds.min.x of the next tile being merged equals bounds.max.x of the wall so far, so it's easy enough to identify this situation. I want to create a new BSPNode using the YZ plane between the wall so far and the next tile. It feels like it ought to be easy enough, where 'above' is a clone of the wall's root BSP node and 'below' is a clone of the next tile's. But I think it could get a bit fiddly if there are other nodes at arbitrary points in the 'below' tree with that same plane.
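For point classification at least, the join described above can be sketched with a hypothetical solid-BSP enum (not Euclid's `BSPNode`): a new root on the plane x = c sends points with x > c to the tile's tree and the rest to the wall's tree, without rebuilding either subtree. It is only valid when the wall lies entirely in x <= c and the tile in x >= c. Deeper nodes that reuse the same plane are redundant but harmless for this query; the internal faces at x = c are not addressed.

```swift
struct Vec3 {
    var x, y, z: Double
    func dot(_ o: Vec3) -> Double { x * o.x + y * o.y + z * o.z }
}

struct Plane {
    var normal: Vec3
    var w: Double
    func distance(to p: Vec3) -> Double { normal.dot(p) - w }
}

indirect enum Solid {
    case inside, outside
    case node(Plane, front: Solid, back: Solid)
}

func convexRegion(_ planes: [Plane]) -> Solid {
    planes.reversed().reduce(Solid.inside) { inner, plane in
        .node(plane, front: .outside, back: inner)
    }
}

func contains(_ solid: Solid, _ p: Vec3) -> Bool {
    switch solid {
    case .inside: return true
    case .outside: return false
    case let .node(plane, front, back):
        return contains(plane.distance(to: p) > 0 ? front : back, p)
    }
}

// Hypothetical join: wall occupies x <= c, next tile occupies x >= c.
func join(wall: Solid, tile: Solid, atX c: Double) -> Solid {
    .node(Plane(normal: Vec3(x: 1, y: 0, z: 0), w: c), front: tile, back: wall)
}

// Unit-height, unit-depth box spanning minX...maxX.
func box(minX: Double, maxX: Double) -> Solid {
    convexRegion([
        Plane(normal: Vec3(x: -1, y: 0, z: 0), w: -minX),  // x >= minX
        Plane(normal: Vec3(x: 1, y: 0, z: 0), w: maxX),    // x <= maxX
        Plane(normal: Vec3(x: 0, y: -1, z: 0), w: 0),      // y >= 0
        Plane(normal: Vec3(x: 0, y: 1, z: 0), w: 1),       // y <= 1
        Plane(normal: Vec3(x: 0, y: 0, z: -1), w: 0),      // z >= 0
        Plane(normal: Vec3(x: 0, y: 0, z: 1), w: 1),       // z <= 1
    ])
}

let joined = join(wall: box(minX: 0, maxX: 1), tile: box(minX: 1, maxX: 2), atX: 1)
let probes = [Vec3(x: 0.5, y: 0.5, z: 0.5), Vec3(x: 1.5, y: 0.5, z: 0.5), Vec3(x: 2.5, y: 0.5, z: 0.5)]
print(probes.map { contains(joined, $0) })  // [true, true, false]
```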

I'm kind of tempted to move the concept of 'tiling' out of my app and into my branch of Euclid itself, since it feels like there's lots of special casing that can be done for this specific scenario that's easier than handling the general case, e.g. removing all of the 'internal' faces that result from the tiling process. The alternative would be to make the BSP stuff more public, rather than entirely private as it is at the moment.

Andy Geers · Answer 19 · Wed May 06 2020 17:20:12 GMT+0800 (China Standard Time)

So, a follow-up on this after some more experimentation... I don't actually think there is much (if any) benefit to be had from progressively building the BSP as you go whilst doing successive merge operations. BUT I think there is significant benefit to be had from making a mesh's BSP something that can be cached after it has been generated once. If I rewrite my code to build one very large wall panel (the size of the largest wall in my building), calculate its BSP once, then just reuse that over and over for the various intersect calls for successive wall panels of various shapes and sizes, then I believe that would be much quicker (especially if I can devise a way to avoid physically translating the wall panel over and over during the tiling process, which I've discovered can itself be pretty slow, and instead just pass a list of offsets whilst building the BSP).

This is also much nicer from the point of view of what you were concerned about, because it avoids having to build the BSP for everybody whether they need it or not.

Nick Lockwood · Answer 20 · Wed May 06 2020 17:34:08 GMT+0800 (China Standard Time)

@andygeers doesn't that have the problem (in the general case) that you'll be modifying the wall panel each time you intersect, so you'd need to recalculate the BSP?

In your particular case the meshes you are intersecting with the wall (doors, windows, etc) probably don't intersect each other, so you can use the same wall panel BSP to perform the intersection test with each of them, but I'm not sure how to generalize that.

Andy Geers · Answer 21 · Wed May 06 2020 17:43:29 GMT+0800 (China Standard Time)

Hmm, yes, you're right. I was just thinking about the simple 'create a blank wall' stage which I've been testing against (lots of merge operations to tile, then a single intersect operation to clip to a wedge shape), and sort of forgot about all of those subsequent subtract and union operations to add all of the windows/doors/etc...

Andy Geers · Answer 22 · Wed May 06 2020 17:45:25 GMT+0800 (China Standard Time)

(but my point is that each subsequent wall could start with the same BSP, before that intersect operation clips it to a wedge)

Nick Lockwood · Answer 23 · Mon Aug 09 2021 16:43:43 GMT+0800 (China Standard Time)

@andygeers I'm currently revisiting possible performance improvements in Euclid. Are you still using it in your app?

Andy Geers · Answer 24 · Mon Aug 09 2021 16:47:46 GMT+0800 (China Standard Time)

Hi Nick, yes I do, although I’ve barely touched the app since last summer. But happy to test things out for you if it would be useful (significant performance improvements might help me get going again!)

Nick Lockwood · Answer 25 · Mon Aug 09 2021 17:06:43 GMT+0800 (China Standard Time)

@andygeers Great! Yeah, I also haven't really touched ShapeScript much in the last year or so, but I'm excited to get back to it.

I have already made some fairly significant performance improvements anyway since we last discussed this - in particular Euclid now has optimized methods for applying multiple CSG operations at once that you may find give you a boost, but there's definitely still room for improvement (e.g. by utilizing multithreading), and I'm still considering some sort of BSP caching as we discussed before.

Nick Lockwood · Answer 26 · Wed Oct 26 2022 14:32:38 GMT+0800 (China Standard Time)

@andygeers just an update on this - I tried caching BSPs for reuse as part of a raft of performance optimizations I made for 0.6.0 and saw no benefit for any of my test models. If you've got any examples of code where you still believe it would help, I'd love to see them.