nicklockwood / Euclid

A Swift library for creating and manipulating 3D geometry



Reuse BSP tree info between CSG operations for improved performance

andygeers opened this issue

It seems like quite a bottleneck for complex meshes to keep regenerating brand new BSP trees from scratch for every CSG operation.
What if the “storage” on each Mesh were in fact a BSP, and each operation used the existing BSPs to generate a new one for the resulting Mesh?

If you think there’s any mileage in this approach and can’t see any obvious drawbacks, then I might have a go at implementing it. I’m having severe performance issues, but it feels like I needn’t be, since I’m just tiling many copies of an otherwise identical mesh (each of which has an identically structured BSP, just translated in space) and then doing a few intersect or clip-by-plane operations.

PS: Thank you so much for your work on this awesome library; it's such a blessing that it exists!

I've considered this, and it wouldn't be hard to cache the BSP, but how much reuse would you actually get?

Suppose I subtract object A from object B and get object C. Now I want to subtract object D from the result.

I have BSPs for A and B cached, but that doesn't really help me. Under what circumstances would I want to reuse them?

Edit: ah, I should've read your comment more carefully. So the problem is that to get the benefit in your scenario we would probably need to generate the BSP up-front when the Mesh is first created, otherwise the translated copies wouldn't be able to share the BSP because it wouldn't exist yet.

Generating a BSP up-front is tricky because then you are making users pay the cost when they might not even be planning to use CSG.

The original JS library used BSPs as the Mesh, in much the way you suggest, and it's possible that in your specific scenario that would be an overall performance win.

The reason I moved away from this approach is that creating a BSP for a non-convex mesh involves splitting a bunch of polygons that don't actually intersect anything, they just happen to span the plane of one of the other polys.

That means the resultant shape after CSG has significantly more polygons than actually needed.

This isn't really a problem for convex shapes since their polygon planes don't intersect any of their own polygons, but for that reason building the BSP for those shapes is already fairly cheap.
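The convex/non-convex difference shows up even in 2D. The sketch below is illustrative only, not Euclid code (Point, Edge and spanningPairs are hypothetical names): edges stand in for polygons, the line through each edge stands in for its plane, and it counts how often an edge strictly crosses the line through another edge. A convex square gives zero; an L-shape gives two, because the lines through the inner corner's edges cut across outer edges. This counts candidate splits, not the exact number a particular BSP build would perform, which depends on which splitters it picks.

```swift
// 2D stand-in: edges play the role of polygons, and the infinite line
// through each edge plays the role of that polygon's plane.
struct Point {
    var x: Double
    var y: Double
}

struct Edge {
    var a: Point
    var b: Point

    /// Signed side of `p` relative to the infinite line through this edge.
    func side(of p: Point) -> Double {
        (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x)
    }
}

/// Counts (splitter, edge) pairs where `edge` strictly crosses the line
/// through `splitter`, i.e. where the edge would be cut in two if that
/// line were chosen as a BSP splitting plane.
func spanningPairs(in outline: [Point], epsilon: Double = 1e-9) -> Int {
    let edges = outline.indices.map { i in
        Edge(a: outline[i], b: outline[(i + 1) % outline.count])
    }
    var count = 0
    for splitter in edges {
        for edge in edges {
            let sa = splitter.side(of: edge.a)
            let sb = splitter.side(of: edge.b)
            if (sa > epsilon && sb < -epsilon) || (sa < -epsilon && sb > epsilon) {
                count += 1
            }
        }
    }
    return count
}

let square = [Point(x: 0, y: 0), Point(x: 2, y: 0), Point(x: 2, y: 2), Point(x: 0, y: 2)]
let lShape = [Point(x: 0, y: 0), Point(x: 2, y: 0), Point(x: 2, y: 1),
              Point(x: 1, y: 1), Point(x: 1, y: 2), Point(x: 0, y: 2)]
print(spanningPairs(in: square)) // 0
print(spanningPairs(in: lShape)) // 2
```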

Storing the BSP as a lazy property of the Storage (rather than as the Storage itself) might potentially work. That way all the copies could share it once generated, even if that happened after they had been copied and transformed.

I think this would require storing the transform inside the mesh and applying it lazily rather than immediately, though; otherwise transformed meshes couldn't actually share the same BSP.

There are potential advantages to doing that anyway (e.g. improved performance and reducing loss of precision when concatenating transforms), so it's worth exploring.
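A minimal sketch of the lazy-BSP-plus-deferred-transform idea, assuming a translation-only transform for brevity. SharedStorage, LazyMesh and PlaceholderBSP are hypothetical stand-ins, not Euclid's actual Mesh or Storage types, and the "BSP" is a placeholder that only records that it was built. Ten translated copies ask for a BSP, but it is built once, because every copy points at the same storage object.

```swift
// Hypothetical types for illustration only; these are not Euclid's
// Mesh/Storage internals.
struct Vector {
    var x, y, z: Double
    static func + (lhs: Vector, rhs: Vector) -> Vector {
        Vector(x: lhs.x + rhs.x, y: lhs.y + rhs.y, z: lhs.z + rhs.z)
    }
}

/// Stand-in for an expensive-to-build BSP.
struct PlaceholderBSP {
    let polygonCount: Int
}

/// Polygons stored in local space, shared by reference between copies,
/// with a BSP that is built on first use and then cached.
final class SharedStorage {
    let polygons: [[Vector]]
    private(set) var bspBuildCount = 0
    private var cachedBSP: PlaceholderBSP?

    init(polygons: [[Vector]]) {
        self.polygons = polygons
    }

    var bsp: PlaceholderBSP {
        if let existing = cachedBSP { return existing }
        bspBuildCount += 1
        let built = PlaceholderBSP(polygonCount: polygons.count)
        cachedBSP = built
        return built
    }
}

/// A mesh value that records its transform (translation only, for brevity)
/// instead of applying it to every vertex immediately.
struct LazyMesh {
    let storage: SharedStorage
    var offset = Vector(x: 0, y: 0, z: 0)

    func translated(by v: Vector) -> LazyMesh {
        var copy = self
        copy.offset = copy.offset + v // concatenate transforms, vertices untouched
        return copy
    }

    /// World-space polygons, computed only when someone asks for them.
    var polygons: [[Vector]] {
        storage.polygons.map { polygon in polygon.map { $0 + offset } }
    }

    /// Every translated copy shares the same local-space BSP; a real
    /// implementation would map queries into local space using `offset`.
    var bsp: PlaceholderBSP { storage.bsp }
}

let tile = LazyMesh(storage: SharedStorage(polygons: [[
    Vector(x: 0, y: 0, z: 0), Vector(x: 1, y: 0, z: 0), Vector(x: 1, y: 1, z: 0),
]]))
let copies = (0 ..< 10).map { tile.translated(by: Vector(x: Double($0), y: 0, z: 0)) }
for copy in copies { _ = copy.bsp }
print(tile.storage.bspBuildCount) // 1
```

The cache here is not thread-safe; storage shared across threads would need a lock or similar. Rotation and scale could be deferred the same way by storing a full transform instead of an offset.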

Another thought: If we could find a better solution for stitching polygons back together after splitting then that would mitigate some of the disadvantages of storing the mesh as a BSP.

Right now the stitching only works if the polygons share two vertices, but often there are cases where they only share edges, which is more expensive to detect and more complex to merge.
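For reference, the easy case (two polygons sharing a whole edge, i.e. both of its vertices) can be merged by walking around both outlines. This is a 2D sketch with hypothetical names (Vertex, mergeSharingEdge), assuming counter-clockwise winding and exact integer coordinates; it is not Euclid's stitching code and does not attempt the harder case mentioned above.

```swift
// Integer coordinates keep vertex equality exact for this illustration.
struct Vertex: Equatable {
    var x: Int
    var y: Int
}

/// Merges two counter-clockwise polygons that share one complete edge:
/// `a` traverses it as p -> q and `b` traverses it as q -> p.
/// Returns nil when no such edge exists. Collinear vertices left on the
/// old seam are kept; a real implementation would probably drop them.
func mergeSharingEdge(_ a: [Vertex], _ b: [Vertex]) -> [Vertex]? {
    let n = a.count, m = b.count
    for i in 0 ..< n {
        let p = a[i], q = a[(i + 1) % n]
        for j in 0 ..< m where b[j] == q && b[(j + 1) % m] == p {
            // All of `a`, starting at q and ending at p ...
            var merged = (0 ..< n).map { a[(i + 1 + $0) % n] }
            // ... followed by the vertices of `b` strictly between p and q.
            merged += (0 ..< m - 2).map { b[(j + 2 + $0) % m] }
            return merged
        }
    }
    return nil
}

let left = [Vertex(x: 0, y: 0), Vertex(x: 1, y: 0), Vertex(x: 1, y: 1), Vertex(x: 0, y: 1)]
let right = [Vertex(x: 1, y: 0), Vertex(x: 2, y: 0), Vertex(x: 2, y: 1), Vertex(x: 1, y: 1)]
if let merged = mergeSharingEdge(left, right) {
    print(merged.count) // 6
}
```

Merging two unit squares this way yields a six-vertex outline of the 2x1 rectangle, with the two old seam vertices remaining as collinear points.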

Thanks for the feedback. Re: your first question, I guess I was assuming there might be some speed benefit in creating the resulting BSP if I had the originals to help me, but maybe that's a false assumption!

@andygeers I could be wrong, but my instinct is that merging one BSP into another is the same as merging in the equivalent number of polygons that aren't already in a BSP.

That said, I'm far from an expert on these matters.

Actually... sorry! I just recompiled my app in Release mode rather than Debug, and what was taking minutes now takes seconds. Shall I close this?

@andygeers leave it open as a reminder - I think there could still be some opportunities for improvements in this area.

Re: generating the BSP up front, I'm not even sure it needs to be done as soon as you initialise the mesh. I load a particular "texture" (e.g. the wall mesh you have seen) and then use it over and over again on each face of the building. For each face I just tile it a few times, then clip it by four planes (left and right bisectors with the neighbouring walls, plus the top and the bottom). So I could easily call a function when loading the mesh to generate the BSP, if that then allowed all of the other operations to be more efficient. At the moment, though, it has to rebuild the BSP from scratch each time I tile the mesh, then again for each of the four clip planes, and again when I export the resulting mesh to an STL file / USDZ / etc.

It turns out it's actually fine in Release mode (I know you said this is partly just about how Swift optimises in Release mode, but is it possible that one of the assert calls is checking something super-expensive?), but even so, it feels intuitively like there must be a lot of redundant work going on here.

So I've been thinking pretty hard about this (I'm getting to the stage of my project where improving the performance of all the CSG stuff is increasingly my main priority).

I think for my use case there could be some pretty massive wins here. I'm doing a lot of union operations, which essentially does this:

```swift
ap = BSP(mesh).clip(ap, .greaterThan)
bp = BSP(self).clip(bp, .greaterThanEqual)
```

Now, in my case, mesh is lots of copies of an identical mesh just translated in space, so the BSP could be precalculated once, and then it would be pretty trivial to implement a BSP.translate method that reuses the same structure. And self is just the result of the previous union operation. I'm not 100% sure, but I feel like this could reuse the existing BSP: obviously it needs to remove some polygons and then add some more, but rebuilding the BSP from scratch here surely isn't necessary.
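To illustrate why a translate could reuse the structure: moving every point by t maps the plane n·p = w to n·p = w + n·t, and front/back relationships don't change, so the tree can be copied node for node without any classification or splitting. BSPNode, Plane and Vector below are hypothetical minimal types, not Euclid's internal BSP. (A rotation R about the origin would instead map the normal to R·n with w unchanged, and a uniform scale s > 0 would map w to s·w.)

```swift
// Hypothetical minimal BSP node; not Euclid's internal BSP type.
struct Vector {
    var x, y, z: Double
    static func + (lhs: Vector, rhs: Vector) -> Vector {
        Vector(x: lhs.x + rhs.x, y: lhs.y + rhs.y, z: lhs.z + rhs.z)
    }
    func dot(_ other: Vector) -> Double { x * other.x + y * other.y + z * other.z }
}

/// The plane of points p with normal·p == w.
struct Plane {
    var normal: Vector
    var w: Double
}

final class BSPNode {
    let plane: Plane
    let polygons: [[Vector]] // polygons coplanar with `plane`
    let front: BSPNode?
    let back: BSPNode?

    init(plane: Plane, polygons: [[Vector]] = [], front: BSPNode? = nil, back: BSPNode? = nil) {
        self.plane = plane
        self.polygons = polygons
        self.front = front
        self.back = back
    }

    /// Moving every point by t turns normal·p == w into normal·p == w + normal·t.
    /// Front/back relationships are unchanged, so the tree is copied node for
    /// node with no classification or splitting.
    func translated(by t: Vector) -> BSPNode {
        BSPNode(
            plane: Plane(normal: plane.normal, w: plane.w + plane.normal.dot(t)),
            polygons: polygons.map { polygon in polygon.map { $0 + t } },
            front: front?.translated(by: t),
            back: back?.translated(by: t)
        )
    }
}

let tree = BSPNode(
    plane: Plane(normal: Vector(x: 1, y: 0, z: 0), w: 1),
    back: BSPNode(plane: Plane(normal: Vector(x: 0, y: 1, z: 0), w: 0))
)
let moved = tree.translated(by: Vector(x: 2, y: 5, z: 0))
print(moved.plane.w, moved.back!.plane.w) // 3.0 5.0
```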

I'm guessing that CSG operations other than union might not see quite the same benefits, since the result shares less of its BSP structure with the mesh you start with.

Thoughts?

A quick exploratory test (https://github.com/andygeers/Euclid/tree/cached-bsps) suggests that translating/scaling/rotating an existing BSP can be up to 10 times quicker than building a new one from scratch. For a merge operation it's easy to reuse the BSP and then merge in the additional polygons, and again, quick tests suggest this can be considerably quicker than building a new one from scratch (exactly how much seems to vary a lot from case to case).
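Reusing a tree for merge amounts to pushing the new geometry down the existing nodes rather than rebuilding. Below is a hedged 2D sketch with hypothetical types, in which segments play the role of polygons: each inserted segment is classified against the existing splitters and only cut where it crosses one, so existing nodes are never rebuilt. This is not the code in the cached-bsps branch.

```swift
// Hypothetical 2D BSP of line segments, used only to illustrate
// inserting new geometry into an existing tree instead of rebuilding it.
struct Point {
    var x, y: Double
}

struct Segment {
    var a, b: Point
}

final class Node {
    let splitter: Segment // the node's splitting line passes through a and b
    private(set) var coincident: [Segment]
    private(set) var front: Node?
    private(set) var back: Node?

    init(_ segment: Segment) {
        splitter = segment
        coincident = [segment]
    }

    /// Positive on the front (left-hand) side of the splitting line.
    private func side(of p: Point) -> Double {
        let (a, b) = (splitter.a, splitter.b)
        return (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x)
    }

    /// Pushes one segment down the existing tree, splitting it only where it
    /// crosses a splitting line. Existing nodes are never rebuilt.
    func insert(_ s: Segment, epsilon: Double = 1e-9) {
        let sa = side(of: s.a), sb = side(of: s.b)
        if abs(sa) <= epsilon && abs(sb) <= epsilon {
            coincident.append(s)
        } else if sa >= -epsilon && sb >= -epsilon {
            insertFront(s)
        } else if sa <= epsilon && sb <= epsilon {
            insertBack(s)
        } else {
            // Strictly spanning: cut at the crossing point.
            let t = sa / (sa - sb)
            let mid = Point(x: s.a.x + t * (s.b.x - s.a.x), y: s.a.y + t * (s.b.y - s.a.y))
            let first = Segment(a: s.a, b: mid), second = Segment(a: mid, b: s.b)
            if sa > 0 {
                insertFront(first)
                insertBack(second)
            } else {
                insertBack(first)
                insertFront(second)
            }
        }
    }

    private func insertFront(_ s: Segment) {
        if let node = front { node.insert(s) } else { front = Node(s) }
    }

    private func insertBack(_ s: Segment) {
        if let node = back { node.insert(s) } else { back = Node(s) }
    }

    var segmentCount: Int {
        coincident.count + (front?.segmentCount ?? 0) + (back?.segmentCount ?? 0)
    }
}

// Existing tree: one segment along the x axis.
let root = Node(Segment(a: Point(x: 0, y: 0), b: Point(x: 1, y: 0)))
// "Merge" a vertical segment that crosses it; it is split into two pieces.
root.insert(Segment(a: Point(x: 0.5, y: -1), b: Point(x: 0.5, y: 1)))
print(root.segmentCount) // 3
```

Because the tree's shape then depends on the order in which geometry was added, repeated merges can leave it less balanced than a fresh build; that is one trade-off to keep in mind.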

The challenge here is that every CSG operation needs to be handled separately in terms of how to modify the existing BSP. merge was the easy one. I now need to figure out how to do it for subtract, union, intersect and clip... assuming it's even possible in the first place.

Since the various CSG split operations can be performed independently per polygon, it should be possible to apply them directly to a BSP rather than to a flat array of polygons. I'm not sure about the performance implications.

My concern about this approach (storing meshes in BSP form) is mainly the problem of generating many more polygons than needed. I'm still missing a good way to stitch polygons back together after multiple splits, which means that storing them in a BSP all the time results in way more polys than is desirable.

I realize now though that the flaw in my previous approaches may have been that I was ignoring the BSP structure. If I use the BSP itself to guide the merge (by walking backwards from the leaf nodes and looking only at polygons that touch the plane on either side of a split) then there is potential to improve both the speed and completeness of the merging process when converting back to a mesh.

In some ways I don't much mind about the "way more polys than is desirable" issue - since I was just focussed on speeding up the CSG process, and that always involves creating a BSP anyway. So far the approach I've been testing still maintains the original array of polygons AND a cached BSP. Admittedly it would be more memory efficient if the BSP could become the "storage" - at which point you do want to make sure you're not storing lots and lots of unnecessarily split polygons - but if raw speed is the main concern then it's not strictly necessary to solve that problem. I think?

@andygeers I think when I (eventually) get around to this refactor, I'd like to try and solve it properly, and maintaining two sets of polygons doesn't seem like the right direction to go, but I'll consider it if merging the BSP proves untenable.

PS: Thanks for the generous donation! And sorry again for the lack of updates, I hope to get back to this once work quiets down a bit.

As a simpler scenario than implementing all of this, here's my current performance bottleneck: I take a wall mesh and clip it by four or more planes. Currently all my walls are rectangular, so that's the top, bottom, left and right planes, where left and right are not parallel to each other because they bisect the angles with the neighbouring walls. In the near future my walls will not all be rectangular, so it might be at least five planes, yielding a "pointy" top.

I use the mesh.clip(by plane) method with fill. Constructing the BSP for each clip takes an absolute age, then it throws that BSP away, and to clip by the next plane it constructs a totally new BSP from scratch, which takes another age... Any tips on how I could either:

  1. Change the implementation of the fill part of the "clip by plane" so that it could avoid generating the full BSP,

and/or

  2. Create a BSP for the result of the "clip and fill" more efficiently, using the fact that we know all of the resulting polygons are on one side of a specific plane? (In my https://github.com/andygeers/Euclid/tree/cached-bsps branch I've currently only figured out how to maintain previous BSPs across merge operations, but I feel like there must be a relatively easy way to do it for these clip ones too, if I were smarter.)
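On the first option: cutting the polygons themselves needs only the plane, not a BSP, since each polygon can be clipped independently. The sketch below is a single Sutherland-Hodgman pass in 2D, with hypothetical Point and clip names and a convex polygon assumed. The fill/cap is the part that has to know what lies inside the mesh, and it is not shown. On the second option, every output vertex satisfies normal·p <= w, which is exactly the "all on one side of a specific plane" property the question wants to exploit.

```swift
// 2D sketch; a real mesh clip would do the same per polygon in 3D.
struct Point {
    var x, y: Double
}

/// Keeps the part of a convex polygon where normal·p <= w, i.e. one
/// Sutherland-Hodgman pass against a single line.
func clip(_ polygon: [Point], normal: Point, w: Double) -> [Point] {
    func distance(_ p: Point) -> Double { normal.x * p.x + normal.y * p.y - w }
    var result: [Point] = []
    for i in polygon.indices {
        let current = polygon[i], next = polygon[(i + 1) % polygon.count]
        let dc = distance(current), dn = distance(next)
        if dc <= 0 { result.append(current) }
        if (dc < 0 && dn > 0) || (dc > 0 && dn < 0) {
            // The edge crosses the line: add the intersection point.
            let t = dc / (dc - dn)
            result.append(Point(x: current.x + t * (next.x - current.x),
                                y: current.y + t * (next.y - current.y)))
        }
    }
    return result
}

let square = [Point(x: 0, y: 0), Point(x: 2, y: 0), Point(x: 2, y: 2), Point(x: 0, y: 2)]
print(clip(square, normal: Point(x: 1, y: 0), w: 1).count) // 4
```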

Potentially it feels like I could stop clipping by individual planes and instead somehow generate a single 3D space as the intersection of all of these planes and then do one intersect operation - perhaps with my cached-bsps branch that would be extremely efficient. I'm not entirely sure how to construct that Mesh directly from the planes however...
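One way to construct that shape directly from the planes is to start from a box much larger than the wall and clip it by each plane in turn; building a BSP for a convex shape is cheap, as noted earlier in the thread. This is a 2D sketch with hypothetical names (HalfPlane, convexRegion); the 1e6 extent is an assumption about the scale of the model, and the approach was not tested in the thread.

```swift
// 2D sketch: in 3D the same idea would start from a large box and clip it
// by each plane in turn.
struct Point {
    var x, y: Double
}

/// Half-plane of points p with normal·p <= w.
struct HalfPlane {
    var normal: Point
    var w: Double
}

func clip(_ polygon: [Point], by h: HalfPlane) -> [Point] {
    func distance(_ p: Point) -> Double { h.normal.x * p.x + h.normal.y * p.y - h.w }
    var result: [Point] = []
    for i in polygon.indices {
        let current = polygon[i], next = polygon[(i + 1) % polygon.count]
        let dc = distance(current), dn = distance(next)
        if dc <= 0 { result.append(current) }
        if (dc < 0 && dn > 0) || (dc > 0 && dn < 0) {
            let t = dc / (dc - dn)
            result.append(Point(x: current.x + t * (next.x - current.x),
                                y: current.y + t * (next.y - current.y)))
        }
    }
    return result
}

/// Builds the convex region bounded by `planes`, starting from a square much
/// larger than the expected result (`extent` is an assumption about scale).
func convexRegion(_ planes: [HalfPlane], extent: Double = 1e6) -> [Point] {
    var region = [Point(x: -extent, y: -extent), Point(x: extent, y: -extent),
                  Point(x: extent, y: extent), Point(x: -extent, y: extent)]
    for plane in planes where !region.isEmpty {
        region = clip(region, by: plane)
    }
    return region
}

// A wall outline with a "pointy" top: left, right, bottom and two roof planes.
let wall = convexRegion([
    HalfPlane(normal: Point(x: -1, y: 0), w: 0), // x >= 0
    HalfPlane(normal: Point(x: 1, y: 0), w: 4),  // x <= 4
    HalfPlane(normal: Point(x: 0, y: -1), w: 0), // y >= 0
    HalfPlane(normal: Point(x: 1, y: 1), w: 5),  // x + y <= 5
    HalfPlane(normal: Point(x: -1, y: 1), w: 1), // y - x <= 1
])
print(wall.count) // 5
```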

OK, so I did get it working using an intersect rather than clip by plane... Now I just need to progressively build up my BSP during the tiling process. Currently I merge each tile of the wall mesh, and at that point bounds.min.x of the next tile being merged equals bounds.max.x of the wall so far, so it's easy enough to identify this situation. I want to create a new BSPNode using the YZ plane between the wall so far and the next tile. It feels like it ought to be easy enough, with 'above' being a clone of the wall's root BSP node and 'below' a clone of the next tile's. But I think it could get a bit fiddly if there are other nodes at arbitrary points in the 'below' tree with that same plane.
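The join node can be sketched with a simplified solid-leaf BSP used only for point-membership queries. This is a different representation from the polygon-carrying tree Euclid uses, and SolidBSP, box and joined are hypothetical names. The join works because each subtree fully describes the solid on its own side of the boundary plane. It sidesteps rather than solves the issues raised: a solid-leaf tree stores no polygons, so there are no internal seam faces to remove and duplicate planes further down are harmless, but a polygon-carrying BSP would still need that cleanup.

```swift
// Hypothetical solid-leaf BSP for point-membership queries only.
struct Vector {
    var x, y, z: Double
}

indirect enum SolidBSP {
    case inside
    case outside
    /// Points with normal·p > w go to `front`, all others to `back`.
    case split(normal: Vector, w: Double, front: SolidBSP, back: SolidBSP)

    func contains(_ p: Vector) -> Bool {
        switch self {
        case .inside:
            return true
        case .outside:
            return false
        case let .split(normal: n, w: w, front: front, back: back):
            let d = n.x * p.x + n.y * p.y + n.z * p.z - w
            return d > 0 ? front.contains(p) : back.contains(p)
        }
    }

    /// Axis-aligned box as six nested half-space tests.
    static func box(min lo: Vector, max hi: Vector) -> SolidBSP {
        let faces: [(Vector, Double)] = [
            (Vector(x: 1, y: 0, z: 0), hi.x), (Vector(x: -1, y: 0, z: 0), -lo.x),
            (Vector(x: 0, y: 1, z: 0), hi.y), (Vector(x: 0, y: -1, z: 0), -lo.y),
            (Vector(x: 0, y: 0, z: 1), hi.z), (Vector(x: 0, y: 0, z: -1), -lo.z),
        ]
        return faces.reduce(SolidBSP.inside) { inner, face in
            SolidBSP.split(normal: face.0, w: face.1, front: .outside, back: inner)
        }
    }

    /// Joins two trees that meet at the plane x == boundary: points left of it
    /// are answered by `left`, the rest by `right`. Neither subtree is rebuilt.
    static func joined(left: SolidBSP, right: SolidBSP, atX boundary: Double) -> SolidBSP {
        .split(normal: Vector(x: 1, y: 0, z: 0), w: boundary, front: right, back: left)
    }
}

let wallSoFar = SolidBSP.box(min: Vector(x: 0, y: 0, z: 0), max: Vector(x: 1, y: 1, z: 1))
let nextTile = SolidBSP.box(min: Vector(x: 1, y: 0, z: 0), max: Vector(x: 2, y: 1, z: 1))
let wall = SolidBSP.joined(left: wallSoFar, right: nextTile, atX: 1)
print(wall.contains(Vector(x: 0.5, y: 0.5, z: 0.5)),
      wall.contains(Vector(x: 1.5, y: 0.5, z: 0.5)),
      wall.contains(Vector(x: 2.5, y: 0.5, z: 0.5))) // true true false
```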

I'm kind of tempted to move the concept of 'tiling' out of my app and into my branch of Euclid itself, since it feels like there's lots of special casing that can be done for this specific scenario that's easier than handling the general case. e.g. there's the issue of removing all of the 'internal' faces that result from the tiling process. It would either have to be that, or making the BSP stuff more public rather than entirely private as it is at the moment.

So, a follow-up on this after some more experimentation... I don't actually think there is much (if any) benefit to be had by progressively building the BSP as you go whilst doing successive merge operations. BUT I think there is significant benefit in making a mesh's BSP something that can be cached after it has been generated once. If I rewrite my code to build one very large wall panel (the size of the largest wall in my building), calculate its BSP once, and then just reuse it over and over for the intersect calls for successive wall panels of various shapes and sizes, then I believe that would be much quicker. That's especially true if I can devise a way to avoid physically translating the wall panel over and over during the tiling process (which I've discovered can itself be pretty slow), and instead just send a list of offsets whilst building the BSP.
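The "list of offsets" idea can be sketched as moving each query into the tile's local space instead of moving the tile's vertices. LocalTile, TiledPanel and Vector are hypothetical, and the axis-aligned box is only a stand-in for a cached local-space BSP.

```swift
// Hypothetical types; `LocalTile` stands in for any structure (such as a
// cached BSP) that is built once in the tile's own coordinate space.
struct Vector {
    var x, y, z: Double
    static func - (lhs: Vector, rhs: Vector) -> Vector {
        Vector(x: lhs.x - rhs.x, y: lhs.y - rhs.y, z: lhs.z - rhs.z)
    }
}

struct LocalTile {
    var min: Vector
    var max: Vector

    func contains(_ p: Vector) -> Bool {
        p.x >= min.x && p.x <= max.x &&
            p.y >= min.y && p.y <= max.y &&
            p.z >= min.z && p.z <= max.z
    }
}

/// A tiled panel stored as one shared tile plus a list of offsets. Instead of
/// moving every vertex of every copy, each query point is moved into the
/// tile's local space.
struct TiledPanel {
    let tile: LocalTile
    let offsets: [Vector]

    func contains(_ p: Vector) -> Bool {
        offsets.contains { offset in tile.contains(p - offset) }
    }
}

let panel = TiledPanel(
    tile: LocalTile(min: Vector(x: 0, y: 0, z: 0), max: Vector(x: 1, y: 1, z: 1)),
    offsets: (0 ..< 5).map { Vector(x: Double($0), y: 0, z: 0) }
)
print(panel.contains(Vector(x: 3.5, y: 0.5, z: 0.5))) // true
print(panel.contains(Vector(x: 5.5, y: 0.5, z: 0.5))) // false
```

Each query here tries every offset; with many tiles on a regular grid, one could instead compute the one or two candidate offsets directly from the point's coordinates.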

This is also much nicer from the point of view of what you were concerned about, because it avoids having to build the BSP for everybody whether they need it or not.

@andygeers doesn't that have the problem (in the general case) that you'll be modifying the wall panel each time you intersect, so you'd need to recalculate the BSP?

In your particular case the meshes you are intersecting with the wall (doors, windows, etc) probably don't intersect each other, so you can use the same wall panel BSP to perform the intersection test with each of them, but I'm not sure how to generalize that.

Hmm, yes you're right - I was just thinking about the simple 'create a blank wall' stage which I've been testing against (which is lots of merge operations to tile, then a single intersect operation to clip to a wedge shape), sort of forgot about all of those subsequent subtract and union operations to add all of the windows/doors/etc...

(but my point is that each subsequent wall could start with the same BSP, before that intersect operation clips it to a wedge)

@andygeers I'm currently revisiting possible performance improvements in Euclid. Are you still using it in your app?

Hi Nick, yes I do, although I’ve barely touched the app since last summer. But happy to test things out for you if it would be useful (significant performance improvements might help me get going again!)

@andygeers Great! Yeah, I also haven't really touched ShapeScript much in the last year or so, but I'm excited to get back to it.

I have already made some fairly significant performance improvements anyway since we last discussed this - in particular Euclid now has optimized methods for applying multiple CSG operations at once that you may find give you a boost, but there's definitely still room for improvement (e.g. by utilizing multithreading), and I'm still considering some sort of BSP caching as we discussed before.

@andygeers just an update on this - I tried caching BSP for reuse as part of a raft of performance optimizations I made for 0.6.0 and saw no benefit for any of my test models. If you've got any examples of code where you still believe it would help I'd love to see them.