PixarAnimationStudios / OpenSubdiv

An Open-Source subdivision surface library.

Home Page: graphics.pixar.com/opensubdiv


Implicit type narrowing causes crashes with large geometries

jmather-sesi opened this issue

There is a common pattern in OpenSubdiv where integers of smaller types are passed into the resize method of std::vector. When the incoming meshes are very large, these values overflow and can cause a crash. We are encountering this issue when loading the Moana island dataset into Solaris on Apple silicon.

For instance:
quadRefinement.cpp:108: _child->_faceVertIndices.resize(_child->getNumFaces() * 4);

getNumFaces returns a 32-bit integer, while resize accepts a size_t. In our crash, getNumFaces returns 625186471, which is multiplied by 4 as a 32-bit integer before being implicitly cast to size_t. This results in an integer overflow.
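
To make the mechanism concrete, here is a minimal, self-contained sketch of the failure mode (the face count is the value from our crash; everything else is illustrative):

    #include <cstddef>
    #include <cstdio>

    int main() {
        int numFaces = 625186471;   // value reported by getNumFaces() in the crash above

        // The multiplication is performed in 32-bit int arithmetic; 625186471 * 4
        // does not fit in an int, so it overflows (undefined behavior, in practice
        // wrapping to a negative value) before the result is converted to size_t.
        std::size_t wrapped = numFaces * 4;

        // Casting one operand first forces the multiplication to happen in size_t,
        // so the product no longer wraps.
        std::size_t correct = static_cast<std::size_t>(numFaces) * 4;

        std::printf("wrapped: %zu\ncorrect: %zu\n", wrapped, correct);
        return 0;
    }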

There is also this pattern, which would also need to be addressed:

    int childEdgeFaceIndexSizeEstimate = (int)_parent->_faceVertIndices.size() * 2 +
                                         (int)_parent->_edgeFaceIndices.size() * 2;

    _child->_edgeFaceCountsAndOffsets.resize(_child->getNumEdges() * 2);
    _child->_edgeFaceIndices.resize(     childEdgeFaceIndexSizeEstimate);

The size_t result of _faceVertIndices.size() is narrowed to an int and then multiplied, which can overflow, and is then added to another narrowed result, which can overflow again. The value of childEdgeFaceIndexSizeEstimate is therefore computed entirely in 32-bit arithmetic before being cast back to a size_t.
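
A sketch of how the quoted lines could be kept in size_t arithmetic (illustrative only, not a tested patch against the actual source):

    // Sketch: keep the arithmetic in size_t so neither the narrowing casts
    // nor the 32-bit multiplications can wrap for very large meshes.
    size_t childEdgeFaceIndexSizeEstimate = _parent->_faceVertIndices.size() * 2 +
                                            _parent->_edgeFaceIndices.size() * 2;

    _child->_edgeFaceCountsAndOffsets.resize(static_cast<size_t>(_child->getNumEdges()) * 2);
    _child->_edgeFaceIndices.resize(childEdgeFaceIndexSizeEstimate);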

I can see two ways in which this could be resolved. One would be to explicitly cast each call to getNumFaces() and similar methods to size_t before the multiplication. The second, which would require fewer code changes, would be to change the return types of these getNum methods to size_t; however, this would change the ABI.

FYI: clang-tidy reports these issues as "bugprone-implicit-widening-of-multiplication-result".

Filed as internal issue #OSD-405

The specific issue identified here is part of a much wider limitation of OpenSubdiv which, unfortunately, is not likely to be addressed soon.

The bigger issue here is that array sizes and integer elements for the many intermediate data structures are limited to 32 bits. That reflects the origins of OpenSubdiv (Hbr and tables targeted to GPU usage) but has also been a conscious choice in ongoing development. And unfortunately, detection of integer overflow and associated error reporting has not been well addressed.

A point missing from the proposed solution for this particular case is that when an array size N > MAXINT occurs, not only is a 64-bit integer required for the size, but all indices < N into that array must also be 64-bit. Since many arrays contain indices into other arrays (vertices of each face, vertices of points for each patch, etc.), the increase from 32- to 64-bit integers effectively doubles their size.
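
As a rough, back-of-the-envelope illustration using the face count from the report above (assuming all quads, with one face-vertex index per corner):

    #include <cstdint>
    #include <cstdio>

    int main() {
        // Illustrative only: face count from the crash report, all-quad mesh.
        uint64_t numFaces   = 625186471;
        uint64_t numIndices = numFaces * 4;   // one face-vertex entry per corner

        // Widening the index type from 32 to 64 bits doubles the table's storage.
        std::printf("32-bit indices: %.1f GB\n", numIndices * 4 / 1e9);
        std::printf("64-bit indices: %.1f GB\n", numIndices * 8 / 1e9);
        return 0;
    }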

Even with modestly large meshes, the memory usage of these table-based solutions has been a frequent point of criticism of OpenSubdiv (especially in CPU rendering cases), so the last thing we want to do is make that significantly worse in the general case. Many of these data structures simply do not scale well for large meshes, and so extensions to OpenSubdiv in recent versions have offered ways to mitigate this (e.g. sparse refinement and sparse patch tables in 3.4, the Bfr interface in 3.5). Aside from avoiding the allocation of massive amounts of memory, these solutions also facilitate distributing the work for a single large mesh over multiple threads.

In theory a mesh can have at most MAXINT vertices or faces, but that doesn't leave any room to deal with the larger sizes of tables constructed from these quantities. The operation applied has a lot to do with the sizes of those tables and limits how much headroom is available between the initial size of the mesh and the 32-bit limit. Applying uniform subdivision will rapidly approach that limit (sizes increase 4x with each level). Similarly, the construction of stencil tables can have a severe effect (9-16x a large number of refined vertices or patch points). Adaptive refinement and patch tables are more incremental in general, but in extreme cases of highly irregular topology (e.g. tri-meshes with Catmull-Clark subdivision) their increase can be more multiplicative and so also prone to overflow. So the answer to the question of how large a mesh can be supported is not trivial.
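
As a rough illustration of that headroom, here is a small, self-contained estimate of how many uniform levels a quad mesh of a given size could take before its face count alone exceeds the 32-bit limit (illustrative only; the intermediate tables grow faster than the face count):

    #include <cstdint>
    #include <cstdio>

    int main() {
        // A quad mesh's face count grows ~4x per uniform refinement level,
        // so a base mesh of F faces exceeds the 32-bit limit once
        // F * 4^level > INT32_MAX.
        const long long limit = INT32_MAX;
        const long long baseCounts[] = {1000000LL, 10000000LL, 625186471LL};

        for (long long baseFaces : baseCounts) {
            int levels = 0;
            long long faces = baseFaces;
            while (faces * 4 <= limit) {
                faces *= 4;
                ++levels;
            }
            std::printf("%lld base faces: %d uniform level(s) before overflow\n",
                        baseFaces, levels);
        }
        return 0;
    }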

In order to fully support 64-bit quantities without impacting current usage, the API for all classes using these table-based solutions would need to be extended in ways similar to what was done to extend other classes to double precision a few versions ago. Many of the classes would need to become templates parameterized on a signed integer type (32- or 64-bit), with the existing interface preserved for the 32-bit case. Users would then be faced with the choice of always using the 64-bit version to be safe (and taking the added memory usage hit on all common usage), or only using the 64-bit version when necessary.
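
A hypothetical sketch of that direction (names are illustrative, not the actual OpenSubdiv API): a table class templated on its signed index type, with the existing 32-bit case as the default:

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // Illustrative only: a table templated on a signed index type, so the same
    // code can be instantiated for 32-bit (current behavior) or 64-bit indices.
    template <typename IndexType = int32_t>
    class FaceVertexTable {
    public:
        using Index = IndexType;

        void Resize(std::size_t numFaces) { _faceVertIndices.resize(numFaces * 4); }

        Index const * GetFaceVertices(std::size_t face) const {
            return &_faceVertIndices[face * 4];
        }

    private:
        std::vector<Index> _faceVertIndices;  // 4 vertex indices per (quad) face
    };

    // 32-bit remains the default; users opt into 64-bit only when necessary,
    // accepting roughly double the index storage.
    using FaceVertexTable32 = FaceVertexTable<>;
    using FaceVertexTable64 = FaceVertexTable<int64_t>;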

There are no plans for extensions to 64-bit integers at this point in time. What can be done in the near term is to better document what the limitations are, what operations might fail under those limitations, and to provide run-time detection of those failures.