cburstedde / p4est

The "p4est" forest-of-octrees library

Home Page: www.p4est.org/

Performance regression between p4est <=2.3.3 and 2.8.5

tamiko opened this issue

As part of our ongoing deal.II performance instrumentation (found here) I have noticed a massive slowdown in our (parallel) grid creation measurements between the bullseye-gcc-10 (Debian bullseye with p4est 2.2 and gcc-10) and the gentoo-clang-16 (Gentoo Linux container with p4est 2.8.5) variants. A manual investigation shows that the differences are caused by the different p4est versions.

I have run one of the performance tests (timing-step_37 running with 8 MPI ranks) manually on my laptop and observe a massive change in timings when switching from p4est 2.3.3 (+ libsc-2.3.3) to p4est 2.8.5 (+ libsc-2.8.5):

name                  min        max        mean       std_dev     samples
setup_grid            12.7531    13.6763    13.3272    0.427717    4

versus

name                  min        max        mean       std_dev     samples
setup_grid            1.39987    1.41624    1.40541    0.00745234  4

A roughly 10-fold slowdown in mesh creation with p4est-2.8.5 is very unexpected.

Is there something I should be aware of?

In reference to dealii/dealii#15742

This is a very good point.

The Gentoo ebuild "eclass" for CMake overrides CMAKE_BUILD_TYPE to RelWithDebInfo (it used to set a fully custom Gentoo build type but that got changed recently due to too many compatibility issues). The final cmake configuration run prints the following injected status message:

-- <<< Gentoo configuration >>>
Build type      RelWithDebInfo
Install path    /usr
Compiler flags:
C               -march=native -O2 -pipe -ggdb
C++             
Linker flags:
Executable      -Wl,-O1 -Wl,--as-needed
Module          -Wl,-O1 -Wl,--as-needed
Shared          -Wl,-O1 -Wl,--as-needed

When I converted the Gentoo ebuild for libsc/p4est to CMake I checked manually that the -march=native -O2 -ggdb compile flags and the linker flags are getting picked up, and that no -DDEBUG shows up in the compiler parameter lists. So everything looked reasonable.

But what I didn't have on my radar is that you do the following in cmake/config.cmake:

  if(CMAKE_BUILD_TYPE MATCHES "(Debug|RelWithDebInfo)")
    set(P4EST_ENABLE_DEBUG 1)
  endif()

This is quite surprising. May I ask why you are doing this?
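To make the effect of this condition concrete, the following stand-alone script applies the same regular expression to CMake's four standard build types. It is only an illustration and not part of p4est; the file name is hypothetical.

```cmake
# check_build_types.cmake (hypothetical file name)
# Run in script mode with: cmake -P check_build_types.cmake
# Applies the regular expression from cmake/config.cmake to each
# of CMake's standard build types and reports the outcome.
foreach(build_type Debug Release RelWithDebInfo MinSizeRel)
  if(build_type MATCHES "(Debug|RelWithDebInfo)")
    message(STATUS "${build_type}: P4EST_ENABLE_DEBUG would be set")
  else()
    message(STATUS "${build_type}: P4EST_ENABLE_DEBUG would stay unset")
  endif()
endforeach()
```

Debug and RelWithDebInfo both take the first branch, while Release and MinSizeRel take the second.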

The RelWithDebInfo build type normally indicates a fully optimized release build - just with additional debug symbols (that, for example, for Debian and Gentoo get stripped from the final library and stored separately under /usr/lib/debug - so they don't even create any runtime overhead).

Anyway, after recompiling and making sure that P4EST_ENABLE_DEBUG is not set, I am down to a roughly 10% performance regression:

name                  min        max       mean      std_dev     samples
setup_grid            1.47084    1.67232   1.54484   0.0881628   4

So, phew

Now that we are down to smaller margins I will actually have to rerun these performance tests much more carefully. Let me do this over the next days and come back to you with some more careful numbers.

I am happy to do some more detailed profiling if this 10% slowdown is indeed real and you want to investigate further.

Ok, that's a relief. Note that --enable-debug activates lots of consistency checks that cost a lot of runtime.

It is orthogonal to compiler configurations like -g, -O and -DDEBUG.
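One way to mirror that separation in a CMake build is an explicit switch that is not derived from CMAKE_BUILD_TYPE. The fragment below is a minimal sketch under assumptions: the option name MYLIB_ENABLE_DEBUG and the target mylib are hypothetical and are not taken from p4est's build files.

```cmake
# Fragment of a CMakeLists.txt; `mylib` is a hypothetical, already defined target.
# The switch is off by default and independent of CMAKE_BUILD_TYPE, so the
# -g / -O levels and the consistency checks can be chosen separately.
option(MYLIB_ENABLE_DEBUG "Compile in expensive consistency checks" OFF)

if(MYLIB_ENABLE_DEBUG)
  # Hypothetical preprocessor symbol that the library sources would test.
  target_compile_definitions(mylib PRIVATE MYLIB_ENABLE_DEBUG=1)
endif()
```

With such a switch, configuring with -DCMAKE_BUILD_TYPE=RelWithDebInfo -DMYLIB_ENABLE_DEBUG=ON would give an optimized build that still runs the checks, and leaving the option off would give the plain optimized build.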

I'll be interested in those 10%. We'd rather not have any measurable regression. This may also change again with the upcoming 2.8.6 branch due to minor simplifications.

@tamiko

  if(CMAKE_BUILD_TYPE MATCHES "(Debug|RelWithDebInfo)")
    set(P4EST_ENABLE_DEBUG 1)
  endif()

Yes, this is surprising. I guess P4EST_ENABLE_DEBUG should be defined only for a Debug build, not for RelWithDebInfo.
I don't know what motivated this (see commit f3f9891), but I think it should be changed.

Of course running with P4EST_ENABLE_DEBUG is still a good thing, but my point was:

In the build system, the symbol P4EST_ENABLE_DEBUG is only defined when:

  • in autotools, configure is run with --enable-debug

So in the CMake build, this symbol should only be defined in "Debug" build mode, not in "RelWithDebInfo", which is a release build mode (optimisation flags activated) that also includes debug symbols (roughly equivalent to "-O2 -g" flags).

So when building with CMake using the "RelWithDebInfo" build mode, it is surprising to have P4EST_ENABLE_DEBUG=1.
I think that, in CMake, P4EST_ENABLE_DEBUG should only be 1 when building the debug version.
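The change suggested here could look roughly like the following in cmake/config.cmake. This is only a sketch of the idea, and not necessarily what was eventually committed.

```cmake
# Sketch: enable p4est's internal consistency checks only for a plain Debug build.
# RelWithDebInfo no longer matches, so it stays an optimized build with symbols.
if(CMAKE_BUILD_TYPE STREQUAL "Debug")
  set(P4EST_ENABLE_DEBUG 1)
endif()
```

Note that CMAKE_BUILD_TYPE is only meaningful for single-configuration generators such as Makefiles or Ninja.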

It would be great to deactivate _ENABLE_DEBUG for RelWithDebInfo (for both p4est and sc, actually).

This issue is now addressed on the develop branch of p4est. It was specific to CMake.

Please note that the recommended way to build p4est is and will always be autotools, which is known to work well.
Nonetheless, we are grateful for any improvement to the CMake build.

How does everything work for y'all?