mstorsjo / llvm-mingw

An LLVM/Clang/LLD based mingw-w64 toolchain

Enable Profile-Guided Optimization (PGO) for packages

zamazan4ik opened this issue · comments

Hi! Thanks a lot for the scripts.

Can I suggest building LLVM with Profile-Guided Optimization (probably optionally, via an opt-in or opt-out flag)? According to the tests, PGO helps a lot with performance for LLVM-based projects: Clang (up to +20% compilation speed), LLD, clang-tidy, and others. E.g. for LLVM-based projects, you can check the benchmarks here. A PGO build could probably be enabled for CPython as well, since CPython supports PGO upstream. If you want to check more PGO benchmarks, you can find them here.

Whether this is opt-in or opt-out is up to you: enabling PGO means compiling everything twice, and that can be a significant issue for some users.

For Clang, a PGO build is even supported by the upstream CMake scripts: https://github.com/llvm/llvm-project/blob/main/clang/cmake/caches/PGO.cmake

Yes, I've considered looking into PGO.

The main obstacles are, as you already mentioned, extra build time (which could be an issue, or could be acceptable), but secondly, the fact that this toolchain is primarily cross compiled. I.e. the final compiler binaries that end up running on Windows are compiled on Linux.

The normal use of PGO would first build one compiler, then run it on some test data, and then rebuild the compiler with the resulting profile data. That's a bit more complicated when cross compiling. AFAIK it might be possible to reuse profiling data from one platform to another, so it might be possible to do profiling on Linux and then reuse this profile data when cross compiling for Windows; not every symbol might match, but as long as the majority does I guess the results should be reasonable. When building the i686 compiler, one would need to adjust the symbols in the profile though, to add or remove a leading underscore so that symbols match between i686 Windows and other platforms.

I.e., in short, yes I've considered looking into it, but haven't gotten to it yet.

Just for information, the msys2 project now has a PGO-enabled clang 17.0.6 (thanks to MehdiChinoune). This commit may help: msys2/MINGW-packages@4dd91d1

> AFAIK it might be possible to reuse profiling data from one platform to another, so it might be possible to do profiling on Linux and then reuse this profile data when cross compiling for Windows; not every symbol might match, but as long as the majority does I guess the results should be reasonable.

It could be quite tricky. Not sure about PGO profile interoperability between Windows and Linux. However, according to my tests, PGO profiles between Linux and macOS were not compatible (llvm-profdata overlap metric was 0%). The same results were met by the Rust dev team when they tried to reuse profiles between platforms.

One possible way to avoid this problem is the following process:

  • Compile an instrumented binary for the Windows platform on the Linux build machine
  • Run the instrumented binary on Windows with some sample workload
  • Reuse this profile later during the optimized binary compilation for the Windows platform on the Linux build machine
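
The three steps above can be sketched as the arguments each stage would need. This is only an illustration: `LLVM_BUILD_INSTRUMENTED` and `LLVM_PROFDATA_FILE` are LLVM CMake options and `llvm-profdata merge` is the upstream merge tool, but the helper names and file names below are hypothetical, and nothing here reflects how build-llvm.sh actually passes options to CMake.

```python
from typing import List


def instrumented_cmake_args() -> List[str]:
    """Step 1: extra CMake arguments for an instrumented build.

    The resulting binaries write .profraw files when they are run.
    """
    return ["-DLLVM_BUILD_INSTRUMENTED=IR"]


def merge_command(raw_profiles: List[str], merged: str) -> List[str]:
    """Step 2 (after running the workload on Windows): merge the raw
    profiles that were copied back into one indexed profile."""
    return ["llvm-profdata", "merge", "-o", merged, *raw_profiles]


def optimized_cmake_args(merged: str) -> List[str]:
    """Step 3: extra CMake arguments for the final, profile-optimized build."""
    return [f"-DLLVM_PROFDATA_FILE={merged}"]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    raws = ["clang-1.profraw", "clang-2.profraw"]
    print(" ".join(instrumented_cmake_args()))
    print(" ".join(merge_command(raws, "clang.profdata")))
    print(" ".join(optimized_cmake_args("clang.profdata")))
```

Only the middle step depends on a Windows machine (or VM/Wine); steps 1 and 3 stay on the Linux build machine.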

The biggest problem here is running on Windows. Maybe it can be somehow automated via a dedicated Win machine or even a VM/Wine? Maybe we can somehow "manually" prepare this profile between each LLVM package upgrade. Or maybe leave it just to the user - a user will need to collect their PGO profile and pass it to the build-llvm.sh script. However, I understand that the process is tricky.

> It could be quite tricky. Not sure about PGO profile interoperability between Windows and Linux. However, according to my tests, PGO profiles between Linux and macOS were not compatible (llvm-profdata overlap metric was 0%). The same results were met by the Rust dev team when they tried to reuse profiles between platforms.

I would expect that this is because macOS also uses a symbol prefix, an underscore (just like Windows/i686), on all architectures. So to reuse profile data between macOS and Linux, one would need to add or remove such an underscore.

So a feature for adapting a profile for reuse across platforms would be more widely useful than only for this particular Linux->Windows cross compilation. It still probably wouldn't get a 100% match, due to various cases where things can differ, but if it gets significant enough coverage, it'd still probably be good enough.
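
As a rough illustration of the idea, here is a minimal sketch that strips a leading underscore from symbol names and measures how many names then match. It works on plain lists of names, not on a real profile file, and the helper names and sample symbols are made up. Whether real profile records actually differ only by this prefix is the hypothesis above, not something this sketch verifies, and the name coverage it computes is not the same thing as the `llvm-profdata overlap` metric.

```python
from typing import Iterable


def strip_prefix(symbol: str, prefix: str = "_") -> str:
    """Remove one leading platform prefix from a symbol name, if present."""
    return symbol[len(prefix):] if symbol.startswith(prefix) else symbol


def add_prefix(symbol: str, prefix: str = "_") -> str:
    """Add the platform prefix (the opposite direction of strip_prefix)."""
    return prefix + symbol


def name_coverage(reference: Iterable[str], candidate: Iterable[str]) -> float:
    """Fraction of reference symbol names that also appear in candidate."""
    ref = set(reference)
    if not ref:
        return 0.0
    return len(ref & set(candidate)) / len(ref)


if __name__ == "__main__":
    # Hypothetical symbol names: one set without the prefix, one with it.
    unprefixed = ["_ZN4llvm3fooEv", "main", "_Z3barl"]
    prefixed = ["__ZN4llvm3fooEv", "_main", "__Z3barx"]

    print(name_coverage(unprefixed, prefixed))  # nothing matches as-is
    adapted = [strip_prefix(s) for s in prefixed]
    print(name_coverage(unprefixed, adapted))   # two of three names match
```

The third symbol still fails to match after stripping because its parameter type is mangled differently, which is the kind of residual mismatch that would keep coverage below 100%.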

> The biggest problem here is running on Windows. Maybe it can be somehow automated via a dedicated Win machine or even a VM/Wine? Maybe we can somehow "manually" prepare this profile between each LLVM package upgrade. Or maybe leave it just to the user - a user will need to collect their PGO profile and pass it to the build-llvm.sh script. However, I understand that the process is tricky.

Yeah, I've also considered something like that. A prebuilt profile, which can be updated on request with clear reproducible instructions, probably would be quite useful. A 100% match might not be realistic in all cases, but if it's good enough, it could be useful.

> I would expect that this is because macOS also uses a symbol prefix, an underscore (just like Windows/i686), on all architectures. So to reuse profile data between macOS and Linux, one would need to add or remove such an underscore.

Honestly, investigating the difference between profiles for each platform for the same code is a great idea. If the difference is only in extra underscores or something like that, it would be interesting to implement some kind of converter for profiles. In that case, it would be easier to use Linux profiles for macOS. I already mentioned problems with gathering PGO profiles for the macOS platform due to limited support for macOS in some CI platforms (and macOS build agents can be quite expensive, you know).

Yep, exactly.

The cases where the profile won't match are, of course, the trivial bits with platform-specific code (like llvm/lib/Support/{Windows,Unix}). But another potential case is types that are typedeffed differently, e.g. int64_t: is it long or long long? AFAIK for a C++ function that takes int64_t, the mangled symbol name uses the underlying raw type (long vs long long), which would differ between platforms. Also, anything with size_t wouldn't match when crossing from one bitness to another.

But as long as such cases are a minority of the interesting bits in the profile, such cross profiling could indeed potentially be very valuable!
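
To make the int64_t and size_t point concrete, here is a toy sketch of Itanium-style mangling for a free function with builtin parameter types only. The type codes (`l`, `x`, `m`, `y`, `j`) are the standard Itanium ones; the target names, the typedef table, and the `mangle` helper are illustrative assumptions, and the sketch ignores namespaces, substitutions, and everything else a real mangler handles.

```python
from typing import Dict, List

# Itanium C++ ABI codes for a few builtin types.
ITANIUM_CODES: Dict[str, str] = {
    "int": "i",
    "unsigned int": "j",
    "long": "l",
    "unsigned long": "m",
    "long long": "x",
    "unsigned long long": "y",
}

# How the typedefs resolve on a few targets (illustrative target names).
TYPEDEFS: Dict[str, Dict[str, str]] = {
    "x86_64-linux": {"int64_t": "long", "size_t": "unsigned long"},
    "x86_64-windows": {"int64_t": "long long", "size_t": "unsigned long long"},
    "i686-windows": {"int64_t": "long long", "size_t": "unsigned int"},
}


def mangle(name: str, params: List[str], target: str) -> str:
    """Mangle a free function taking only builtin or typedef'd integer types."""
    typedefs = TYPEDEFS[target]
    # Typedefs are resolved to the underlying type before encoding.
    codes = "".join(ITANIUM_CODES[typedefs.get(p, p)] for p in params)
    return f"_Z{len(name)}{name}{codes or 'v'}"


if __name__ == "__main__":
    for target in TYPEDEFS:
        print(target, mangle("f", ["int64_t"], target), mangle("g", ["size_t"], target))
```

The same source-level function `f(int64_t)` comes out as `_Z1fl` for the Linux target and `_Z1fx` for the Windows targets, so a profile keyed on mangled names would not match for it without extra translation.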