littlekernel / lk

LK embedded kernel

Evaluate Profile-Guided Optimization (PGO)

zamazan4ik opened this issue

Hi!

I recently ran many Profile-Guided Optimization (PGO) benchmarks on multiple projects - the results are available here. They cover applications from different domains that were accelerated with PGO: operating systems (like the Linux and Windows kernels), virtual machines (like QEMU and CrosVM), compilers, gRPC workloads, benchmark tools, databases, and much more. That's why I think it's worth trying to apply PGO to LK.

I can suggest the following things to do:

  • Evaluate PGO's applicability to LK via benchmarks.
  • If PGO improves performance, add a note about it to LK's documentation, so that users and maintainers are aware of another optimization opportunity for LK.
  • Provide PGO integration into the build scripts. It can help users and maintainers easily apply PGO for their own workloads.

After that, I'd suggest evaluating LLVM BOLT as an additional optimization step on top of PGO.

So the question here is: how does the system get the data out of the run and back into the build system? In the case of an operating system kernel, the trouble is that the data ends up on the device, where it's hard to extract. Is there any prior art for running this on a kernel?

> So the question here is how does the system get the data out of the run and back into the build system?

In the general case, PGO profiles are saved to disk. Then, in the optimization phase, these profiles are passed to the compiler via options like -fprofile-use=pgo_profile_name (more details can be found in the documentation for your compiler).
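To make the phases concrete, here is a minimal sketch that only prints the commands for one PGO cycle on an ordinary hosted program. It assumes a Clang/LLVM toolchain; the file names (main.c, app, pgo_raw, app.profdata) and the helper pgo_commands are placeholders made up for illustration, not anything from LK's build system.

```python
# Sketch of one PGO cycle as Clang-style command lines.
# All file and directory names are placeholders (assumptions).

def pgo_commands(source="main.c", binary="app",
                 raw_dir="pgo_raw", profile="app.profdata"):
    """Return the commands for instrument -> run -> merge -> optimize."""
    return [
        # 1. Build an instrumented binary; raw profiles will go to raw_dir.
        ["clang", "-O2", "-fprofile-generate=" + raw_dir, source, "-o", binary],
        # 2. Run a representative workload; *.profraw files are written on exit.
        ["./" + binary],
        # 3. Merge the raw profiles into a single indexed profile.
        ["llvm-profdata", "merge", "-o", profile, raw_dir],
        # 4. Rebuild, letting the compiler use the collected profile.
        ["clang", "-O2", "-fprofile-use=" + profile, source, "-o", binary],
    ]

if __name__ == "__main__":
    for cmd in pgo_commands():
        print(" ".join(cmd))
```

GCC follows the same instrument/run/rebuild shape with -fprofile-generate and -fprofile-use, but has no separate merge step like llvm-profdata.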

> In the case of an operating system kernel, the trouble is the data ends up on the device and it's hard to extract it. Is there any prior art to running this over a kernel?

Yes, there are multiple examples:

E.g. for the Linux kernel, profiles can be gathered via procfs (AFAIK) and then passed to the compiler in the optimization phase.
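For a target without a filesystem, one conceivable way to get the data "back into the build system" is to dump the profile buffer as hex over the serial console and decode it on the host. The sketch below shows only the host side, and it is purely an assumption: the PGO-BEGIN / PGO-END markers and the idea that the kernel prints its profile this way are hypothetical, and nothing in this thread says LK supports it.

```python
# Host-side sketch: recover a binary profile from a captured serial log.
# The marker strings are hypothetical; a real target would have to print them.

BEGIN, END = "PGO-BEGIN", "PGO-END"

def extract_profile(log_text):
    """Return the bytes that were hex-dumped between BEGIN and END."""
    inside = False
    chunks = []
    for line in log_text.splitlines():
        line = line.strip()
        if line == BEGIN:
            inside = True
            chunks = []          # restart if the dump is printed again
        elif line == END and inside:
            return b"".join(chunks)
        elif inside and line:
            chunks.append(bytes.fromhex(line))
    raise ValueError("no complete profile dump found in log")

if __name__ == "__main__":
    log = "boot ok\nPGO-BEGIN\ndeadbeef\n0102\nPGO-END\n"
    data = extract_profile(log)
    print(len(data), "bytes recovered")
```

The recovered bytes could then be written to a file on the host and handed to the merge and -fprofile-use steps, provided the target emitted them in the format the compiler's tooling expects.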