paladin-t / bitty

Bitty Engine - An itty bitty 2D game engine, with built-in editors, programmable in Lua.

Home Page: https://paladin-t.github.io/bitty

Evaluate Profile-Guided Optimization (PGO)

zamazan4ik opened this issue

Hi!

I have been collecting materials (benchmarks, articles, stories, showcases, etc.) about Profile-Guided Optimization (PGO) across different applications here. I am sure you will find helpful information about PGO there.

For the gamedev domain specifically, I suggest looking at the following:

  • I did some benchmarks for Bevy with PGO: PGO run (first) vs. non-PGO (second) - Pastebin. When reading these results, interpret a reported performance decrease as "the Release version is slower than the PGO-optimized one" and a reported performance increase as "the Release version is faster than the PGO-optimized one". As you can see, PGO improves performance in many scenarios, but in some of them performance decreases.
  • Unreal Engine has supported PGO builds since 4.27 (release notes). According to the documentation, PGO improves performance on UE as well (+10% in some CPU-heavy scenarios, from this page). I have also talked with developers in a local Telegram chat about UE. One person said that they use PGO as a default optimization for UE and their games. The PGO profiles are collected from crafted local test workloads (usually the most demanding scenes) with Gauntlet. The performance improvement is around 6-8%.
  • Godot proposal about PGO - link
  • Unity Burst thread about PGO - link

We need to check the effect of PGO on Bitty and, if it works well, add a note to the Bitty documentation about building with PGO. I'd appreciate an easy way to build Bitty with PGO (e.g. via custom build options), so that experienced users can do it on their own for their own usage scenarios. Another option is to optimize the prebuilt Bitty binaries with a generic-enough profile. Providing PGO-optimized binaries could be a trickier task (since it requires preparing a good-enough profile), but it would be great to see as an option too.
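
To make the "custom build options" idea a bit more concrete, here is a minimal sketch of the usual three-step Clang PGO workflow (instrument, run a workload, rebuild with the merged profile). It only assembles the command lines and executes nothing. The helper name `pgo_build_plan`, the `main.cpp` source, the `pgo-data` directory and the output names are placeholders I made up for illustration; Bitty's real build setup may look quite different, and other compilers use different flags.

```python
from pathlib import PurePosixPath
import shlex


def pgo_build_plan(sources, profile_dir="pgo-data", output="app"):
    """Return the Clang PGO steps as argument lists; nothing is executed.

    All names here are hypothetical placeholders, not Bitty's build files.
    """
    profile_dir = PurePosixPath(profile_dir)
    merged = profile_dir / "merged.profdata"
    base = ["clang++", "-O2", *sources]
    return {
        # Step 1: build an instrumented binary that writes .profraw files
        # into profile_dir when it runs.
        "instrument": base
        + [f"-fprofile-generate={profile_dir}", "-o", f"{output}-instrumented"],
        # (Between steps 1 and 2, run the instrumented binary on a
        # representative workload, e.g. the heaviest scenes.)
        # Step 2: merge the raw profiles into a single .profdata file.
        "merge": ["llvm-profdata", "merge", "-o", str(merged), str(profile_dir)],
        # Step 3: rebuild, letting the optimizer use the merged profile.
        "optimize": base + [f"-fprofile-use={merged}", "-o", output],
    }


if __name__ == "__main__":
    for step, cmd in pgo_build_plan(["main.cpp"]).items():
        print(f"{step}: {shlex.join(cmd)}")
```

The quality of the result depends mostly on how representative the workload run between the first and second steps is, which is why shipping a generic profile for prebuilt binaries is the harder part.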

As an additional optimization, I suggest looking at LLVM BOLT. In my experience, though, it is better to start with PGO and then apply BOLT.