saturnsky / saturn_affinity_python

Programs to optimize game performance on CPUs with non-uniform configurations, such as multiple CCXs (high-end Ryzen series) or a combination of P and E cores (high-end modern Intel CPUs).

Support for early Ryzen CPUs

saturnsky opened this issue · comments

For some games that are highly cache-sensitive and have weak multithreading support, even older Zen CPUs can see performance gains when the game has exclusive use of a CCX's L3 cache.
Consumer Zen (1st Gen) and Zen+ CPUs have 2 CCXs per CCD, with 8MB of L3 cache per CCX. Therefore, I would expect to see performance gains by pinning games to CCX0 and non-gaming programs to CCX1.
I did not intend to support this situation when I developed the software, but I suspect that the current version of the program would also deliver performance gains in this scenario. I am looking for environments to benchmark it and will share the results.
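As a rough illustration of the CCX0/CCX1 split described above, here is a minimal Python sketch. It is not taken from this repository. The helper names `ccx_logical_cpus` and `affinity_mask` are hypothetical, and the sketch assumes that logical CPUs are numbered contiguously per CCX with SMT siblings adjacent (so an 8-core, 16-thread part such as the 1700X would expose CCX0 as logical CPUs 0 to 7). That numbering should be verified on the actual system before relying on it.

```python
def ccx_logical_cpus(ccx_index, cores_per_ccx=4, threads_per_core=2):
    """Return the logical CPU indices assumed to belong to one CCX.

    Assumption: logical CPUs are numbered contiguously per CCX,
    with SMT siblings adjacent to each other.
    """
    per_ccx = cores_per_ccx * threads_per_core
    start = ccx_index * per_ccx
    return list(range(start, start + per_ccx))


def affinity_mask(cpus):
    """Pack logical CPU indices into a bitmask (bit n set = CPU n allowed)."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return mask


game_cpus = ccx_logical_cpus(0)        # CCX0 reserved for the game
background_cpus = ccx_logical_cpus(1)  # CCX1 for everything else

print(hex(affinity_mask(game_cpus)))        # 0xff
print(hex(affinity_mask(background_cpus)))  # 0xff00
```

The resulting list or mask would still have to be applied to real processes, for example through an affinity API such as psutil's `Process.cpu_affinity`. Whether this program does it that way is not stated here.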

The biggest exception is consumer Zen 2. Zen 2 has 2 CCXs per CCD and, in its 2-CCD configuration, 4 CCXs in total. Each CCX holds 16MB of L3 cache.

Therefore, on Zen 2, performance improvements can likely be expected by concentrating games on CCX0 and spreading non-gaming programs across CCX1 through CCX3. However, this leaves only four cores allocated to the game, which could decrease performance for games with strong multithreading support.
The current version of the software does not support this scenario. I plan to add support for it, but I do not have a test environment and cannot benchmark the performance improvement.
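The Zen 2 split could be sketched in Python as follows. This is only an illustration under stated assumptions, not the planned implementation: `split_by_ccx` is a hypothetical helper, the layout of 4 CCXs with 4 cores each and 2 threads per core is an example configuration, and logical CPUs are again assumed to be numbered contiguously per CCX.

```python
def split_by_ccx(ccx_count, cores_per_ccx, threads_per_core=2, game_ccx=0):
    """Split logical CPUs into a game set (one CCX) and a background set.

    Assumption: logical CPUs are numbered contiguously per CCX.
    """
    per_ccx = cores_per_ccx * threads_per_core
    game, background = [], []
    for ccx in range(ccx_count):
        cpus = range(ccx * per_ccx, (ccx + 1) * per_ccx)
        # The chosen CCX goes to the game, all others to background programs.
        (game if ccx == game_ccx else background).extend(cpus)
    return game, background


# Hypothetical layout: 4 CCXs, 4 cores per CCX, SMT enabled.
game, background = split_by_ccx(ccx_count=4, cores_per_ccx=4)

print(len(game), len(background))  # 8 24
```

In this example the game ends up with 8 logical CPUs (4 physical cores) out of 32, which makes the trade-off concrete: the game gets one CCX's 16MB of L3 to itself but loses access to three quarters of the cores.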

I tested the program on a Ryzen 1700X and didn't notice any meaningful performance gains in Stellaris. I assume this is because 8MB of L3 cache is too small to reduce Stellaris's cache miss rate. The result may vary depending on the game you are playing and your system configuration.
I wonder what the results would be on a Ryzen 3000 series CPU, with 16MB of L3 cache per CCX.