anrieff / libcpuid

a small C library for x86 CPU detection and feature extraction

Intel Alder Lake processors are not properly recognized

TheTumultuousUnicornOfDarkness opened this issue · comments

Alder Lake is Intel's codename for the 12th generation of Intel Core processors based on a hybrid architecture utilizing Golden Cove high-performance cores and Gracemont power-efficient cores.

As reported in TheTumultuousUnicornOfDarkness/CPU-X#205, Intel 12th-generation CPUs are not properly recognized by libcpuid. Example with an Intel® Core™ i7-12700KF:

  num_cores  : 10
  num_logical: 20
  tot_logical: 12

According to Intel, this CPU should be:
Total Cores: 12 (8 Performance-cores + 4 Efficient-cores)
Total Threads: 20 (16 Performance-cores (HT) + 4 Efficient-cores (no HT))

Files for this CPU:
raw.txt
report.txt

Contact @Mini-Dragon for more details.

Hi, @x0rg - sorry for getting so late to this. Been involved in a lot of personal issues and didn't have much time for OSS stuff.
What do you suggest we should do? In general, libcpuid does not have the concept of topology enumeration, and this is a major task, but I think it has to be done. This will also help future ARM support, as the same kind of performance/efficiency hybrid CPUs exist there as well and are even more prevalent.

In general, do you think there's a quick fix for this specific CPU?

I would recommend adding support for this big/small topology instead of doing a simple fix. All B0 and C0 steppings of 12th-gen Intel CPUs have this architecture.

Information

  • The hybrid-architecture flag is bit 15 of EDX in the result of CPUID EAX=0x7, ECX=0.
  • The CPU core type (Core/Big/Performance or Atom/Small/Efficient) is in bits 31-24 of EAX in the result of CPUID EAX=0x1A, ECX=0.
    • 0x10: Reserved, 0x20: Atom, 0x30: Reserved, 0x40: Core
  • Alder Lake ISA is symmetric.
  • Golden Cove has 2 threads per core, Gracemont has 1 thread per core.
  • Golden Cove has L1 and L2 caches per core, Gracemont has an L1 cache per core.
    • Every 4 Gracemont cores share one L2 cache.
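
As an illustration of the two CPUID checks listed above, here is a minimal C sketch that only decodes register values. The names is_hybrid(), decode_core_type() and enum core_type are hypothetical and are not part of libcpuid. The register values are assumed to come from executing CPUID elsewhere, for example with __get_cpuid_count() from GCC's <cpuid.h>.

```c
#include <stdbool.h>
#include <stdint.h>

/* Core type values found in CPUID.(EAX=0x1A, ECX=0):EAX[31:24]. */
enum core_type {
    CORE_TYPE_UNKNOWN = 0x00, /* reserved value, or leaf 0x1A not supported */
    CORE_TYPE_ATOM    = 0x20, /* Atom / Small / Efficient (Gracemont)       */
    CORE_TYPE_CORE    = 0x40  /* Core / Big / Performance (Golden Cove)     */
};

/* Hybrid flag: bit 15 of EDX from CPUID.(EAX=0x7, ECX=0). */
static bool is_hybrid(uint32_t leaf7_edx)
{
    return ((leaf7_edx >> 15) & 1u) != 0;
}

/* Core type: bits 31-24 of EAX from CPUID.(EAX=0x1A, ECX=0). */
static enum core_type decode_core_type(uint32_t leaf1a_eax)
{
    switch (leaf1a_eax >> 24) {
    case 0x20: return CORE_TYPE_ATOM;
    case 0x40: return CORE_TYPE_CORE;
    default:   return CORE_TYPE_UNKNOWN;
    }
}
```

Leaf 0x1A describes the core that executes the instruction, so on a hybrid CPU the calling thread has to be pinned to each logical CPU in turn before reading it.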

References

In general, do you think there's a quick fix for this specific CPU?

@anrieff, sorry, I don't know. I do not have time for OSS stuff either.
But from the above comments, I understand that topology enumeration would be nice to have in libcpuid.

If someone wants to work on it, feel free.

With the V2 Extended Topology Enumeration Leaf (input EAX=0x1F), which covers the SMT/Core/Module/Tile/Die levels, the topology may be detectable.
But I don't have an Alder Lake, and no data.

I found a sample here: https://github.com/GameTechDev/HybridDetect/tree/main/HybridDetectConsole
There is useful information in HybridDetect.h

I guess the decode_intel_extended_topology() function in libcpuid is not adapted for hybrid topologies. If I understand, we need to loop over each logical core.
Even if we just want the correct CPU core count (physical and logical), it raises more questions: how to hold this data in cpu_raw_data_t? How to expose data in cpu_id_t? How to expose topology (as cache sharing depends on the type of core)?

Staying backward compatible seems to be a real challenge, and I have the feeling this topic is too complex (at least for me, to be honest).

Sorry for responding that late (stupid Covid!).
I think in general that neither cpu_raw_data_t nor cpu_id_t can be easily adapted to hybrid topologies. We need to support arrays of these in order to describe the system fully.
For example, cpuid_get_raw_data() can have another form, like cpuid_get_all_raw_data(), which writes into an array of cpu_raw_data_t, one for each of the different CPU types. The serialization/deserialization functions can be modified slightly to accept both forms (single vs. multiple CPU ID dumps in one file).

Given that array then, we can have a new struct - system_id_t, with the following members:

  • architecture: architecture ID (x86, ARM, ...)
  • num_cpu_types: count of different processor types in the system
  • cpu_types: pointer to an array of cpu_id_t, describing each CPU type, where:
    • num_cores would list the number of physical cores of this type in the system;
    • num_logical_cpus caters for hyper-threading, if this processor has it;
    • total_logical_cpus will be the number of total threads across all types (for backwards compatibility)
    • the following new fields would be added to cpu_id_t:
      • a bitmask of the affinity ids this processor type is occupying;
      • is it a performance- or efficiency-geared core (maybe more options in the future);

Example usage, assuming we're on that Alder Lake CPU we're talking about:

struct system_id_t system;
cpu_identify_all(NULL, &system);
printf("%d\n", system.architecture); // prints 0 (CPU_ARCH_X86)
printf("%d\n", system.num_cpu_types); // prints 2 (performance and efficiency)
for (int i = 0; i < system.num_cpu_types; i++) {
    struct cpu_id_t t = system.cpu_types[i];
    printf("Type #%d:\n", i);
    printf("  affinity mask: 0x%08x\n", t.affinity_mask);
    printf("  purpose: %d\n", t.purpose); // 0 = performance, 1 = efficiency, ... (allows for future expansion)
    printf("  counts: %d %d %d\n", t.num_cores, t.num_logical_cpus, t.total_logical_cpus);
}

This would print:

0
2
Type #0:
  affinity mask: 0x0000000f
  purpose: 1
  counts: 4 4 20
Type #1:
  affinity mask: 0x000ffff0
  purpose: 0
  counts: 8 16 20

Of course, if there are differences in the cache sizes, cpu speeds, etc. they will be different in between cpu_types[0] and cpu_types[1].

As for how the old cpuid_get_raw_data() would keep working and produce consistent results, one way would be to add another function, e.g. cpu_request_core_type(), where you'd say whether you want a performance or an efficiency core in your cpuid results.
If you don't specify it, it would default to PERFORMANCE for example.

What do you think?

It looks like a good proposal to me: it is backward compatible, and it should be easy to adapt programs that use libcpuid.
@anrieff do you plan to implement it by yourself by chance (even some WIP in a dedicated branch)?

I found a CPU dump for Intel Core i9 12900K here. My parser can be adapted to convert an AIDA64 dump to cpu_raw_data_t.

Yes, but I plan to write it in stages. I'll likely start with the cpu_request_core_type(), as it is the easiest and would still allow us to test things.

Just for reference,

Your CPU is not present in the database ==> 12th Gen Intel(R) Core(TM) i9-12900T, model: 7, ext. model: 151, ext. family: 6
Your CPU socket is not present in the database ==> 12th Gen Intel(R) Core(TM) i9-12900T, codename: NOT SUPPORTED

Hello,

I decided to start an implementation, and I am pushing my work to the topology branch for now.
This is WIP. I will open a PR for review when it is ready.

I have uncommitted work on cpuid_tool.c file but I am doing some tests with the Intel Core i7 12700H dump found on InstLatX64.
For now, it produces something like this (ignore the 12, it is still _SC_NPROCESSORS_ONLN returned by get_total_cpus()):

2
Type #0:
  affinity mask: 0x00000fff
  purpose: 1
  counts: 10 20 12
Type #1:
  affinity mask: 0x000ff000
  purpose: 2
  counts: 20 20 12

The decode_intel_extended_topology() function is not returning proper values yet. It seems the Extended Topology Enumeration Leaf provides the SMT count for the full CPU package.

To correctly detect the number of L2 caches in Gracemont, we need to create a cache id from APIC ID and CPUID.(EAX=4, ECX=n):EAX[25:14] (Maximum number of addressable IDs for logical processors sharing this cache).
libcpuid has no information on the number of threads sharing the cache and the number of caches.

Page 14: Intel® 64 Architecture Processor Topology Enumeration - intel-64-architecture-processor-topology-enumeration.pdf
https://github.com/torvalds/linux/blob/master/arch/x86/kernel/cpu/cacheinfo.c#L766-L779
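
The cache-ID idea above can be sketched in C as follows. This is only an illustration in the spirit of the Linux code linked above, not libcpuid code. The function names are hypothetical, and the APIC IDs in the usage note are made-up values, not taken from a real dump.

```c
#include <stddef.h>
#include <stdint.h>

/* CPUID.(EAX=4, ECX=n):EAX[25:14] + 1: max addressable IDs sharing the cache. */
static uint32_t max_ids_sharing_cache(uint32_t leaf4_eax)
{
    return ((leaf4_eax >> 14) & 0xFFFu) + 1u;
}

/* Smallest shift such that (1 << shift) >= n, i.e. ceil(log2(n)). */
static unsigned shift_for(uint32_t n)
{
    unsigned shift = 0;
    while ((1u << shift) < n)
        shift++;
    return shift;
}

/* Logical CPUs with the same cache ID share the same cache instance. */
static uint32_t cache_id(uint32_t apic_id, uint32_t leaf4_eax)
{
    return apic_id >> shift_for(max_ids_sharing_cache(leaf4_eax));
}

/* Count distinct cache IDs among the APIC IDs (O(n^2), fine for small n). */
static size_t count_caches(const uint32_t *apic_ids, size_t n, uint32_t leaf4_eax)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        size_t j;
        for (j = 0; j < i; j++)
            if (cache_id(apic_ids[j], leaf4_eax) == cache_id(apic_ids[i], leaf4_eax))
                break;
        if (j == i) /* no earlier CPU had this cache ID */
            count++;
    }
    return count;
}
```

For example, eight E-cores with hypothetical APIC IDs 0x20, 0x22, ..., 0x2E and an L2 field value of 7 (8 addressable IDs, shift 3) fall into cache IDs 4 and 5, so count_caches() reports two L2 caches.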

I am currently writing code in Rust to detect hybrid topologies, but porting it to libcpuid is difficult for me.
Sorry if this is noise.

@Umio-Yasuno On the topology branch, I changed the decode_intel_extended_topology() function a bit to extract APIC ID.
In fact, I use SMT ID and Core ID to count logical cores and physical cores for each CPU type now. Here is the output for the Intel Core i7 12700H:

Type #0:
  affinity mask: 0x00000fff
  purpose: 1
  counts: 6 12 20
Type #1:
  affinity mask: 0x000ff000
  purpose: 2
  counts: 8 8 20

I am making good progress.

Following functions are available on my branch:

  • cpuid_get_all_raw_data() (cpuset_setaffinity() function tested only with Linux for now)
  • cpuid_serialize_all_raw_data()
  • cpuid_deserialize_all_raw_data()
  • cpu_identify_all()
  • cpu_request_core_type()

I decided to put the architecture field in cpu_id_t in case a CPU vendor wants to make a hybrid CPU with different architectures (like an x86+ARM CPU or platform) one day.

Here is a sample of a (native) RAW file with CPUID values for all logical CPUs on my system: ryzen-3600x-raw.txt
Please note that the cpuid_deserialize_all_raw_data() function can parse AIDA64 dumps too (it makes it easier to work with dumps taken from InstLatx64).

Now I need to adapt the cpuid_tool.

PR #166 submitted with support for Alder Lake CPUs.

That is useful, thanks. I made good progress locally, I started to update decode_intel_deterministic_cache_info() with APIC mask.
We can add this information to struct cpu_id_t. I guess a cache count is relevant (e.g. value=2 for 8 E-cores for L2, because 4 E-cores share the same L2 cache).

/** Count of Ln cache. -1 if undetermined */
int32_t ln_count;

Example for Core i9 12900HK:

CPU Info for type #0:
------------------
  arch       : x86
  purpose    : performance
  ...
  num_cores  : 6
  num_logical: 12
  tot_logical: 20
  affi_mask  : 0x00000FFF
  L1 D cache : 48 KB
  L1 I cache : 32 KB
  L2 cache   : 1280 KB
  L3 cache   : 24576 KB
  L4 cache   : -1 KB
  L1D assoc. : 12-way
  L1I assoc. : 8-way
  L2 assoc.  : 10-way
  L3 assoc.  : 12-way
  L4 assoc.  : -1-way
  L1D line sz: 64 bytes
  L1I line sz: 64 bytes
  L2 line sz : 64 bytes
  L3 line sz : 64 bytes
  L4 line sz : -1 bytes
  L1D count: 6
  L1I count: 6
  L2 count : 6
  L3 count : 1
  L4 count : -1
  ...
CPU Info for type #1:
------------------
  arch       : x86
  purpose    : efficiency
  ...
  num_cores  : 8
  num_logical: 8
  tot_logical: 20
  affi_mask  : 0x000FF000
  L1 D cache : 32 KB
  L1 I cache : 64 KB
  L2 cache   : 2048 KB
  L3 cache   : 24576 KB
  L4 cache   : -1 KB
  L1D assoc. : 8-way
  L1I assoc. : 8-way
  L2 assoc.  : 16-way
  L3 assoc.  : 12-way
  L4 assoc.  : -1-way
  L1D line sz: 64 bytes
  L1I line sz: 64 bytes
  L2 line sz : 64 bytes
  L3 line sz : 64 bytes
  L4 line sz : -1 bytes
  L1D count: 8
  L1I count: 8
  L2 count : 2
  L3 count : 1
  L4 count : -1
  ...

The problem is that L3 is shared between E-cores and P-cores.

By comparing CPUID.(EAX=0x1):EBX[23:16] (Maximum number of addressable IDs for logical processors in this physical package) and CPUID.(EAX=4, ECX=n):EAX[25:14] + 1 (Maximum number of addressable IDs for logical processors sharing this cache),
we can check if that cache is shared by all threads.
On the i7-12700H, CPUID.(EAX=0x1):EBX[23:16] and CPUID.(EAX=4, ECX=3):EAX[25:14] + 1 (L3 cache) are both set to 128.
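
A minimal C sketch of this comparison, assuming the raw register values have already been read. The helper names are hypothetical and not part of libcpuid.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID.(EAX=0x1):EBX[23:16]: max addressable IDs in the physical package. */
static uint32_t ids_in_package(uint32_t leaf1_ebx)
{
    return (leaf1_ebx >> 16) & 0xFFu;
}

/* CPUID.(EAX=4, ECX=n):EAX[25:14] + 1: max addressable IDs sharing the cache. */
static uint32_t ids_sharing_cache(uint32_t leaf4_eax)
{
    return ((leaf4_eax >> 14) & 0xFFFu) + 1u;
}

/* True when the cache covers as many IDs as the package, i.e. it is
 * shared by all threads (and therefore by both core types). */
static bool cache_shared_by_all(uint32_t leaf1_ebx, uint32_t leaf4_eax)
{
    return ids_sharing_cache(leaf4_eax) == ids_in_package(leaf1_ebx);
}
```

With the values quoted above (both 128), cache_shared_by_all() returns true for the L3 cache. For a cache whose sharing field is smaller than the package-wide value it returns false.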

Ok, it is WIP in #168.

#166 (comment)

It looks like the Core_ID and SMT_ID are incorrect.
SMT_ID is either 0 or 1 if the number of threads per core is 2.

Pkg_ID/Core_ID/SMT_ID can be checked from cpuid | grep "APIC synth".

ref: Section 1.5 - Intel® 64 Architecture Processor Topology Enumeration - intel-64-architecture-processor-topology-enumeration.pdf
ref: https://github.com/torvalds/linux/blob/master/arch/x86/kernel/cpu/topology.c
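
For reference, the decomposition described in the Intel document can be sketched in C as below. It assumes the two shift widths were read from CPUID.(EAX=0xB or 0x1F, ECX=level):EAX[4:0] for the SMT level and the core level, and that smt_shift <= core_shift < 32. The names split_apic_id() and struct topo_ids are hypothetical.

```c
#include <stdint.h>

struct topo_ids {
    uint32_t pkg_id;
    uint32_t core_id;
    uint32_t smt_id;
};

/* Split an x2APIC ID into package / core / SMT IDs.
 * smt_shift:  bits used by the SMT ID (shift reported for the SMT level)
 * core_shift: bits used by SMT + core IDs (shift reported for the core level) */
static struct topo_ids split_apic_id(uint32_t apic_id,
                                     unsigned smt_shift, unsigned core_shift)
{
    struct topo_ids ids;
    uint32_t smt_mask  = (1u << smt_shift) - 1u;
    uint32_t core_mask = (1u << (core_shift - smt_shift)) - 1u;

    ids.smt_id  = apic_id & smt_mask;                 /* lowest bits          */
    ids.core_id = (apic_id >> smt_shift) & core_mask; /* bits above the SMT ID */
    ids.pkg_id  = apic_id >> core_shift;              /* everything above      */
    return ids;
}
```

With smt_shift = 1, the SMT ID can only be 0 or 1, which matches the remark above about two threads per core. Some tools print the core ID without shifting out the SMT bits, so values may differ by a factor of two from this sketch.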

Ok, fixed in 426d687, that looks better now:

Identifying logical core 0
APIC ID 0x00000000, package ID 0x00000000, core ID 0x00000000, SMT ID 0x00000000
Identifying logical core 1
APIC ID 0x00000002, package ID 0x00000000, core ID 0x00000002, SMT ID 0x00000000
Identifying logical core 2
APIC ID 0x00000004, package ID 0x00000000, core ID 0x00000004, SMT ID 0x00000000
Identifying logical core 3
APIC ID 0x00000008, package ID 0x00000000, core ID 0x00000008, SMT ID 0x00000000
Identifying logical core 4
APIC ID 0x0000000A, package ID 0x00000000, core ID 0x0000000A, SMT ID 0x00000000
Identifying logical core 5
APIC ID 0x0000000C, package ID 0x00000000, core ID 0x0000000C, SMT ID 0x00000000
Identifying logical core 6
APIC ID 0x00000001, package ID 0x00000000, core ID 0x00000000, SMT ID 0x00000001
Identifying logical core 7
APIC ID 0x00000003, package ID 0x00000000, core ID 0x00000002, SMT ID 0x00000001
Identifying logical core 8
APIC ID 0x00000005, package ID 0x00000000, core ID 0x00000004, SMT ID 0x00000001
Identifying logical core 9
APIC ID 0x00000009, package ID 0x00000000, core ID 0x00000008, SMT ID 0x00000001
Identifying logical core 10
APIC ID 0x0000000B, package ID 0x00000000, core ID 0x0000000A, SMT ID 0x00000001
Identifying logical core 11
APIC ID 0x0000000D, package ID 0x00000000, core ID 0x0000000C, SMT ID 0x00000001

Hello @Umio-Yasuno. FYI, it is almost ready in #168. Feel free to comment in the PR.