travisdowns / uarch-bench

A benchmark for low-level CPU micro-architectural features

Cycles event sometimes gets unprogrammed with --extra-events

travisdowns opened this issue

When using the perf timer and setting up a number of extra events equal to (or greater than) the number of hardware perf counters, the cycles event gets unscheduled and you get a reading of zero.

E.g., setting up 8 events on my SKL box with HT disabled (8 PMU counters available):

Driver: intel_pstate, governor: performance
Vendor ID: GenuineIntel
Model name: Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz
intel_pstate/no_turbo reports that turbo is already disabled
Using timer: perf
Welcome to uarch-bench (ee463f3)
Supported CPU features: SSE3 PCLMULQDQ VMX EST TM2 SSSE3 FMA CX16 SSE4_1 SSE4_2 MOVBE POPCNT AES AVX RDRND TSC_ADJ SGX BMI1 HLE AVX2 BMI2 ERMS RTM MPX RDSEED ADX CLFLUSHOPT INTEL_PT
Pinned to CPU 0
Programmed cycles event, caps: R:1 UT:1 ZT:1 index: 0x1
Resolved and programmed event 'uops_dispatched_port.port_0' to 'cpu/config=0x1a1/', caps: R:1 UT:1 ZT:1 index: 0x1
Resolved and programmed event 'uops_dispatched_port.port_1' to 'cpu/config=0x2a1/', caps: R:1 UT:1 ZT:1 index: 0x2
Resolved and programmed event 'uops_dispatched_port.port_2' to 'cpu/config=0x4a1/', caps: R:1 UT:1 ZT:1 index: 0x3
Resolved and programmed event 'uops_dispatched_port.port_3' to 'cpu/config=0x8a1/', caps: R:1 UT:1 ZT:1 index: 0x4
Resolved and programmed event 'uops_dispatched_port.port_4' to 'cpu/config=0x10a1/', caps: R:1 UT:1 ZT:1 index: 0x5
Resolved and programmed event 'uops_dispatched_port.port_5' to 'cpu/config=0x20a1/', caps: R:1 UT:1 ZT:1 index: 0x6
Resolved and programmed event 'uops_dispatched_port.port_6' to 'cpu/config=0x40a1/', caps: R:1 UT:1 ZT:1 index: 0x7
Resolved and programmed event 'uops_dispatched_port.port_7' to 'cpu/config=0x80a1/', caps: R:1 UT:1 ZT:1 index: 0x8
Running benchmarks groups using timer perf

** Running group cpp : Tests written in C++ **
                               Benchmark       Cycles           p0           p1           p2           p3           p4           p5           p6           p7
                 Chained multiplications         0.00         0.50         1.00         0.50         0.50         0.00         0.51         1.01         0.00
Finished in 165 ms (cpp)
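
One way to catch this condition before printing results is to check each event's scheduling state. The following is a minimal C++ sketch, not uarch-bench's actual code: `EventSnapshot` and the helper functions are hypothetical names introduced here. It assumes the timer can obtain, for each event, the `index` field of `perf_event_mmap_page` (which is 0 when the event is not currently on a hardware counter, and otherwise one more than the counter number to pass to `rdpmc`) and the enabled/running times that perf reports when `PERF_FORMAT_TOTAL_TIME_ENABLED` and `PERF_FORMAT_TOTAL_TIME_RUNNING` are requested.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical per-event snapshot. The field names mirror what perf exposes,
// but this struct is an illustration and not a uarch-bench type.
struct EventSnapshot {
    std::string   name;
    std::uint32_t index;         // perf_event_mmap_page::index; 0 = not on a counter
    std::uint64_t time_enabled;  // total time the event was enabled
    std::uint64_t time_running;  // total time the event was actually counting
};

// True if the event is on a hardware counter now and was counting for the
// whole time it was enabled (i.e. it was never descheduled or multiplexed).
inline bool fully_scheduled(const EventSnapshot& e) {
    return e.index != 0 &&
           e.time_enabled != 0 &&
           e.time_running == e.time_enabled;
}

// Counter number to pass to rdpmc: the mmap page stores it offset by one.
inline std::uint32_t rdpmc_counter(const EventSnapshot& e) {
    assert(e.index != 0 && "event is not scheduled; rdpmc would be meaningless");
    return e.index - 1;
}

// Names of all events whose readings should not be trusted.
inline std::vector<std::string>
unscheduled_events(const std::vector<EventSnapshot>& events) {
    std::vector<std::string> bad;
    for (const auto& e : events) {
        if (!fully_scheduled(e)) {
            bad.push_back(e.name);
        }
    }
    return bad;
}
```

With a check along these lines, a run like the one above could report the cycles event as unscheduled instead of printing 0.00 in the Cycles column.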