andikleen / pmu-tools

Intel PMU profiling tools

TMA: Pause-loop is not classified at all levels

aayasin opened this issue

A simple pause-loop kernel is classified correctly as Core Bound at levels 1 and 2, and likewise at levels 5 and 6, but not at the mid-levels 3 and 4 (Ports_Utilization and Ports_Utilized_0), which report only 9.9% and 8.5% in the output below.
I am documenting this here and will address it in the TMA 4.2 release.
Here is a reproducer using perf-tools:

P.S. @andikleen: This is a CFL (Coffee Lake, 8th-gen Core) machine. Could that be shown instead of [skl] in the first line of the toplev output?

$ ./kernels/gen-kernel.py -i pause > ./kernels/pause3x.c
$ gcc -g -O2 -o ./kernels/pause3x ./kernels/pause3x.c

$ ./pmu-tools/toplev.py --no-desc --no-perf --nodes '+CoreIPC,+UPI,+Time,+MUX' -vl6 -- ./kernels/pause3x 10000000  2>&1 | egrep -v ' [10]\.. '
# 4.11-full-perf on Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz [skl]
BE             Backend_Bound                                                                                 % Slots                      98.9
BE/Core        Backend_Bound.Core_Bound                                                                      % Slots                      98.8   <==
FE             Frontend_Bound.Fetch_Latency.MS_Switches                                                      % Clocks                      2.6 <
BE/Core        Backend_Bound.Core_Bound.Ports_Utilization                                                    % Clocks                      9.9 <
BE/Core        Backend_Bound.Core_Bound.Ports_Utilization.Ports_Utilized_0                                   % Clocks                      8.5 <
BE/Core        Backend_Bound.Core_Bound.Ports_Utilization.Ports_Utilized_0.Serializing_Operation             % Clocks                     98.9 <
BE/Core        Backend_Bound.Core_Bound.Ports_Utilization.Ports_Utilized_0.Serializing_Operation.Slow_Pause  % Clocks                     91.3 <
Info.Thread    UPI                                                                                             Metric                      2.9
RET            Retiring.Light_Operations.Other                                                               % Uops                      100.0 <
MUX                                                                                                          %                             2.2
Using level 6.
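Along the Slow_Pause path, levels 3 and 4 drop below 10% while both their ancestors and their descendants stay above 90%, which is the inconsistency described above. A minimal sketch that flags such dips, using the values copied from this output; the 20% cut-off is an assumption for illustration (not toplev's actual per-node thresholds), and the sketch ignores that levels 1-2 are in % Slots while levels 3-6 are in % Clocks:

```python
# (node, value) pairs along the Slow_Pause path, copied from the CFL run above.
# Levels 1-2 are reported in % Slots, levels 3-6 in % Clocks.
PATH = [
    ("Backend_Bound", 98.9),
    ("Backend_Bound.Core_Bound", 98.8),
    ("Backend_Bound.Core_Bound.Ports_Utilization", 9.9),
    ("Backend_Bound.Core_Bound.Ports_Utilization.Ports_Utilized_0", 8.5),
    ("Backend_Bound.Core_Bound.Ports_Utilization.Ports_Utilized_0"
     ".Serializing_Operation", 98.9),
    ("Backend_Bound.Core_Bound.Ports_Utilization.Ports_Utilized_0"
     ".Serializing_Operation.Slow_Pause", 91.3),
]


def level(node: str) -> int:
    """TMA level of a node = number of dot-separated names in it."""
    return node.count(".") + 1


def find_dips(path, cutoff=20.0):
    """Return the levels whose value is below `cutoff` (a hypothetical
    threshold) while some deeper node on the same path is at or above it."""
    dips = []
    for i, (node, value) in enumerate(path):
        deeper_max = max((v for _, v in path[i + 1:]), default=0.0)
        if value < cutoff <= deeper_max:
            dips.append(level(node))
    return dips


print(find_dips(PATH))  # → [3, 4]
```

A consistent breakdown, such as the one in the CLX run further down, would yield an empty list.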

$ lscpu
Architecture:         x86_64
CPU op-mode(s):       32-bit, 64-bit
Byte Order:           Little Endian
CPU(s):               12
On-line CPU(s) list:  0-5
Off-line CPU(s) list: 6-11
Thread(s) per core:   1
Core(s) per socket:   6
Socket(s):            1
NUMA node(s):         1
Vendor ID:            GenuineIntel
CPU family:           6
Model:                158
Model name:           Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz
Stepping:             10
CPU MHz:              3701.500
CPU max MHz:          3700.0000
CPU min MHz:          800.0000
BogoMIPS:             7399.70
Virtualization:       VT-x
L1d cache:            32K
L1i cache:            32K
L2 cache:             256K
L3 cache:             12288K
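lscpu reports family 6, model 158 (0x9E), stepping 10. Model 0x9E is shared by Kaby Lake and Coffee Lake desktop parts and only the stepping tells them apart, which is presumably why a lookup on the model number alone ends up at the common Skylake-derived core ([skl]). A minimal sketch of a stepping-aware lookup; `parse_lscpu` and `client_codename` are hypothetical helpers, and the small (model, stepping) table is an assumption based on the stepping notes in the Linux kernel's intel-family.h (it deliberately omits mobile model 0x8E and later parts):

```python
def parse_lscpu(text: str) -> dict:
    """Turn 'Key:   value' lines from lscpu into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def client_codename(family: int, model: int, stepping: int) -> str:
    """Hypothetical mapping for a few Skylake-derived client parts.

    Anything not listed here is reported as "unknown".
    """
    if family != 6:
        return "unknown"
    if model in (0x4E, 0x5E):  # 78 / 94: Skylake client
        return "skl"
    if model == 0x9E:  # 158: Kaby Lake (stepping 9) or Coffee Lake (10+)
        return "cfl" if stepping >= 10 else "kbl"
    return "unknown"


# Fields copied from the lscpu output above.
LSCPU = """\
CPU family:           6
Model:                158
Model name:           Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz
Stepping:             10
"""

info = parse_lscpu(LSCPU)
print(client_codename(int(info["CPU family"]), int(info["Model"]),
                      int(info["Stepping"])))  # → cfl
```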

I fixed the reporting for CFL/KBL (but nothing else at the moment).

With the TMA 4.2 release, I verified that this issue is resolved on a CLX (Cascade Lake) system.
This issue can be closed.

[labuser@ssp-wpclx-cdi276 perf-tools]$ python2.7 ./do.py build profile -g "-i PAUSE -n 3" -a pause3x -ki 1e8 --profile-mask 0x40 -v1 --pmu-tools '/bin/python2.7 ./pmu-tools/'
building kernel: pause3x ..
./kernels/gen-kernel.py -i PAUSE -n 3 > ./kernels/pause3x.c 2>&1
gcc -g -O2 -o ./kernels/pause3x ./kernels/pause3x.c 2>&1
topdown auto-drilldown ..
/bin/python2.7 ./pmu-tools//toplev.py --no-desc  --drilldown --nodes '+CoreIPC,+Instructions,+CORE_CLKS,+CPU_Utilization,+Time,+MUX,+IpTB,+L2MPKI' -V pause3x-1e8.toplev--drilldown-perf.csv --metric-group +Summary,+HPC -- taskset 0x4 ./kernels/pause3x 100000000  2>&1 | tee pause3x-1e8.toplev--drilldown.log | egrep -v "^(Run toplev|Adding|Using)"
2 events not supported
# 4.19-full-perf on Intel(R) Xeon(R) Platinum 8260L CPU @ 2.40GHz [clx/skylake]
BE             Backend_Bound    % Slots                   95.9  <==
Info.Core      CoreIPC            CoreMetric               0.1
Info.Inst_Mix  Instructions       Count          621,945,279.0
Info.Thread    IPC                Metric                   0.1
Info.System    CPU_Utilization    Metric                   1.0
Info.System    Time               Seconds                  3.3
Info.Thread    IpTB               Metric                   6.1
Info.Core      CORE_CLKS          Count       12,281,209,061.0
Info.Memory    L2MPKI             Metric                   0.1
MUX                             %                          9.2
Rerunning workload
BE             Backend_Bound             % Slots                   95.9
Info.Core      CoreIPC                     CoreMetric               0.0
Info.Inst_Mix  Instructions                Count          609,719,766.0
BE/Core        Backend_Bound.Core_Bound  % Slots                   95.9  <==
Info.Thread    IpTB                        Metric                   6.0
Info.Core      CORE_CLKS                   Count       12,241,658,267.0
Info.Memory    L2MPKI                      Metric                   0.0
Info.System    CPU_Utilization             Metric                   1.0
Info.System    Time                        Seconds                  3.3
MUX                                      %                         18.3
Rerunning workload
BE             Backend_Bound                               % Slots                   95.9
Info.Core      CoreIPC                                       CoreMetric               0.1
Info.Inst_Mix  Instructions                                  Count          611,825,162.0
BE/Core        Backend_Bound.Core_Bound                    % Slots                   95.9
BE/Core        Backend_Bound.Core_Bound.Ports_Utilization  % Clocks                  95.2  <==
Info.Thread    IpTB                                          Metric                   6.1
Info.Core      CORE_CLKS                                     Count       12,197,285,099.0
Info.Memory    L2MPKI                                        Metric                   0.1
Info.System    CPU_Utilization                               Metric                   1.0
Info.System    Time                                          Seconds                  3.3
MUX                                                        %                         12.2
Rerunning workload
BE             Backend_Bound                                                % Slots                   95.9
Info.Core      CoreIPC                                                        CoreMetric               0.1
Info.Inst_Mix  Instructions                                                   Count          615,664,622.0
BE/Core        Backend_Bound.Core_Bound                                     % Slots                   95.9
BE/Core        Backend_Bound.Core_Bound.Ports_Utilization                   % Clocks                  95.1
BE/Core        Backend_Bound.Core_Bound.Ports_Utilization.Ports_Utilized_0  % Clocks                  91.0  <==
Info.Thread    IpTB                                                           Metric                   6.1
Info.Core      CORE_CLKS                                                      Count       12,247,617,237.0
Info.Memory    L2MPKI                                                         Metric                   0.1
Info.System    CPU_Utilization                                                Metric                   1.0
Info.System    Time                                                           Seconds                  3.3
MUX                                                                         %                         12.2
Rerunning workload
BE             Backend_Bound                                                                      % Slots                   95.9
Info.Core      CoreIPC                                                                              CoreMetric               0.1
Info.Inst_Mix  Instructions                                                                         Count          613,116,500.0
BE/Core        Backend_Bound.Core_Bound                                                           % Slots                   95.9
BE/Core        Backend_Bound.Core_Bound.Ports_Utilization                                         % Clocks                  95.5
BE/Core        Backend_Bound.Core_Bound.Ports_Utilization.Ports_Utilized_0                        % Clocks                  91.0
BE/Core        Backend_Bound.Core_Bound.Ports_Utilization.Ports_Utilized_0.Serializing_Operation  % Clocks                  95.9  <==
Info.Thread    IpTB                                                                                 Metric                   6.1
Info.Core      CORE_CLKS                                                                            Count       12,216,356,878.0
Info.Memory    L2MPKI                                                                               Metric                   0.1
Info.System    CPU_Utilization                                                                      Metric                   1.0
Info.System    Time                                                                                 Seconds                  3.3
MUX                                                                                               %                         12.2
Rerunning workload
BE             Backend_Bound                                                                                 % Slots                   95.9
Info.Core      CoreIPC                                                                                         CoreMetric               0.1
Info.Inst_Mix  Instructions                                                                                    Count          616,309,280.0
BE/Core        Backend_Bound.Core_Bound                                                                      % Slots                   95.9
BE/Core        Backend_Bound.Core_Bound.Ports_Utilization                                                    % Clocks                  94.9
BE/Core        Backend_Bound.Core_Bound.Ports_Utilization.Ports_Utilized_0                                   % Clocks                  91.0
BE/Core        Backend_Bound.Core_Bound.Ports_Utilization.Ports_Utilized_0.Serializing_Operation             % Clocks                  95.9
BE/Core        Backend_Bound.Core_Bound.Ports_Utilization.Ports_Utilized_0.Serializing_Operation.Slow_Pause  % Clocks                  97.7  <==
Info.Thread    IpTB                                                                                            Metric                   6.1
Info.Core      CORE_CLKS                                                                                       Count       12,256,754,613.0
Info.Memory    L2MPKI                                                                                          Metric                   0.1
Info.System    CPU_Utilization                                                                                 Metric                   1.0
Info.System    Time                                                                                            Seconds                  3.3
MUX
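In this CLX run each drill-down round marks (<==) exactly one node, and every marked node is a direct child of the one marked in the previous round, ending at Slow_Pause. A minimal sketch that checks this property on the marked rows copied from the log above (column spacing condensed); `marked_node` and `is_clean_drilldown` are hypothetical helper names, not part of toplev:

```python
# The row marked "<==" in each toplev round above, in order.
ROUNDS = [
    "BE  Backend_Bound  % Slots  95.9  <==",
    "BE/Core  Backend_Bound.Core_Bound  % Slots  95.9  <==",
    "BE/Core  Backend_Bound.Core_Bound.Ports_Utilization  % Clocks  95.2  <==",
    "BE/Core  Backend_Bound.Core_Bound.Ports_Utilization.Ports_Utilized_0"
    "  % Clocks  91.0  <==",
    "BE/Core  Backend_Bound.Core_Bound.Ports_Utilization.Ports_Utilized_0"
    ".Serializing_Operation  % Clocks  95.9  <==",
    "BE/Core  Backend_Bound.Core_Bound.Ports_Utilization.Ports_Utilized_0"
    ".Serializing_Operation.Slow_Pause  % Clocks  97.7  <==",
]


def marked_node(round_text: str):
    """Return the node name on the row flagged with '<==', else None."""
    for line in round_text.splitlines():
        fields = line.split()
        if fields and fields[-1] == "<==":
            return fields[1]
    return None


def is_clean_drilldown(nodes) -> bool:
    """True if every node is a direct child of the node before it."""
    for parent, child in zip(nodes, nodes[1:]):
        if parent is None or child is None:
            return False
        if not child.startswith(parent + "."):
            return False
        if child.count(".") != parent.count(".") + 1:
            return False
    return True


path = [marked_node(r) for r in ROUNDS]
print(is_clean_drilldown(path))  # → True
```

The same check on the CFL output would not apply directly, since there the marker stopped at Core_Bound while the deeper levels were reported in a single level-6 run.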

Fixed