HSF / prmon

Standalone monitor for process resource consumption

prmon broken on ARM

graeme-a-stewart opened this issue

I tried compiling prmon on ARM (Raspberry Pi 4 with Raspberry Pi OS). Perhaps unsurprisingly there are a few issues to fix:

  • Casting the return value of getopt_long() to char implicitly assumes that char is signed (the result is compared to -1). The signedness of plain char is implementation-defined, and on ARM it is unsigned, so char(-1) is 255 and the comparison with -1 never succeeds
  • The CPU hardware counters come out with 0 core count
    • Specifically we need to fix this for ARM, plus the code needs to be protected against division by zero
  • The compiler on ARM spits out thousands of ABI warnings (which are irrelevant), so on this platform we need to add the -Wno-psabi flag
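
To illustrate the first point, here is a minimal sketch. The helper names `unsigned_char_can_signal_end` and `parsed_options`, and the two-entry option table, are made up for this example and are not taken from prmon's source. The part that matters is that the `getopt_long()` result stays in an `int` until it has been compared with -1.

```cpp
#include <getopt.h>

#include <cassert>
#include <string>
#include <vector>

// Why the char cast breaks: where plain char is unsigned (as on ARM Linux),
// char(-1) holds 255, and 255 widened back to int never equals -1.
bool unsigned_char_can_signal_end() {
  unsigned char c = static_cast<unsigned char>(-1);  // 255
  int widened = c;                                   // still 255
  return widened == -1;                              // always false
}

// Hypothetical helper: returns the short option letters found in args.
// The option table is illustrative only.
std::string parsed_options(std::vector<std::string> args) {
  assert(!args.empty());  // args[0] plays the role of the program name
  std::vector<char*> argv;
  for (auto& a : args) argv.push_back(&a[0]);
  argv.push_back(nullptr);

  static const option long_opts[] = {
      {"pid", required_argument, nullptr, 'p'},
      {"interval", required_argument, nullptr, 'i'},
      {nullptr, 0, nullptr, 0}};

  optind = 1;  // restart scanning so the helper can be called more than once
  std::string seen;
  int opt;  // int, not char: -1 must survive the assignment on every platform
  while ((opt = getopt_long(static_cast<int>(args.size()), argv.data(),
                            "p:i:", long_opts, nullptr)) != -1) {
    seen.push_back(static_cast<char>(opt));  // narrowing is safe after the test
  }
  return seen;
}
```

Narrowing to `char` only after the loop condition has been evaluated keeps the behaviour identical on platforms with signed and unsigned plain `char`.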

I guess the second can be easily avoided with:

unsigned int nSockets = ( nSiblings != 0 ? nCPU / nSiblings : 0 );
unsigned int nThreads = ( nCores != 0 ? nSiblings / nCores : 0 );

here. I guess I was not expecting the number of cores to be zero 😃

Just out of curiosity, what does the /proc/cpuinfo file look like? If it doesn't exist then we shouldn't have made it that far.

It does exist, but it's rather different from what we get on x86:

processor	: 0
model name	: ARMv7 Processor rev 3 (v7l)
BogoMIPS	: 108.00
Features	: half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm crc32 
CPU implementer	: 0x41
CPU architecture: 7
CPU variant	: 0x0
CPU part	: 0xd08
CPU revision	: 3

processor	: 1
model name	: ARMv7 Processor rev 3 (v7l)
BogoMIPS	: 108.00
Features	: half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm crc32 
CPU implementer	: 0x41
CPU architecture: 7
CPU variant	: 0x0
CPU part	: 0xd08
CPU revision	: 3

processor	: 2
model name	: ARMv7 Processor rev 3 (v7l)
BogoMIPS	: 108.00
Features	: half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm crc32 
CPU implementer	: 0x41
CPU architecture: 7
CPU variant	: 0x0
CPU part	: 0xd08
CPU revision	: 3

processor	: 3
model name	: ARMv7 Processor rev 3 (v7l)
BogoMIPS	: 108.00
Features	: half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm crc32 
CPU implementer	: 0x41
CPU architecture: 7
CPU variant	: 0x0
CPU part	: 0xd08
CPU revision	: 3

Hardware	: BCM2835
Revision	: d03114
Serial		: 1000000033442f1a
Model		: Raspberry Pi 4 Model B Rev 1.4

cpuinfo.txt

That's an interesting looking /proc/cpuinfo, never seen such a format before but I'm very biased 😄

I guess one thing we can do is to check for "ARM" in the model name and, if it is there, not try to fill nSockets, nCoresPerSocket, and nThreadsPerCore. There doesn't seem to be any information that would/could replace them. For ARM, we'll have only the model name along w/ the nCPU.
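
A rough sketch of that fallback, under stated assumptions: `CpuSummary`, `trim`, and `summarise_cpuinfo` are hypothetical names, and instead of matching the string "ARM" this variant checks whether the x86-style topology fields are present at all, which amounts to the same decision for the files shown in this thread. In practice the text would be read from /proc/cpuinfo.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical summary type; prmon's real data structures may differ.
struct CpuSummary {
  unsigned int n_cpu = 0;      // number of "processor" entries
  std::string model_name;      // first "model name" value seen, if any
  bool has_topology = false;   // any of "physical id", "siblings", "cpu cores"
};

// Strip leading and trailing spaces and tabs.
std::string trim(const std::string& s) {
  const auto first = s.find_first_not_of(" \t");
  if (first == std::string::npos) return "";
  const auto last = s.find_last_not_of(" \t");
  return s.substr(first, last - first + 1);
}

CpuSummary summarise_cpuinfo(const std::string& text) {
  CpuSummary out;
  std::istringstream in(text);
  std::string line;
  while (std::getline(in, line)) {
    const auto colon = line.find(':');
    if (colon == std::string::npos) continue;  // blank separator lines
    const std::string key = trim(line.substr(0, colon));
    const std::string value = trim(line.substr(colon + 1));
    if (key == "processor") {
      ++out.n_cpu;
    } else if (key == "model name" && out.model_name.empty()) {
      out.model_name = value;
    } else if (key == "physical id" || key == "siblings" ||
               key == "cpu cores") {
      out.has_topology = true;
    }
  }
  return out;
}

// Usage sketch with a two-processor excerpt in the Raspberry Pi format.
void example() {
  const CpuSummary s = summarise_cpuinfo(
      "processor\t: 0\n"
      "model name\t: ARMv7 Processor rev 3 (v7l)\n"
      "\n"
      "processor\t: 1\n"
      "model name\t: ARMv7 Processor rev 3 (v7l)\n");
  assert(s.n_cpu == 2);
  assert(!s.has_topology);  // caller would leave the socket/core fields unset
}
```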

Do you want me to take a stab or are you already on it? I'm happy either way 😄

I guess the second can be easily avoided with:

unsigned int nSockets = ( nSiblings != 0 ? nCPU / nSiblings : 0 );
unsigned int nThreads = ( nCores != 0 ? nSiblings / nCores : 0 );

here. I guess I was not expecting the number of cores to be zero 😃

Well, it's 1 socket, 1 processor, 4 cores in this case, so we would want as output:

    "cpu": {
      "model name": "ARMv7 Processor rev 3 (v7l)",
      "nCPU": 4,
      "nCoresPerSocket": 4,
      "nSockets": 1,
      "nThreadsPerCore": 1
    },

Right?

I have an old original Raspberry Pi too - I should take a look at that one (should be 1 for everything).

I wonder what a Power processor or a powerful ARM (like a ThunderX) looks like...?

Raspberry Pi v1:

pi@raspbmc:~$ cat /proc/cpuinfo 
processor	: 0
model name	: ARMv6-compatible processor rev 7 (v6l)
Features	: swp half thumb fastmult vfp edsp java tls 
CPU implementer	: 0x41
CPU architecture: 7
CPU variant	: 0x0
CPU part	: 0xb76
CPU revision	: 7

Hardware	: BCM2708
Revision	: 000e
Serial		: 0000000076c470b3

That should then be:

    "cpu": {
      "model name": "ARMv6-compatible processor rev 7 (v6l)",
      "nCPU": 1,
      "nCoresPerSocket": 1,
      "nSockets": 1,
      "nThreadsPerCore": 1
    },

Do you want me to take a stab or are you already on it? I'm happy either way 😄

Maybe better that I have a go (am I right about the target output?) as I can validate quickly.

Then you can check in the PR that I didn't mess it up for x86.

I guess the second can be easily avoided with:
unsigned int nSockets = ( nSiblings != 0 ? nCPU / nSiblings : 0 );
unsigned int nThreads = ( nCores != 0 ? nSiblings / nCores : 0 );
here. I guess I was not expecting the number of cores to be zero 😃

Well, it's 1 socket, 1 processor, 4 cores in this case, so we would want as output:

    "cpu": {
      "model name": "ARMv7 Processor rev 3 (v7l)",
      "nCPU": 4,
      "nCoresPerSocket": 4,
      "nSockets": 1,
      "nThreadsPerCore": 1
    },

Right?

Right but taking the information at face value, there is no way to know for sure there is a single socket and a single thread per core.

On a typical x86 platform, we have physical id, siblings, and cpu cores. Every unique physical id is associated w/ a socket, then the relation between siblings and cpu cores essentially gives us the thread count per core.
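
The x86 relation described above can be sketched as follows. `Topology` and `topology_from_cpuinfo` are hypothetical names for this example, not prmon's actual code, and the sketch assumes every socket reports the same siblings and cpu cores values. The division is guarded so that a file without those fields (as on the Raspberry Pi) yields zeros rather than a crash.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>

// Hypothetical result type for this sketch.
struct Topology {
  unsigned int n_cpu = 0;
  unsigned int n_sockets = 0;
  unsigned int cores_per_socket = 0;
  unsigned int threads_per_core = 0;
};

Topology topology_from_cpuinfo(const std::string& text) {
  Topology out;
  std::set<int> physical_ids;        // one entry per distinct socket
  unsigned long siblings = 0, cores = 0;
  std::istringstream in(text);
  std::string line;
  while (std::getline(in, line)) {
    const auto colon = line.find(':');
    if (colon == std::string::npos) continue;
    std::string key = line.substr(0, colon);
    key.erase(key.find_last_not_of(" \t") + 1);  // drop the padding before ':'
    const std::string value = line.substr(colon + 1);
    if (key == "processor") {
      ++out.n_cpu;
    } else if (key == "physical id") {
      physical_ids.insert(std::stoi(value));
    } else if (key == "siblings") {
      siblings = std::stoul(value);   // logical CPUs per socket
    } else if (key == "cpu cores") {
      cores = std::stoul(value);      // physical cores per socket
    }
  }
  out.n_sockets = static_cast<unsigned int>(physical_ids.size());
  out.cores_per_socket = static_cast<unsigned int>(cores);
  // Guard against division by zero when the fields are absent.
  out.threads_per_core =
      cores != 0 ? static_cast<unsigned int>(siblings / cores) : 0;
  return out;
}

// Usage sketch: one socket, one core, two hardware threads.
void example() {
  const Topology t = topology_from_cpuinfo(
      "processor\t: 0\nphysical id\t: 0\nsiblings\t: 2\ncpu cores\t: 1\n\n"
      "processor\t: 1\nphysical id\t: 0\nsiblings\t: 2\ncpu cores\t: 1\n");
  assert(t.n_sockets == 1);
  assert(t.threads_per_core == 2);
}
```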

Here I only see the processor count. I'm not very familiar w/ this platform but do we always have a single socket and a single thread per core? Maybe I'm missing something?

Here I only see the processor count. I'm not very familiar w/ this platform but do we always have a single socket and a single thread per core?

For Raspberry Pi boards, yes.

This is, of course, a bit academic, but getting it right on ARM servers that might appear in data centres (like ThunderX or A64FX) could actually be useful.

Let me toss a wildcard in here after reading a few things... shouldn't we get this kind of CPU information from /sys/devices/system/cpu/cpuNN/topology/* or use lscpu...?

Originally I thought about using lscpu but given that it reads the information from /proc/cpuinfo to begin with, I didn't see much value. I thought simply reading the file is easier than invoking/parsing lscpu.

For /sys/devices/system/cpu/cpuNN/topology/*, we need to extract the total number of processors first, no? The easiest way to count the processors is again looping over /proc/cpuinfo (or doing a glob-type operation in C++ which is a bit awkward). If we open /proc/cpuinfo anyway, we might just as well get everything from there. Also, I find opening a single file better than N files.
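
For comparison, the glob-type approach could look roughly like the sketch below, using POSIX `glob()` and the `physical_package_id` and `core_id` files under each `cpuNN/topology/` directory. `SysfsTopology`, `read_int`, and `read_sysfs_topology` are hypothetical names; this has not been checked on the ARM boards discussed here, where some topology values may be reported differently.

```cpp
#include <glob.h>

#include <cassert>
#include <fstream>
#include <set>
#include <string>
#include <utility>

// Hypothetical result type for this sketch.
struct SysfsTopology {
  unsigned int n_cpu = 0;      // CPUs whose topology files could be read
  unsigned int n_sockets = 0;  // distinct physical_package_id values
  unsigned int n_cores = 0;    // distinct (package, core) pairs
};

// Read a single integer from a file; returns false if it cannot be read.
bool read_int(const std::string& path, int& value) {
  std::ifstream f(path);
  return static_cast<bool>(f >> value);
}

SysfsTopology read_sysfs_topology(
    const std::string& base = "/sys/devices/system/cpu") {
  SysfsTopology out;
  std::set<int> packages;
  std::set<std::pair<int, int>> cores;

  glob_t matches{};  // zero-initialised so globfree() is safe on no match
  const std::string pattern = base + "/cpu[0-9]*";  // cpu0, cpu1, ... only
  if (glob(pattern.c_str(), 0, nullptr, &matches) == 0) {
    for (size_t i = 0; i < matches.gl_pathc; ++i) {
      const std::string topo =
          std::string(matches.gl_pathv[i]) + "/topology/";
      int package = 0, core = 0;
      if (!read_int(topo + "physical_package_id", package) ||
          !read_int(topo + "core_id", core)) {
        continue;  // topology files missing or unreadable for this CPU
      }
      ++out.n_cpu;
      packages.insert(package);
      cores.insert({package, core});
    }
  }
  globfree(&matches);

  out.n_sockets = static_cast<unsigned int>(packages.size());
  out.n_cores = static_cast<unsigned int>(cores.size());
  assert(out.n_sockets <= out.n_cores && out.n_cores <= out.n_cpu);
  return out;
}
```

This opens two files per CPU, which is the cost weighed above against reading a single /proc/cpuinfo.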

I actually played around w/ some of these files while looking into #102. There is quite a bit of information one can access from /sys/devices/system/cpu/cpuNN/ if we wish to add.

Having said all this, if you think switching to any other method will make things more robust I'm completely open to the idea.

Now that I think about this, is lscpu more robust against different hardware (which I originally didn't give much thought) and guaranteed to always exist? If so, perhaps we can switch to using it. This should be easier now since we have prmon::cmd_pipe_output, which'll streamline quite a few things. Also, we won't have to code up any home-cooked logic 😄

Well, I think it must be fairly ubiquitous... It was on my really old RP1.

Here's the output from v1:

pi@raspbmc:~$ lscpu 
Architecture:          armv6l
Byte Order:            Little Endian
CPU(s):                1
On-line CPU(s) list:   0
Thread(s) per core:    1
Core(s) per socket:    1
Socket(s):             1

And v4:

pi@pisvr:~/prmon $ lscpu 
Architecture:        armv7l
Byte Order:          Little Endian
CPU(s):              4
On-line CPU(s) list: 0-3
Thread(s) per core:  1
Core(s) per socket:  4
Socket(s):           1
Vendor ID:           ARM
Model:               3
Model name:          Cortex-A72
Stepping:            r0p3
CPU max MHz:         1500.0000
CPU min MHz:         600.0000
BogoMIPS:            108.00
Flags:               half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm crc32

I think the advantage is that it's going to be battle tested against a wider set of CPUs and kernels than we could ever hope to manage. And it manages the logic of CPUs, sockets, cores and threads for us.

The model name string is missing from the RPv1, but that's surely a corner case...
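
A minimal sketch of how the lscpu text could be turned into the fields we need, including the missing model name case. `parse_lscpu` and `field_as_uint` are hypothetical helpers written for this example; obtaining the text (for instance through prmon::cmd_pipe_output, whose signature is not shown in this thread) is left out.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Split "Key:   value" lines into a map. Keys are matched exactly, so
// "CPU(s)" is not confused with "On-line CPU(s) list". Leading indentation
// before a key is tolerated.
std::map<std::string, std::string> parse_lscpu(const std::string& text) {
  std::map<std::string, std::string> fields;
  std::istringstream in(text);
  std::string line;
  while (std::getline(in, line)) {
    const auto colon = line.find(':');
    if (colon == std::string::npos) continue;
    const auto key_start = line.find_first_not_of(" \t");
    const auto val_start = line.find_first_not_of(" \t", colon + 1);
    fields[line.substr(key_start, colon - key_start)] =
        val_start == std::string::npos ? "" : line.substr(val_start);
  }
  return fields;
}

// Numeric lookup with a fallback for keys that are absent or empty.
unsigned int field_as_uint(const std::map<std::string, std::string>& fields,
                           const std::string& key,
                           unsigned int fallback = 0) {
  const auto it = fields.find(key);
  if (it == fields.end() || it->second.empty()) return fallback;
  return static_cast<unsigned int>(std::stoul(it->second));
}

// Usage sketch based on the Raspberry Pi v1 output, which has no model name.
void example() {
  const auto f = parse_lscpu(
      "Architecture:          armv6l\n"
      "CPU(s):                1\n"
      "On-line CPU(s) list:   0\n"
      "Thread(s) per core:    1\n"
      "Core(s) per socket:    1\n"
      "Socket(s):             1\n");
  assert(field_as_uint(f, "CPU(s)") == 1);
  assert(f.count("Model name") == 0);  // caller must cope with this
}
```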

Let's go for lscpu, then. That should be a fairly straightforward change given that you already coded up prmon::cmd_pipe_output which is what I was trying to avoid originally (just for getting the CPU information) 😃