HSF / prmon

Standalone monitor for process resource consumption

Avg values greater than Max values in the json file

Aymane-Leyli opened this issue · comments

Hi, is it normal that the average values in the json file can be greater than the max values?
Here is an example of a json file with this issue:

{
  "Avg": {
    "nprocs": 3.0,
    "nthreads": 8.0,
    "pss": 20407.0,
    "rchar": 8822612.5,
    "read_bytes": 105241600.0,
    "rss": 22248.0,
    "rx_bytes": 531462.5,
    "rx_packets": 2593.75,
    "swap": 0.0,
    "tx_bytes": 1452018.75,
    "tx_packets": 3300.0,
    "vmem": 467304.0,
    "wchar": 24134.375,
    "write_bytes": 0.0
  },
  "HW": {
    "cpu": {
      "CPUs": 10,
      "CoresPerSocket": 1,
      "ModelName": "Intel Core Processor (Broadwell, IBRS)",
      "Sockets": 10,
      "ThreadsPerCore": 1
    },
    "mem": {
      "MemTotal": 29987880
    }
  },
  "Max": {
    "nprocs": 3,
    "nthreads": 8,
    "pss": 20407,
    "rchar": 2823236,
    "read_bytes": 33677312,
    "rss": 22248,
    "rx_bytes": 170068,
    "rx_packets": 830,
    "stime": 0,
    "swap": 0,
    "tx_bytes": 464646,
    "tx_packets": 1056,
    "utime": 0,
    "vmem": 467304,
    "wchar": 7723,
    "write_bytes": 0,
    "wtime": 0
  }
}

Hi @Aymane-Leyli,

Thanks for reporting. I don't think that should be happening. Could you give us a bit more information on how to reproduce this problem? Are you running in a container by any chance?

Best,
Serhan

Hi @amete ,

Thank you for your response. Yes, it is running in a container.

Cheers,
Aymane

Hi @Aymane-Leyli - thanks so much for reporting this.

Can you tell us what container you were running and on which host? The average values and the maximums are both measured in the container, so it should not be that avg > max in any case. The thing which is really fishy for me is that the wall time (wtime) comes out as 0. I suspect that this is the underlying issue.

We have all of our CI tests run in containers and we don't see this, so the problem doesn't seem to be general.

Cheers

Graeme

I had a private exchange w/ @Aymane-Leyli and I understand what's going on. The "average" for these counters is actually the measurement divided by the elapsed time in seconds. When the sampled process takes less than a second, that divisor is smaller than one, so the "average" numerically becomes larger than the "maximum". I don't believe this is a bug per se, but maybe we should clarify what's published.
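To make the arithmetic concrete, here is a small Python sketch (not prmon code) that takes the I/O and network counters from the JSON above and works out the elapsed time each one implies. It assumes avg = max / elapsed for these counters, as described; the function name `implied_elapsed` is made up for the illustration.

```python
# Values copied from the "Avg" and "Max" blocks of the JSON in the report.
# Only counters whose Avg differs from Max are listed.
avg = {
    "rchar": 8822612.5,
    "read_bytes": 105241600.0,
    "rx_bytes": 531462.5,
    "rx_packets": 2593.75,
    "tx_bytes": 1452018.75,
    "tx_packets": 3300.0,
    "wchar": 24134.375,
}
maximum = {
    "rchar": 2823236,
    "read_bytes": 33677312,
    "rx_bytes": 170068,
    "rx_packets": 830,
    "tx_bytes": 464646,
    "tx_packets": 1056,
    "wchar": 7723,
}


def implied_elapsed(avg, maximum):
    """Elapsed seconds implied per metric, assuming avg = max / elapsed."""
    return {name: maximum[name] / avg[name] for name in avg if avg[name] > 0}


elapsed = implied_elapsed(avg, maximum)
for name, seconds in sorted(elapsed.items()):
    print(f"{name:12s} {seconds:.2f} s")
```

Every counter gives the same sub-second value, which fits the explanation: dividing by an elapsed time below one second inflates the "average" above the "maximum". The memory and process-count metrics (pss, rss, vmem, nprocs, nthreads) have Avg equal to Max in the report, so they are left out here.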

The reason why I initially asked if this was running in a container is that I noticed the hardware information is wrong, which is typical for running containers on a Mac (from past experience). But to my surprise, the hardware information published on lxplus via lscpu is equally wrong. That might have something to do w/ the VM setup. But overall, prmon should work equally well in a container so long as the information published under /proc is correct.

Thanks @amete , it makes more sense now.