HSF / prmon

Standalone monitor for process resource consumption

Protect monotonic increasing stats against race conditions

graeme-a-stewart opened this issue

@rmaganza reports an unusual case where the utime measured by prmon suffered an anomalous drop right at the end of the job (PandaID 4929654149).

There is a possible race condition that might cause this:

  1. prmon polls the list of PIDs owned by the mother process
  2. prmon starts to loop over the process list and collects stats from the mother
  3. one or more children exit before they can be polled
  4. although the kernel now attributes child resource consumption back to the mother, we don’t see it, because the mother was already polled
  5. polling the stats for the exited processes (obviously) fails
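
One possible guard against the drop described in the steps above is to remember the highest value seen so far for each cumulative counter and never report less than that. The sketch below only illustrates the idea; `MonotonicStats`, `update` and the stat names are hypothetical and are not taken from the prmon code base.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>

// Hypothetical helper (not part of prmon): keeps a high-water mark for each
// cumulative counter so that a racy, too-low sample cannot lower the
// reported value.
class MonotonicStats {
 public:
  // Record a freshly polled sample and return the value that should be
  // reported: the larger of the new sample and the previous maximum.
  unsigned long long update(const std::string& name,
                            unsigned long long sample) {
    // operator[] value-initialises the entry to 0 on first use.
    unsigned long long& best = high_water_[name];
    best = std::max(best, sample);
    assert(best >= sample);  // the reported value never undercuts the sample
    return best;
  }

 private:
  std::map<std::string, unsigned long long> high_water_;
};
```

With this in place, a poll that misses the exited children would report the previous `utime` again rather than a smaller one. The price is that a genuinely wrong low reading is hidden instead of flagged, so logging when a sample is clamped might be worth considering.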

We should protect monotonic increasing stats against dropping,