Protect monotonic increasing stats against race conditions
graeme-a-stewart opened this issue
Graeme A Stewart commented
@rmaganza reports an unusual case where the utime measured by prmon suffered an anomalous drop right at the end of the job (PandaID 4929654149).
There is a possible race condition that might cause this:
- prmon polls the list of PIDs owned by the mother process
- prmon starts to loop over the process list and collects stats from the mother
- one or more children exit before they can be polled
- although the kernel now attributes the children's resource consumption back to the mother, we don’t see it because the mother was already polled
- polling the stats for the exited processes (obviously) fails
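To make the sequence above concrete, here is a toy model of it. The process table, the PID values and the helpers `read_cpu`, `reap_child` and `racy_poll` are all hypothetical and do not reflect prmon's internals. The model only captures the ordering: the PID list is gathered first, the mother is read, and then the child exits and has its CPU time credited to the mother after she has already been counted.

```python
from typing import Dict, List, Optional

# Hypothetical toy process table: PID -> accumulated CPU time in seconds.
# Time inherited from reaped children is folded into the parent's entry.
ProcTable = Dict[int, float]


def read_cpu(table: ProcTable, pid: int) -> Optional[float]:
    """Return the CPU time of `pid`, or None if it has already exited."""
    return table.get(pid)


def reap_child(table: ProcTable, mother: int, child: int) -> None:
    """Child exits: its CPU time is credited to the mother."""
    table[mother] += table.pop(child)


def racy_poll(table: ProcTable, pids: List[int], mother: int, child: int) -> float:
    """Sum CPU time over a PID list captured before the loop started.

    The child is reaped right after the mother has been read, so its time
    is counted neither under its own PID nor under the mother's.
    """
    total = 0.0
    for pid in pids:
        cpu = read_cpu(table, pid)
        if cpu is not None:  # polling an exited process fails; skip it
            total += cpu
        if pid == mother:
            reap_child(table, mother, child)  # the race window
    return total


mother, child = 100, 101
table: ProcTable = {mother: 10.0, child: 5.0}
pids = [mother, child]  # the PID list is gathered up front

before = sum(table[p] for p in pids)  # 15.0
table[mother] += 1.0  # both processes keep accumulating CPU time
table[child] += 1.0
during = racy_poll(table, pids, mother, child)  # only the mother's 11.0
after = read_cpu(table, mother)  # 17.0 on the next poll
print(before, during, after)
```

The reported total dips from 15.0 to 11.0 and then jumps to 17.0: the child's time is not lost for good, it is just invisible for one polling interval.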
We should protect monotonically increasing stats against dropping.
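One possible shape for such a protection is a high-water-mark clamp: remember the last value reported for each cumulative stat and never report anything lower. This is a minimal sketch, not prmon code; the `MonotonicGuard` class and the stat names `utime` and `rss` are hypothetical, and which stats count as monotonic is an assumption the caller supplies.

```python
from typing import Dict, Iterable


class MonotonicGuard:
    """Hypothetical helper: never report a cumulative counter lower than before.

    Only stats listed in `monotonic` are clamped; anything else (e.g. a
    memory gauge, which may legitimately fall) passes through unchanged.
    """

    def __init__(self, monotonic: Iterable[str]) -> None:
        self.monotonic = set(monotonic)
        self.last: Dict[str, float] = {}

    def update(self, stats: Dict[str, float]) -> Dict[str, float]:
        out = dict(stats)
        for name, value in stats.items():
            if name in self.monotonic:
                # Hold the previous high-water mark if this poll came in low.
                out[name] = max(value, self.last.get(name, value))
                self.last[name] = out[name]
        return out


guard = MonotonicGuard(["utime"])
print(guard.update({"utime": 15.0, "rss": 200.0}))  # first poll passes through
print(guard.update({"utime": 11.0, "rss": 180.0}))  # racy poll: utime held at 15.0
print(guard.update({"utime": 17.0, "rss": 190.0}))  # counter catches up again
```

If, as described above, the kernel does eventually credit the exited children's consumption to the mother, a later poll should catch up, so clamping only flattens the series for one interval instead of recording a spurious drop.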