giampaolo / psutil

Cross-platform lib for process and system monitoring in Python

process_iter(): no longer check whether PIDs have been reused

giampaolo opened this issue

Summary

  • OS: all
  • Type: performance

Description

For every process yielded by psutil.process_iter(), we internally check whether the process PID has been reused, in which case we return a "fresh" Process instance. To check for PID reuse we are forced to create a new Process instance, retrieve the process create_time() and compare it with that of the original Process. Performance-wise, it turns out this has a huge cost. This is particularly relevant because process_iter() is typically used to write task-manager-like apps, where the full process list is retrieved every second. I realized this at work, while writing a process monitor agent that runs on small hardware (a cleaning robot).
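To make the per-PID cost concrete, here is a simplified, psutil-free model of the reuse check described above. It assumes a cache of previously yielded entries keyed by PID, kept between calls; `iter_with_reuse_check` and `get_create_time` are hypothetical names, where `get_create_time` stands in for building a new Process and reading its create_time(), and entries are plain `(pid, create_time)` tuples rather than real Process objects.

```python
from typing import Callable, Dict, Iterable, Iterator, Tuple

# Hypothetical stand-in for a Process instance: (pid, create_time).
Entry = Tuple[int, float]


def iter_with_reuse_check(
    pids: Iterable[int],
    get_create_time: Callable[[int], float],
    cache: Dict[int, Entry],
) -> Iterator[Entry]:
    """Yield one entry per live PID, re-validating cached entries.

    `get_create_time` is hypothetical: it stands in for creating a new
    Process(pid) and reading its create_time().
    """
    live = set(pids)
    # Forget PIDs that disappeared since the previous pass.
    for pid in list(cache):
        if pid not in live:
            del cache[pid]
    for pid in sorted(live):
        cached = cache.get(pid)
        # For a new PID this lookup builds the entry; for a cached PID it is
        # done only to detect reuse, which is the per-process overhead.
        current = get_create_time(pid)
        if cached is None or cached[1] != current:
            # New PID, or PID reused by a different process: store a
            # fresh entry instead of returning the stale one.
            cached = (pid, current)
            cache[pid] = cached
        yield cached
```

In this model every cached PID pays one extra create_time lookup on every pass just to detect reuse; dropping that lookup for already-known PIDs is what the proposal amounts to.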

By removing the PID reuse check I get a 21x speedup on a Linux OS with 481 running PIDs:

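import time
import psutil

started = time.monotonic()
for _ in range(1000):
    # Consume the generator so every Process instance is actually built.
    procs = list(psutil.process_iter())
elapsed = time.monotonic() - started
print(f"Number of pids: {len(procs)}. Completed in {elapsed:.4f} secs")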

Current master:
Number of pids: 481. Completed in 5.1079 secs

With PID reuse check removed:
Number of pids: 481. Completed in 0.2419 secs
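Breaking the two timings down is simple arithmetic on the numbers above, assuming each of the 1000 iterations saw the same 481 PIDs:

```latex
\[
\text{speedup} = \frac{5.1079\ \text{s}}{0.2419\ \text{s}} \approx 21.1
\]
\[
\frac{5.1079\ \text{s}}{1000 \times 481} \approx 10.6\ \mu\text{s per PID},
\qquad
\frac{0.2419\ \text{s}}{1000 \times 481} \approx 0.50\ \mu\text{s per PID}
\]
```

On that machine, then, roughly 10 µs per process per iteration goes to the reuse check, i.e. about 4.9 ms for every full process_iter() pass.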

Repercussions

  • PID reuse is already pre-emptively checked for "write" Process APIs such as kill(), terminate() and nice() (set), so in that respect nothing changes and we remain safe.
  • Some Process APIs are cached: exe(), create_time() and, on Windows only, name(). If the PID has been reused, the Process instance will keep returning the old values. This doesn't happen with the current (slow) implementation, since process_iter() returns a brand new Process instance.
  • We could clear the Process cache in is_running(), but we cannot clear create_time()'s cache, as the old value is needed to detect PID reuse. This basically means a PID-reused Process instance should just be discarded by process_iter() somehow (but how?).
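The trade-off in the last two points can be sketched with a toy class. This is a hypothetical model, not psutil's Process: `CachedProc`, `read_create_time` and `read_exe` are assumed names, the two callables stand in for OS queries, and only exe() is cached to keep it short. It shows a cached exe() going stale after PID reuse, and why clearing caches in is_running() is not enough: create_time must be kept, so the instance keeps reporting the reuse and is only useful if the caller discards it.

```python
from typing import Callable, Optional


class CachedProc:
    """Toy model of a Process object with a cached exe() (hypothetical)."""

    def __init__(
        self,
        pid: int,
        read_create_time: Callable[[int], float],
        read_exe: Callable[[int], str],
    ) -> None:
        self.pid = pid
        self._read_create_time = read_create_time
        self._read_exe = read_exe
        # Captured once and never cleared: this is the identity used to
        # detect PID reuse.
        self._create_time = read_create_time(pid)
        self._exe: Optional[str] = None

    def exe(self) -> str:
        # Cached after the first call, so it goes stale if the PID is reused.
        if self._exe is None:
            self._exe = self._read_exe(self.pid)
        return self._exe

    def is_running(self) -> bool:
        # Compare the stored create_time against a fresh reading.
        if self._read_create_time(self.pid) == self._create_time:
            return True
        # PID reused: drop the other caches, but keep the old create_time,
        # otherwise reuse could no longer be detected. The instance stays
        # "not running" for good and should be discarded by the caller.
        self._exe = None
        return False
```

After is_running() has returned False in this model, a later exe() call reads the value of the new process while the instance still carries the old create_time; mixing two identities in one object is another reason to drop it rather than keep yielding it.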