process_iter(): no longer check whether PIDs have been reused
giampaolo opened this issue
Summary
- OS: all
- Type: performance
Description
For every process yielded by `psutil.process_iter()`, we internally check whether the process PID has been reused, in which case we return a "fresh" `Process` instance. In order to check for PID reuse we are forced to create a new `Process` instance, retrieve its `create_time()` and compare it with the original process. Performance-wise, it turns out this has a huge cost. This is particularly relevant because `process_iter()` is typically used to write task-manager-like apps, where the full process list is retrieved every second. I realized this at work, while writing a process monitor agent that runs on small hardware (a cleaning robot).
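The check described above can be sketched roughly as follows. `pid_reused()` is a hypothetical helper name, not psutil's actual internal code, but it illustrates why every yielded process costs an extra `Process` construction plus a `create_time()` lookup:

```python
import psutil

def pid_reused(proc):
    # Hypothetical helper: build a fresh Process for the same PID and
    # compare creation times. A mismatch (or a vanished PID) means the
    # PID now belongs to a different process.
    try:
        fresh = psutil.Process(proc.pid)   # extra construction per process
        return fresh.create_time() != proc.create_time()
    except psutil.NoSuchProcess:
        return True
```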
By removing the PID reuse check I get a 21x speedup on a Linux OS with 481 running PIDs:
```python
import time, psutil

started = time.monotonic()
for x in range(1000):
    list(psutil.process_iter())
print(f"completed in {(time.monotonic() - started):.4f} secs")
```
Current master:
```
Number of pids: 481. Completed in 5.1079 secs
```
With PID reuse check removed:
```
Number of pids: 481. Completed in 0.2419 secs
```
Repercussions
- PID reuse is already pre-emptively checked for "write" `Process` APIs such as `kill()`, `terminate()`, `nice()` (set), etc., so in that sense it won't make any difference and we'll remain safe.
- Some `Process` APIs are cached: `exe()`, `create_time()` and `name()` (Windows only). In this case, if the PID has been reused, the `Process` instance will keep returning the old value. This doesn't happen with the current (slow) implementation, since `process_iter()` returns a brand new `Process` instance.
- We may clear the `Process` cache on `is_running()`, but we cannot clear `create_time()`'s cache, as the old value is necessary to detect PID reuse. This basically means a PID-reused `Process` instance should just be discarded by `process_iter()` somehow (but how?).
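One possible direction for the last point: since `is_running()` already performs the `create_time()`-based reuse check and returns `False` for a reused PID, a caller (or `process_iter()` itself) could filter reused instances out. A minimal sketch using only the public API; `iter_fresh_procs` is an illustrative name, not a proposed psutil function:

```python
import psutil

def iter_fresh_procs():
    # Hypothetical wrapper: yield processes from process_iter() but drop
    # any whose PID appears to have been reused. is_running() compares
    # the cached create_time() against the current one, so it returns
    # False both for dead PIDs and for reused ones.
    for proc in psutil.process_iter():
        if proc.is_running():
            yield proc
```

Of course this reintroduces the per-process `create_time()` cost being removed, so it only makes sense as an opt-in for callers who need the stronger guarantee.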