htop-dev / htop

htop - an interactive process viewer

Home Page: https://htop.dev/

(Linux) Utilize `pidfd` when killing processes

Explorer09 opened this issue · comments

This is a feature proposal inspired by the support question #1441.

The procedure of killing a process in htop has these steps:

  1. Highlight a process with the (terminal) cursor or tag all processes that the user wishes to kill.
  2. Press the 'k' key (or Kill command on the function bar).
  3. Select a signal to send in the signal list popup and press 'Enter'.

Between step 2 and step 3, a selected process may die and another process may, by chance, be assigned the same PID as the old one. That would result in the wrong process being killed.

This race condition is partly unavoidable in a Unix-like system by design, but Linux has PID file descriptors (pidfds) that allow a process to signal another process by file descriptor rather than by PID, via pidfd_send_signal(2). This could be a safer way for htop to kill a process.

Note that file descriptors are a scarcer resource than PIDs, so htop shouldn't hold these PID file descriptors longer than necessary.

This proposal is to use PID file descriptors in htop as follows:

  • After the user presses the 'k' key (step 2) and the signal list opens, htop opens a pidfd for every tagged process to kill. (Or for one process if the user hasn't tagged any.)
  • Once a signal is selected and confirmed (step 3), send signal using pidfd_send_signal() and close the file descriptors.
  • If the user doesn't select a signal in the signal list, then after a set timeout (e.g. 1 minute or 5 minutes), close the signal list automatically and close the file descriptors. This timeout mechanism is to prevent htop from holding resources for too long.
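As a rough illustration of the steps above, here is a minimal sketch and not htop code. All `sketch_*` names are made up for this example; the syscalls are invoked through syscall(2) because older glibc versions ship no wrappers for them. It assumes a Linux kernel that provides pidfd_open(2) and pidfd_send_signal(2), and reports "unavailable" otherwise.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Step 2: pin the process. Returns a pidfd, or -1 with errno set. */
static int sketch_pidfd_open(pid_t pid) {
#ifdef SYS_pidfd_open
   return (int) syscall(SYS_pidfd_open, pid, 0U);
#else
   (void) pid;
   errno = ENOSYS;
   return -1;
#endif
}

/* Step 3: signal through the pidfd, then release it. Returns 0 or -1.
 * On timeout or cancel the caller would just close(pidfd) instead. */
static int sketch_signal_and_close(int pidfd, int sig) {
#ifdef SYS_pidfd_send_signal
   int rc = (int) syscall(SYS_pidfd_send_signal, pidfd, sig, NULL, 0U);
#else
   int rc = -1;
   (void) sig;
   errno = ENOSYS;
#endif
   int saved = errno;
   close(pidfd);
   errno = saved;
   return rc;
}

/* Demo: fork an idle child, pin it, signal it, reap it.
 * Returns the signal that terminated the child, 0 if pidfd is not
 * usable in this environment, or -1 on an unexpected error. */
static int sketch_demo(void) {
   pid_t child = fork();
   if (child < 0)
      return -1;
   if (child == 0) {
      pause();
      _exit(0);
   }

   int fd = sketch_pidfd_open(child);
   if (fd < 0 || sketch_signal_and_close(fd, SIGTERM) != 0) {
      /* pidfd unavailable here: clean up the child the traditional way */
      kill(child, SIGKILL);
      waitpid(child, NULL, 0);
      return 0;
   }

   int status = 0;
   if (waitpid(child, &status, 0) != child)
      return -1;
   return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

Calling `sketch_demo()` from a `main()` should return `SIGTERM` on a kernel with pidfd support. The point of the split is that the descriptor opened at step 2 keeps referring to the same process, so the signal at step 3 cannot land on a different process that reused the PID.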

This feature would be optional. If we can detect pidfd support at configure time, then we can use pidfd for killing processes; otherwise, we fall back to the traditional kill(2).
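The fallback could look something like the sketch below. `SKETCH_HAVE_PIDFD` and `sketch_send_signal` are hypothetical names; a real build would set the macro from a configure check rather than from the syscall number. Since headers can define the syscall while the running kernel lacks it, the sketch also falls back at run time on ENOSYS.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical feature macro standing in for a configure-time check. */
#if defined(SYS_pidfd_send_signal)
#define SKETCH_HAVE_PIDFD 1
#else
#define SKETCH_HAVE_PIDFD 0
#endif

/* Send sig to a process. Prefers the pidfd when one is held (pidfd >= 0),
 * otherwise uses kill(2) on the PID. Returns 0, or -1 with errno set. */
static int sketch_send_signal(pid_t pid, int pidfd, int sig) {
#if SKETCH_HAVE_PIDFD
   if (pidfd >= 0) {
      if (syscall(SYS_pidfd_send_signal, pidfd, sig, NULL, 0U) == 0)
         return 0;
      if (errno != ENOSYS)
         return -1; /* e.g. ESRCH: the pinned process is gone; do not retry by PID */
      /* ENOSYS: the running kernel lacks the syscall, fall through to kill(2) */
   }
#else
   (void) pidfd;
#endif
   return kill(pid, sig);
}
```

For example, `sketch_send_signal(getpid(), -1, 0)` takes the kill(2) path and only checks that the process exists. Note that the sketch deliberately does not retry by PID when the pidfd path reports that the process is gone, since that would reintroduce the race.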

Please see comment #1441 (comment).

I have several issues with this proposal, which are (in no particular order):

  1. The scope of the issue and its impact is too minimal and does not require being addressed
  2. You can potentially tag 1000s of processes, thus easily exhausting the pool of available FDs.
  3. Only works on a subset of platforms, thus causing inconsistent UX
  4. You'd still need proper tracking of processes and still won't solve the TOCTOU issue underneath it

@BenBE

You can potentially tag 1000s of processes, thus easily exhausting the pool of available FDs.

Yes. I've thought of this as well. If there are too many processes tagged for killing, then we have to fall back to the old way of killing by PID.
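One way to decide "too many" is to compare against the soft RLIMIT_NOFILE limit. This is a rough sketch only: `sketch_fits_fd_budget` and its `reserve` parameter are invented for illustration, and it does not count descriptors the process already has open, so the reserve would have to cover those.

```c
#include <stddef.h>
#include <sys/resource.h>

/* Returns 1 if `tagged` pidfds fit under the soft RLIMIT_NOFILE limit
 * while leaving `reserve` descriptors for everything else, else 0. */
static int sketch_fits_fd_budget(size_t tagged, size_t reserve) {
   struct rlimit lim;
   if (getrlimit(RLIMIT_NOFILE, &lim) != 0)
      return 0; /* unknown limit: be conservative and kill by PID */
   if (lim.rlim_cur == RLIM_INFINITY)
      return 1;
   if (lim.rlim_cur < reserve)
      return 0;
   return tagged <= lim.rlim_cur - reserve;
}
```

A caller would open pidfds only when this returns 1 and otherwise kill by PID as before.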

Personally I consider this feature request a very low priority. htop works fine without it, but things can work a little safer if it has the feature.

You'd still need proper tracking of processes and still won't solve the TOCTOU issue underneath it

This isn't meant to solve all TOCTOU issues. (TOCTOU can still happen between tagging the processes and pressing the 'k' key.) It is meant to give the user one chance to confirm what they are killing (by holding the FDs for a short time). Before the signals are actually sent, the user can review the tagged processes again while their FDs are held open, and it is during this interval that the race is blocked.

@BenBE

You can potentially tag 1000s of processes, thus easily exhausting the pool of available FDs.

Yes. I've thought of this as well. If there are too many processes tagged for killing, then we have to fall back to the old way of killing by PID.

Great, now you have double the code …

Personally I consider this feature request a very low priority. htop works fine without it, but things can work a little safer if it has the feature.

IMO the gain is negligible …

You'd still need proper tracking of processes and still won't solve the TOCTOU issue underneath it

This isn't meant to solve all TOCTOU issues. (TOCTOU can still happen between tagging the processes and pressing the 'k' key.)

The other chance (potentially minimized here) is when actually sending the signal.

It is meant to give the user one chance to confirm what they are killing (by holding the FDs for a short time). Before the signals are actually sent, the user can review the tagged processes again while their FDs are held open, and it is during this interval that the race is blocked.

Worse than killing the wrong process is being shown wrong information you are confirming.

It's like letting you choose an envelope labelled "Lots of money", then replacing it with one with no cash in it by the time you actually grab it, while making sure you get a strong hold on that one …

@BenBE

The other chance (potentially minimized here) is when actually sending the signal.

No. That's what pidfd_send_signal(2) is for. It sends signals by an FD that we control, not by a PID that is susceptible to races.

If it can race, that's an OS bug to report.

[What's] Worse than killing the wrong process is being shown wrong information you are confirming.
It's like letting you choose an envelope labelled "Lots of money", then replacing it with one with no cash in it by the time you actually grab it, while making sure you get a strong hold on that one …

I think this problem exists even if we don't hold the envelopes...

There was some discussion about this issue behind the scenes, and overall we reached the conclusion that we don't want to go forward with this. It basically boils down to the reasons I already outlined above, but also to the fact that the cost in the codebase is out of proportion to the benefit. This proposal still leaves too many edge cases open, along with demonstrably unsolvable cases, so such a feature would be patchwork at best.