Utility to chain signal handlers

Diggsey opened this issue 4 years ago

It would make more sense to treat signal handlers as a stack.

When you register a signal handler, it gets pushed onto the top of the stack. When you unregister a signal handler, it is popped from the stack.

When a signal is raised, the signal handler at the top of the stack is called. That signal handler can return true or false to indicate whether the signal was handled. If the signal was not handled, it is delegated to the next handler in the stack. At the bottom of the stack is the default signal handler.

This is how I've seen other programs use signal handlers, and it is how similar mechanisms work elsewhere, e.g. https://docs.microsoft.com/en-us/windows/console/setconsolectrlhandler

I saw in another issue you had concerns with how this crate would be able to unregister its own signal handler, since it wouldn't know which handler to restore. I think that's not necessary: this handler can just always defer to the previous C handler if there are no handlers registered through this crate.

This also solves the problem with restoring the default handler: there is no need, since the default handler will be automatically called if no handler is present to intercept the signal.

Michal 'vorner' Vaner · Answer 1 · Tue Dec 29 2020 18:42:29 GMT+0800 (China Standard Time)

Hello

I'm not sure I understand your motivation or what exactly you want to do here and how it fits into what signal-hook is supposed to be. But I'll try two guesses. If I'm wrong, please explain what you mean, maybe with an example situation or specific use case you're solving and where signal-hook doesn't work for you :-)

Using the stack as an internal backend mechanism

This is already being done, in a way. The previous signal handler is chained and called after ours, and we more or less assume that whoever registers after us will do the same. This is a mechanism for living in the same program with other libraries that play with signals without killing each other.
https://github.com/vorner/signal-hook/blob/master/signal-hook-registry/src/lib.rs#L158
https://github.com/vorner/signal-hook/blob/master/signal-hook-registry/src/lib.rs#L234

However, the „return a flag if it was handled“ is not present, for three reasons:

It would have to be a generally accepted and used pattern, which doesn't seem to be the case in the wild.
It is not true that a signal must always be fully handled by one handler and others are not interested in them. One can have an application where the SIGHUP (default action being terminate) should log that the signal was received, reopen log files and reload configuration files (and not terminate). And each action can be done by a different library, not knowing about each other. So how should they know they should return „signal handled“? The last one in the chain should, but as they don't know about each other, they have no way knowing they are the last. It seems safer from this perspective if everyone who registered a handler gets notified.
We do not remove the backend signal handlers. The issue is that we have no reliable and race-free way to know that we are on the top of the stack, and manipulating the raw handlers is racy in multithreaded applications (therefore one is better off registering everything upfront and then tweaking what happens inside).

Furthermore, one can't put the default handler onto a stack or call it because the default handler is not really a function. SIG_DFL is a numeric constant (0 on Linux; SIG_IGN is 1) that masquerades as a function pointer. You can have no handler (SIG_IGN), the default one (SIG_DFL) or your own function, but they are exclusive. The best you can do is „emulate“ the default handler by invoking the same thing, depending on the signal number.

Exposing such API to users

I believe such approach would go against the goals of the library, in particular these two:

Signals should be turned into a sane-acting local resource, similar to allocating memory or spawning a thread. There should be no top-level oversight from the end application author needed. If a library spawns a thread, this should have no effect on other libraries doing the same. If a library wants to listen to a signal, doing so should not in any way disrupt other libraries or parts of the application that do that. Allowing one handler to consume the signal for others and cut them off from the information seems in conflict with that; see the above example with logging/reloading/reopening. The approach „everyone who expressed an interest gets notified“ seems closer to the idea of the subscriptions being independent. Furthermore, the „I've handled it“ flag approach is much more prone to unwanted breakage due to reordering the registration, and considering I've seen libraries register e.g. SIGCHLD lazily on demand, that could blow up. The current approach seems a bit less powerful but also less inclined to break in hard-to-debug ways.
Users should be able to do common things without using unsafe code. But the content of the actual signal handler must be async-signal-safe, which means no allocations, no locking, and many more „no“-something, and violating that is UB. Therefore, any custom user code that goes in there must be registered with unsafe, to ensure users properly checked that code. The way the library goes about this problem is to provide abstractions that are already checked and postpone the handling of the signal to some later time, when the user picks them up (by reading a flag or waiting on an iterator in a designated thread, for example). But to decide if one shall call more signal handlers, it would need to provide the answer right away inside the signal handler, which would require user code being run as part of the handler. That would, in turn, require each user to use unsafe.
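The „postpone the handling“ idea described above can be sketched with nothing but an atomic flag. This is a minimal, std-only illustration and not signal-hook's actual code: the names TERM_REQUESTED, on_signal and take_term_request are hypothetical, and the handler is meant to be called directly in a demo rather than installed through sigaction.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Set from the (simulated) signal handler; read later from ordinary code.
static TERM_REQUESTED: AtomicBool = AtomicBool::new(false);

// Hypothetical handler body. It performs a single atomic store, so there is
// no allocation, no locking and no user callback inside the handler.
pub extern "C" fn on_signal(_signum: i32) {
    TERM_REQUESTED.store(true, Ordering::SeqCst);
}

// Ordinary (safe) code picks the notification up at a convenient point.
// The flag is cleared when read, so each request is observed once.
pub fn take_term_request() -> bool {
    TERM_REQUESTED.swap(false, Ordering::SeqCst)
}
```

Because the handler only records that something happened, it cannot answer „was this handled?“ on the spot, which is the tension with the stack model discussed here.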

Being able to remove the handlers only in reverse order of their registration seems very limiting, considering the goal is for libraries to be able to register and unregister their interest without any interaction with other libraries doing so. Currently unregistering can be done in arbitrary order.
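As a rough model of the „everyone gets notified, unregister in any order“ behaviour, here is a small std-only sketch. It is not how signal-hook-registry is implemented; Broadcast, HookId and the method names are hypothetical and exist only for illustration.

```rust
use std::collections::BTreeMap;

/// Hypothetical token returned by `register`, used to unregister later.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct HookId(u64);

/// Toy model: every registered hook is notified about every signal.
#[derive(Default)]
pub struct Broadcast {
    next_id: u64,
    hooks: BTreeMap<HookId, Box<dyn FnMut(i32)>>,
}

impl Broadcast {
    pub fn register(&mut self, hook: Box<dyn FnMut(i32)>) -> HookId {
        let id = HookId(self.next_id);
        self.next_id += 1;
        self.hooks.insert(id, hook);
        id
    }

    /// Removal is keyed by id, so the order of unregistration is arbitrary.
    pub fn unregister(&mut self, id: HookId) -> bool {
        self.hooks.remove(&id).is_some()
    }

    /// Calls every hook (none can cut the others off) and returns how many ran.
    pub fn notify(&mut self, signum: i32) -> usize {
        for hook in self.hooks.values_mut() {
            hook(signum);
        }
        self.hooks.len()
    }
}
```

In this model a logging hook and a config-reloading hook can come and go independently, without either knowing whether it is „last“.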

Furthermore, such a big change seems to be on the order of „throw it all out and start from scratch“, and then there would be the question of why it should be done inside the same library and not a separate one. In particular, this would be a big breaking change to signal-hook-registry, and the idea of that crate is to provide a common meeting ground that will never have a breaking release.

Why not go back to default signal handler

There are two reasons why signal-hook doesn't want to ever go back to the default signal handler (and recommends that the user emulate the behaviour if desired).

As noted above, default handler is all or nothing. To revert to it, one needs a willing cooperation not only of signal-hook enabled libraries, but of anyone who registered a signal handler directly too. We know nothing about them, or how to talk to them and cutting them from the signal notifications they subscribed to seems rude to say the least.
Such behaviour would again be very fragile. Imagine that SIGHUP situation again. Let's say we have the configuration reloading library and the logging library that reopens the logs. But what if the configuration library can read part of the configuration that says „no more reloading, freeze the config“. Then it would unregister its handler. Now, if we reconfigure the logging library to log to a network socket, it would not register its handler either. But then, the act of unregistering the config's handler would either only stop reloading configuration, or additionally restore the default handler to terminate the application and there's no way to know which locally.

(The above talks about rolling back to default ones implicitly under some conditions; I've thought about being able to do it explicitly, pausing all signals in the application for a while and then being able to return them back, but that sounded fragile too, considering there's the option to just emulate them)

Is this library the right place?

I slightly suspect that what you're trying to answer here is „how does an application decide what to do with a signal“. This assumes that it's the application deciding somewhere on the top level (which is traditionally the necessity with signals because of their limitations, but signal-hook tries to challenge that dogma).

Signal-hook, on the other hand, tries to answer „How do multiple independent parts of a program not kill each other when dealing with signals“ (think something like a global memory allocator, instead of each part of the application playing with the brk syscall) and „How do I safely get the signal out of the signal handler to a place where I can actually do something about it comfortably“.

If I'm right in this guess, I believe the right approach would be not to modify signal-hook, but to add an additional layer, the distribution stack. That one would register into signal-hook somehow (using one of the abstractions or maybe go directly to signal-hook-registry) and let the application add or remove these handlers. Any independent library would still get its copy of the notification, because there's nothing known about that library, or if you could ensure there's nobody else dealing with signals, it would act exactly as you describe.

I'm not sure whether this belongs inside signal-hook or in a separate library (I could imagine both, depending on how it turns out; in the latter case it would make sense to link to the library from signal-hook's docs). But if you think this is something worth exploring, I'm open to discussing what it would look like API-wise.

Diggory Blake · Answer 2 · Tue Dec 29 2020 22:51:15 GMT+0800 (China Standard Time)

It is not true that a signal must always be fully handled by one handler and others are not interested in them. One can have an application where the SIGHUP (default action being terminate) should log that the signal was received, reopen log files and reload configuration files (and not terminate). And each action can be done by a different library, not knowing about each other. So how should they know they should return „signal handled“? The last one in the chain should, but as they don't know about each other, they have no way knowing they are the last. It seems safer from this perspective if everyone who registered a handler gets notified.

Even in this scenario, my proposed behaviour makes more sense: just because a library listens for SIGHUP shouldn't change the behaviour of the application. The library should install a signal handler that does its thing and then always returns false (delegating to the next handler).

If the application wants to avoid terminating in response to SIGHUP, then it should install its own signal handler at the very start which just returns true. The library has no idea whether suppressing SIGHUP termination is the right thing to do for the whole application.

Furthermore, one can't put the default handler onto a stack or call it because the default handler is not really a function. SIG_DFL is a numeric constant (0 on Linux; SIG_IGN is 1) that masquerades as a function pointer. You can have no handler (SIG_IGN), the default one (SIG_DFL) or your own function, but they are exclusive. The best you can do is „emulate“ the default handler by invoking the same thing, depending on the signal number.

Yeah you have to just re-raise the signal if the previous signal handler was SIG_DFL.

Currently unregistering can be done in arbitrary order.

You don't have to do this: the library would support unregistering in any order, it's just that following the stack model would be more usual.

If I'm right in this guess, I believe the right approach would be not to modify signal-hook, but to add an additional layer, the distribution stack.

Fair enough. The main issue I have with signal-hook is that it changes the default behaviour even when no signal handlers are attached. If I attach a SIGTERM handler via signal-hook and then remove it, the SIGTERM is forever blocked. That's absolutely not what I want.

The stack model naturally solves this because if no handler explicitly intercepts the signal then the default will always be called. When signal-hook is in the same program I can't rely on that, even when using the stack model myself.

Signal-hook, on the other hand, tries to answer „How do multiple independent parts of a program not kill each other when dealing with signals“ (think something like a global memory allocator, instead of each part of the application playing with the brk syscall) and „How do I safely get the signal out of the signal handler to a place where I can actually do something about it comfortably“.

Fair enough: I think this can work for listening to signals, but suppressing the default behaviour of signals is fundamentally a global behaviour, and cannot be managed independently.

Also: signal-hook will call the previous hook, but only if it's not the default, so replacing the default hook with one that does the same thing changes the behaviour of the program.

I've thought about being able to do it explicitly, pausing all signals in the application for a while and then being able to return them back, but that sounded fragile too, considering there's the option to just emulate them

This would make more sense if you want to stick with the model of "all the handlers get called". Then suppressing the default/previous handler would be a separate call.

Michal 'vorner' Vaner · Answer 3 · Wed Dec 30 2020 02:43:33 GMT+0800 (China Standard Time)

If the application wants to avoid terminating in response to SIGHUP, then it should install its own signal handler at the very start which just returns true. The library has no idea whether suppressing SIGHUP termination is the right thing to do for the whole application.

Fair enough. The main issue I have with signal-hook is that it changes the default behaviour even when no signal handlers are attached. If I attach a SIGTERM handler via signal-hook and then remove it, the SIGTERM is forever blocked. That's absolutely not what I want.

You have a point there. The removal of the default is a bit of an exception in how it behaves now. The way I've always looked at the existence of default handlers is they are kind of exception/weird and exist only for applications that don't want to touch signal handling. Once one starts doing something about them, they should be taken over completely. But you're right that the current implicit takeover has its set of quirks.

I still don't like the full stack model, for the reasons I've mentioned. I'd prefer to treat all the hooks equally, and the need for the application code to be inside the low-level hook (therefore unsafe for everyone) feels like a show stopper. And there's backwards compatibility to consider.

So let's come up with some way that would allow you to build what you want. I'll throw some ideas here and no, they are not final:

Explicit signal takeover

Currently, the signal is taken over implicitly, on first use. We could add some kind of init_signal call. That would explicitly install signal-hook's signal handler. It would also allow for configuration of how it behaves ‒ whether it should call into the previous signal handler, whether it should suppress the default handler or emulate it, ... I'm not sure what exactly the configuration would look like or what it would allow.

The signal_hook_registry::register would get deprecated (can't really change that behaviour, that would be breaking) and a different one would be added. That one would error out if called before initialization, making sure the application decides what happens to that specific signal before libraries are allowed to play with it. The signal-hook crate built on top of it would migrate to the new (erroring) version; that one is allowed to have breaking changes.
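To make the proposal concrete, here is a minimal sketch of „initialize explicitly, then register“. Everything in it is hypothetical: Registry, AfterHooks, RegistryError and the method signatures are not part of signal-hook or signal-hook-registry, and treating a second init_signal call as an error is only one possible choice.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq, Eq)]
pub enum RegistryError {
    NotInitialized,
    AlreadyInitialized,
}

/// Hypothetical policy: what happens after the registered hooks have run.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum AfterHooks {
    SuppressDefault,
    EmulateDefault,
}

#[derive(Default)]
pub struct Registry {
    // signal number -> (policy chosen by the application, number of hooks)
    signals: HashMap<i32, (AfterHooks, usize)>,
}

impl Registry {
    /// The application decides, once, how a signal is taken over.
    pub fn init_signal(&mut self, signum: i32, policy: AfterHooks) -> Result<(), RegistryError> {
        if self.signals.contains_key(&signum) {
            return Err(RegistryError::AlreadyInitialized);
        }
        self.signals.insert(signum, (policy, 0));
        Ok(())
    }

    /// Libraries may register only after the application initialized the signal.
    /// Returns the number of hooks now registered for that signal.
    pub fn register(&mut self, signum: i32) -> Result<usize, RegistryError> {
        let entry = self.signals.get_mut(&signum).ok_or(RegistryError::NotInitialized)?;
        entry.1 += 1;
        Ok(entry.1)
    }

    /// The policy chosen at initialization, if any.
    pub fn policy(&self, signum: i32) -> Option<AfterHooks> {
        self.signals.get(&signum).map(|entry| entry.0)
    }
}
```

The sketch only tracks bookkeeping; a real version would also install the low-level handler inside init_signal.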

Emulation of the default handler

Currently there are only docs describing how to do this, but a function such as default_handler(c_int) that performs the equivalent of the signal's default action might be useful (re-raising the signal doesn't work while our handler is installed; we would simply receive the signal again).

The stack

Is there a reason why that manual stack thing couldn't contain the default handler emulation on the bottom of the stack? How much does that differ from what you have in mind?

Current options

Without any changes to signal-hook, I think you could register the conditional shutdown and control whether it's active with a flag. Not exactly what you want, but something that's already there.

Diggory Blake · Answer 4 · Wed Dec 30 2020 03:21:13 GMT+0800 (China Standard Time)

Explicit signal takeover

Yeah, explicit initializing would solve the problem I think. You'd have to make it an error to call it more than once. The stack model is simply a way to assign a sensible behaviour to calling this multiple times, but it's not necessary.

Emulation of the default handler

Yeah I looked into this a fair amount, and it's annoyingly difficult. These are the options I have:

Re-implement what we think the default behaviour is. eg. explicitly terminate the program for SIGTERM, etc.
Reset the signal handler and re-raise, then restore the prior signal handler. Accept that there's a race condition if the same signal is raised on a different thread.
Reset the signal handler and re-raise, then restore the prior signal handler. Prevent the race condition using a spin-lock per signal. (The signal that was raised should be automatically masked on the same thread, so we shouldn't deadlock, but it's kinda sketchy, and you might deadlock if the same 2 signals occur in different orders on two threads?)
Reset the signal handler and re-raise, then restore the prior signal handler. Prevent the race condition using a single spin-lock, and configure our handler to mask out all other signals whilst it is executing.

Is there a reason why that manual stack thing couldn't contain the default handler emulation on the bottom of the stack? How much does that differ from what you have in mind?

No: this would be an implementation detail, completely indistinguishable to the user. The default handler should effectively be the bottom of the stack, so there's no special case to consider of an empty stack.

Michal 'vorner' Vaner · Answer 5 · Wed Dec 30 2020 16:13:45 GMT+0800 (China Standard Time)

You'd have to make it an error to call it more than once.

Either that, or make the call idempotent (at least as far as the replacement of the low-level signal handler goes, it would be possible to still change some of the internal settings by further calls). But I think making it an error is simpler API.

Anyway, while thinking about it, I worry about one thing. Making it explicit for something like SIGTERM makes sense, but what about signals whose default handler does nothing, in particular SIGCHLD? I know at least 2 async runtimes now use signal-hook behind the scenes to handle SIGCHLD for getting notified about asynchronously running processes, and that is supposed to „just work“. I think nobody would be happy about having to allow the takeover of SIGCHLD.

So what I'm thinking about, it should still allow for the configuration, but in case of the ones that don't do anything by default, implicit initialization should still stay a possibility.

Yeah I looked into this a fair amount, and it's annoyingly difficult. These are the options I have:

I'm inclined to 1. in here. Looking through documentation, POSIX says what the default handlers should do and there are only like 3 or 4 options. Furthermore, if there's a bug, someone will notice fast and report it.

All the others are prone to having some subtle bugs that happen once in a blue moon, and I don't particularly like spin locks that could happen out of the blue inside any part of the program (though people would do better if they masked the signals inside their realtime threads anyway).

The default handler should effectively be the bottom of the stack, so there's no special case to consider of an empty stack.

I was just checking this is implementable, not if the default handler should or should not be visible to users.

Anyway, I was wondering if this could be created in a more generic way. Some kind of callback chain stack data structure might be useful for other purposes, not just signals, with an interface something like:

// Interface sketch only; method bodies are omitted.
enum ContinuationOp<R> { Stop(R), Continue }
struct DispatchStack<P, R> { .. }

impl<P, R> DispatchStack<P, R> {
  fn push(&mut self, cback: Box<dyn FnMut(P) -> ContinuationOp<R>>);
  fn pop(&mut self);
  fn dispatch(&mut self, param: P) -> Option<R>;
}
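One way the interface above could be filled in is sketched below. It is an assumption-laden illustration, not an agreed design: callbacks return ContinuationOp<R> (which the sketch above seems to intend), the parameter is passed by reference so several callbacks can see it without requiring Clone, and pop returns whether anything was removed.

```rust
/// Result of one callback: stop with a value, or let the next one try.
pub enum ContinuationOp<R> {
    Stop(R),
    Continue,
}

pub struct DispatchStack<P, R> {
    callbacks: Vec<Box<dyn FnMut(&P) -> ContinuationOp<R>>>,
}

impl<P, R> DispatchStack<P, R> {
    pub fn new() -> Self {
        Self { callbacks: Vec::new() }
    }

    /// Pushes a callback on top of the stack; it will be tried first.
    pub fn push(&mut self, cback: Box<dyn FnMut(&P) -> ContinuationOp<R>>) {
        self.callbacks.push(cback);
    }

    /// Removes the top callback; returns false if the stack was empty.
    pub fn pop(&mut self) -> bool {
        self.callbacks.pop().is_some()
    }

    /// Walks from the top of the stack down until a callback stops the chain.
    /// Returns None if nobody stopped it (the caller then applies a default).
    pub fn dispatch(&mut self, param: P) -> Option<R> {
        for cback in self.callbacks.iter_mut().rev() {
            if let ContinuationOp::Stop(result) = cback(&param) {
                return Some(result);
            }
        }
        None
    }
}
```

With this shape, a library that merely observes a signal would always return Continue, while an application-level callback near the bottom could return Stop to claim it.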

Diggory Blake · Answer 6 · Wed Dec 30 2020 22:40:16 GMT+0800 (China Standard Time)

I'm inclined to 1. in here. Looking through documentation, POSIX says what the default handlers should do and there are only like 3 or 4 options. Furthermore, if there's a bug, someone will notice fast and report it.

AIUI, there are 4 options:

Do nothing
Stop
Terminate
Terminate with crash dump

1 + 3 are easy enough to implement. The others I'm unsure how you'd implement.

This is an extract from my attempt to implement a stack-based signal model:

        unsafe fn delegate(&self, signum: libc::c_int, data: Self::Data) {
            if self.0.sa_sigaction == libc::SIG_DFL {
                // Default handler. We want to re-raise the signal, but doing so is racy,
                // so avoid doing it if the default action is to do nothing anyway. If the
                // default action is to terminate the process, then it doesn't matter if
                // restoring the original handler is racy since we won't get to that part...
                //
                // If the default action is to pause the process, then the race condition may
                // be a problem when the process is resumed. There's not a huge amount we can
                // do about that though...
                match signum {
                    libc::SIGCHLD | libc::SIGCONT | libc::SIGURG | libc::SIGWINCH => {}
                    _ => {
                        let prev = self.install(signum);
                        libc::raise(signum);
                        if prev.install(signum).0.sa_sigaction != self.0.sa_sigaction {
                            // Uh oh... Race condition! Just set our signal handler again.
                            Self::ours().install(signum);
                        }
                    }
                }
            } else if self.0.sa_sigaction != libc::SIG_IGN {
                // Non-default handler, call directly
                if self.0.sa_flags & libc::SA_SIGINFO != 0 {
                    mem::transmute::<_, SigActionPtr>(self.0.sa_sigaction)(signum, data.0, data.1);
                } else {
                    mem::transmute::<_, SigHandlerPtr>(self.0.sa_sigaction)(signum);
                }
            }
        }

Michal 'vorner' Vaner · Answer 7 · Wed Dec 30 2020 23:08:39 GMT+0800 (China Standard Time)

For 2, I raise SIGSTOP. That one is one of the unchangeable signals, so it's guaranteed to be at the default.

For 3, calling _exit works (one needs to call the one with underscore, to avoid calling the at-exit hooks from inside the signal handler).

For 4, calling libc::abort should work ‒ that one tries really hard to terminate the program even if SIGABRT is changed (but if the intention is to terminate, it could be reset beforehand anyway).
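The four default actions and the emulation strategies suggested in this thread can be summarised as a small lookup. This is a sketch, not signal-hook's implementation: the Sig enum covers only a handful of signals and uses names instead of numbers (the numbers differ between platforms), SIGCONT and SIGWINCH are grouped with „do nothing“ following the extract above, and the functions only describe the emulation rather than perform it.

```rust
/// A few signals, named rather than numbered (numbers vary by platform).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Sig {
    Hup, Int, Quit, Abrt, Segv, Term, Chld, Cont, Urg, Winch, Tstp, Ttin, Ttou,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum DefaultAction {
    Nothing,
    Stop,
    Terminate,
    TerminateWithCoreDump,
}

/// Default action for the signals modelled here.
pub fn default_action(sig: Sig) -> DefaultAction {
    match sig {
        Sig::Chld | Sig::Cont | Sig::Urg | Sig::Winch => DefaultAction::Nothing,
        Sig::Tstp | Sig::Ttin | Sig::Ttou => DefaultAction::Stop,
        Sig::Hup | Sig::Int | Sig::Term => DefaultAction::Terminate,
        Sig::Quit | Sig::Abrt | Sig::Segv => DefaultAction::TerminateWithCoreDump,
    }
}

/// How each action could be emulated, as suggested in the thread.
pub fn emulation(action: DefaultAction) -> &'static str {
    match action {
        DefaultAction::Nothing => "return from the handler",
        DefaultAction::Stop => "raise SIGSTOP (cannot be caught, so always default)",
        DefaultAction::Terminate => "call _exit (skips at-exit hooks)",
        DefaultAction::TerminateWithCoreDump => "call abort",
    }
}
```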

I think I'll split this issue into the two parts, one for the signal initialization, another for that stack (though I'm still unsure if it should be in here).

Michal 'vorner' Vaner · Answer 8 · Sun Jan 03 2021 18:23:35 GMT+0800 (China Standard Time)

I've extracted the emulation and the initialization to #81 and #82 respectively. I'm leaving this open to hold the idea of the chain-stack thing one could feed signals into and get them distributed.

Diggory Blake · Answer 9 · Sun Jan 03 2021 22:39:28 GMT+0800 (China Standard Time)

I created the crates signal-stack and grace for chaining signals and portably handling graceful shutdown respectively. The former should work compatibly with signal-hook. The latter exists because I wanted to avoid using signals entirely on Windows, and it provides a safe interface.