Feature suggestion for `trillium`: Stock handler for catching panics and generating server errors

Question

Feature suggestion for `trillium`: Stock handler for catching panics and generating server errors

joshtriplett opened this issue 9 months ago · comments

I'd love to have a stock handler that wraps another handler and catches any panics, turning them into server errors. I'm happy to write and submit one, if this sounds reasonable.

Jacob Rothstein · Answer 1 · Sat Oct 14 2023 03:09:34 GMT+0800 (China Standard Time)

I tidied up an old branch that allows for unwinding, but I'm still unsure if it's a good idea to support this, as it feels like a misuse of panic unwinding. Because the Conn contains the Transport, unwinding a panic into a http error is accomplished with a Drop implementation on trillium::Conn that effectively takes all of the inner state and sends it to a FnOnce that gets attached to the Conn.

If you can find a way to do this without changes to trillium itself, I'd be keen to hear about that, but because the Transport is not necessarily Clone and is owned by the Conn, I think this requires changes to trillium. It looks like it's semver-patch, though

I'll PR this and possibly gate it behind a feature pending further experimentation, as "panic for flow control" seems like a big hack.

Josh Triplett · Answer 2 · Sat Oct 14 2023 13:56:09 GMT+0800 (China Standard Time)

@jbr I'm not looking to do "panic for flow control" here; this is to catch unexpected panics and avoid killing the whole server, and finish any tracing spans.

I think you could simplify this, by not preserving any of the state from the existing Conn, and just making a new Conn with no state to handle the internal server error, or even just sending a hardcoded HTTP response over the Transport. And you could require that the transport implements Clone in order to support this, since many transports will. (That's still not trivial, since Conn boxes the transport. :( ) Then clone the transport before running the original Handler, replace the transport in the original Conn with one that flags (via AtomicBool) if any bytes have been written yet, and iff no bytes have been written when the panic occurs, use the cloned Transport to send the internal server error.

Josh Triplett · Answer 3 · Sat Oct 14 2023 14:36:31 GMT+0800 (China Standard Time)

That said, perhaps trying to return a response from an unexpected panic is unnecessary, and it might suffice to catch panics and close the connection. I may try that for simplicity, if solutions that allow returning a response are too complex.

Josh Triplett · Answer 4 · Sat Oct 14 2023 16:26:41 GMT+0800 (China Standard Time)

Nevermind, I see the problem now: building this as a handler leaves nothing to return as the Conn after catching a panic.

Josh Triplett · Answer 5 · Sat Oct 14 2023 16:37:06 GMT+0800 (China Standard Time)

I would welcome a suggestion for how to make panics not bring down the server, and to be able to finish any open tracing spans (so that they can be sent off for telemetry). It doesn't matter to me if that's a Handler or something else. I'm much less concerned about being able to successfully send an HTTP 500, though that'd be nice.

This is something that was straightforward to do with Tide.

Jacob Rothstein · Answer 6 · Sun Oct 15 2023 00:02:31 GMT+0800 (China Standard Time)

Which runtime are you using? My understanding was that all of the current runtime adapters do what you're suggesting as default behavior; hanging up on the http client but continuing to process requests. If that's not the case, I'd consider that a bug. Sending a 500 would be the new behavior, and I've been convincing myself the drop-channel thing isn't terribly gross since it's a best effort that falls back to the current behavior if drop never runs

Josh Triplett · Answer 7 · Sun Oct 15 2023 19:56:24 GMT+0800 (China Standard Time)

@jbr ...well, that's fascinating. I had been looking for that functionality in trillium, and I had no idea that async runtimes did that themselves.

Given that, it sounds like I've already got the robustness I need, and since the server doesn't exit it'll be able to continue tracing, so I just need to trace panics myself. I don't know if it's worth the complexity just to send a 500.

Thank you for the explanation; TIL.