capnproto / go-capnp

Cap'n Proto library and code generator for Go

Home Page: https://capnproto.org


Documentation for server.Policy does not agree with implementation

zenhack opened this issue · comments

The documentation for server.Policy indicates that when the limits are exceeded, calls will immediately return exceptions. However, while pairing to debug #189, @lthibault and I discovered this is not actually what happens. Instead, the receive loop just blocks until a slot in the queue frees up.

However, I do not actually think what the documentation describes is what we want; this would mean lost messages if the sender is sending too fast, and it seems like it would be challenging to program around. We really want some kind of backpressure like that provided by the C++ implementation's streaming support.

I propose the following roadmap for dealing with memory/backpressure instead:

  1. Short term, fix the docs to describe what the code actually does.
  2. Slightly longer term, get rid of the bounded queue and replace it with an unbounded one.
  3. Remove the server.Policy type entirely.
  4. Implement something like the C++ implementation's per-connection flow limit, providing some soft backpressure at the connection level, probably as a field on rpc.Options.
  5. Implement per-object flow control as is done in the C++ implementation for streaming specifically. I think we can do this for all clients irrespective of the streaming annotation, without changing any APIs; we just need to block the caller in appropriate places.
  6. In rpc.Options, add a field for a hard limit on memory consumed by outstanding RPC messages (of all types, including returns), after which the connection will just be dropped.

Thoughts?

@zombiezen, particularly since I'm proposing removing something you added for v3, I would appreciate your thoughts.

The documentation for server.Policy indicates that when the limits are exceeded calls will immediately return exceptions. [...] Instead, the receive loop just blocks until an item in the queue clears.

Yeah, looks like I introduced that in 27748fa without updating docs. Sorry.

However, I do not actually think what the documentation describes is what we want; this would mean lost messages if the sender is sending too fast, and it seems like it would be challenging to program around. We really want some kind of backpressure like that provided by the C++ implementation's streaming support.

I agree: server.Policy seems like a kludge, and it would be a better developer experience to have backpressure than to have a capability reject method calls. My memory is hazy on the details, but it seems like I might have realized the mistake and was trying to move more toward backpressure.


I did spill a lot of words at the time I was crunching through this that might help explain: https://groups.google.com/g/capnproto/c/A4LiXXd1t94/m/GkFH1AK-AgAJ

The fundamental insight I have is that Cap'n Proto RPC is a work plan stream across all capabilities. You necessarily need to queue under the assumption that traffic will be bursty, but there are situations that can deadlock if not enough work resolves in time.

IIUC, if there's protocol-level support for applying backpressure to individual objects while preserving E-Order, that would be better than the state of the world when I was trying to hack in server.Policy.

An unbounded queue sounds potentially dangerous, though, but I think you're addressing that with the rpc.Options memory limit? That does match with my insight above.

An unbounded queue sounds potentially dangerous, though, but I think you're addressing that with the rpc.Options memory limit? That does match with my insight above.

Right, the idea is that rather than relying on a queue bound for memory limiting, we'd track it manually. But under normal circumstances we would never expect to reach the memory limit; it's not for backpressure, it's a stopgap against abuse, and we just drop the connection if it is breached. We rely on cooperative mechanisms for "normal" backpressure. See also #198, where I started working on per-object backpressure.

I vaguely remember the mailing list thread. See also https://groups.google.com/g/capnproto/c/wbGvhHaBan4, which is from a couple of weeks ago.

Cool. Sounds like the right direction; let me know if there's anything else I can help with.

Everything on my roadmap above is done except for (1) the connection-level limits and (2) adaptive flow control (which, to be fair, the C++ implementation doesn't really have anyway; we already have a non-adaptive version). (2) is in progress as #294. I'm going to open a separate issue for the connection limits and close this one.