elixir-plug / plug_cowboy

Plug adapter for the Cowboy web server

Body streaming can hang due to cowboy backpressure

gfodor opened this issue · comments

Hi there, we are running into a problem where the call to chunk/2 is hanging. I've traced it down to the following. Recently, cowboy2 added a backpressure mechanism for streaming content: ninenines/cowboy#1376

This new mechanism sends ack messages back to the caller process in order to provide the backpressure. As noted in the comments here (added as part of ninenines/cowboy@eaa0526), it is no longer possible to send streaming data to cowboy2's handler from multiple processes:

https://github.com/ninenines/cowboy/blob/master/src/cowboy_stream_h.erl#L223

However, when we have a simple HTTP controller in Phoenix which streams data, it appears that it (unsurprisingly) sends to the same HTTP/2 cowboy handler from multiple request handler PIDs when requests come in concurrently through the same HTTP/2 transport. This breaks the contract, and results in the backpressure mechanism hanging when enough data is pushed at once.

The temporary workaround (which may still fail, I think) is to increase the new buffer size to a sufficient amount so all data is ack'ed immediately. I'm not sure how to resolve this properly. It seems like the cowboy plug would need to start multiplexing streamed data somehow through a new process 😬
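For readers who want to try the buffer-size workaround, a minimal config sketch follows. It assumes the buffer in question is Cowboy's HTTP/2 `max_stream_buffer_size` option (the thread does not name it), and that the option is passed through Plug.Cowboy's `:protocol_options`. `:my_app`, `MyAppWeb.Endpoint`, the port and the size are placeholders, and the TLS settings are omitted.

```elixir
# config/config.exs
# Sketch only: :my_app and MyAppWeb.Endpoint are placeholder names.
import Config

config :my_app, MyAppWeb.Endpoint,
  https: [
    port: 4001,
    # TLS options (certfile, keyfile) omitted for brevity.
    protocol_options: [
      # Assumed Cowboy HTTP/2 option; the value is arbitrary and should be
      # at least as large as the biggest response body being streamed.
      max_stream_buffer_size: 64_000_000
    ]
  ]
```

As noted above, this only hides the hang as long as every body fits in the buffer, so it is a stopgap rather than a fix.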

Hi @gfodor! 👋 My understanding is that it should be able to stream from multiple requests multiplexed over the same connection. However, for a given request, you cannot stream from multiple processes. For example, you wouldn't be able to break the streaming from a request between tasks or multiple processes. Is this the case you are running into?

Hey @josevalim! Thanks for the quick response. Ah, so if what you are saying is correct, these are distinct requests being streamed separately, and that should be fine. So perhaps my understanding of why the data_ack is missing is wrong. However, chunk should obviously never hang, so there's a bug somewhere. The code is here:

https://github.com/mozilla/reticulum/blob/master/lib/ret_web/controllers/file_controller.ex#L174

(Note that I think the use of .map is wrong here, I've updated it to use each and the bug persists.)
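For reference, a minimal sketch of streaming from the request process with `Plug.Conn.send_chunked/2` and `Plug.Conn.chunk/2`. It requires the `plug` dependency. The module and function names are hypothetical, and this is not the reticulum controller code. It uses `Enum.reduce_while/3` rather than `map` or `each` so that the conn returned by each `chunk/2` call is carried forward and streaming stops on error.

```elixir
defmodule ChunkSketch do
  import Plug.Conn

  # Hypothetical helper: streams `chunks` (an enumerable of binaries) from the
  # request process itself, threading the updated conn through every call.
  def stream(conn, chunks) do
    conn = send_chunked(conn, 200)

    Enum.reduce_while(chunks, conn, fn data, conn ->
      case chunk(conn, data) do
        # Keep going with the conn returned by the adapter.
        {:ok, conn} -> {:cont, conn}
        # Stop streaming if the client went away or the adapter failed.
        {:error, _reason} -> {:halt, conn}
      end
    end)
  end
end
```

Every `chunk/2` call here happens in the process that owns the request, which matches the constraint described earlier in the thread.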

If I use HTTP/1.1, no hang. If I stream requests one at a time, no hang. If I stream multiple concurrent but small requests then (as best as I can tell) no hang. If I increase the buffer size enough to fit the full body, no hang. If I make multiple concurrent requests and each streams a large enough response, it hangs. I confirmed my stream handler/transform isn't hanging; it's the call to chunk, presumably just waiting indefinitely on the receive.
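To make the "waiting indefinitely on the receive" symptom concrete, here is a toy model in plain Elixir. It is not Cowboy's implementation: the module, message shapes and function names are all made up for illustration. A sender blocks until it gets an ack, and a timeout stands in for the hang seen when the ack never arrives.

```elixir
defmodule AckSketch do
  # Hypothetical sender: hands `data` to `conn_pid`, then blocks until the
  # matching ack arrives. Without the `after` clause this would wait forever.
  def send_data(conn_pid, data, timeout \\ 1_000) do
    ref = make_ref()
    send(conn_pid, {:data, self(), ref, data})

    receive do
      {:data_ack, ^ref} -> :ok
    after
      timeout -> {:error, :ack_timeout}
    end
  end

  # Hypothetical connection loop. With `ack?` set to false it models an ack
  # that is never delivered to the sender.
  def connection_loop(ack?) do
    receive do
      {:data, from, ref, _data} ->
        if ack?, do: send(from, {:data_ack, ref})
        connection_loop(ack?)

      :stop ->
        :ok
    end
  end
end

good = spawn(fn -> AckSketch.connection_loop(true) end)
bad = spawn(fn -> AckSketch.connection_loop(false) end)

IO.inspect(AckSketch.send_data(good, "chunk"))
# prints :ok
IO.inspect(AckSketch.send_data(bad, "chunk", 100))
# prints {:error, :ack_timeout}
```

In the real report there is no timeout on the waiting side, which is why the request process simply stays blocked.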

Also, just to confirm, if I spit out the pid of the current request at the time I call chunk, it obviously varies across requests. I guess the bit I'm not sure of is whether those requests are sending to the same underlying cowboy stream process or not. I don't know enough about the process architecture of cowboy/phoenix to say offhand, but just to be clear, my expectation is that if those request processes are in fact sending data to the same underlying cowboy process, then it is not expected to work.

Added some logging, and it looks to me like each independent request is in fact sending data to the same underlying cowboy PID via stream_body (if you unpack the cowboy_req). So my original assessment seems correct, I think.

I believe I found it to be a Cowboy bug. Report with a possible fix done here: ninenines/cowboy#1460 - Closing in favor of that discussion. :)