elixir-plug / plug_cowboy

Plug adapter for the Cowboy web server


Plug.Cowboy.Conn.chunk/2 needs a backpressure mechanism

jvoegele opened this issue

We have a Phoenix application that generates very large API responses (approaching 1 GB in size) in both JSON and CSV formats. To keep memory usage down, we stream these large responses back to the client using Plug.Conn.send_chunked and Plug.Conn.chunk, and we "consume" the stream using Enum.reduce_while, as suggested in the documentation for Plug.Conn.chunk.
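For reference, the consumption pattern suggested by the Plug.Conn.chunk docs looks roughly like this (`rows` here is an illustrative lazy Stream of CSV lines, not our actual code):

```elixir
# Roughly the pattern from the Plug.Conn.chunk docs.
# `rows` is an illustrative lazy Stream of iodata (e.g. CSV lines).
conn = Plug.Conn.send_chunked(conn, 200)

Enum.reduce_while(rows, conn, fn row, conn ->
  case Plug.Conn.chunk(conn, row) do
    {:ok, conn} -> {:cont, conn}
    # The client disconnected; stop realizing the stream.
    {:error, :closed} -> {:halt, conn}
  end
end)
```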

While this approach has worked fine for API responses that are not extraordinarily large, it has been causing problems for our very large responses. Specifically, when downloading one of these large API responses (using curl, for example), the download starts out reasonably fast, but after a period of time it slows to a crawl. Using the observer tool, I was able to pinpoint the problem: the cowboy_clear:connection_process process gets overloaded with millions of messages in its mailbox and also consumes a large amount of memory. Once the system is in this state, the chunked HTTP response slows down so much that it has virtually stopped altogether.
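For anyone reproducing this, the same diagnosis can be made from an IEx shell without observer, assuming `pid` is the connection process found in the process list:

```elixir
# `pid` is the cowboy connection process identified in :observer.
Process.info(pid, [:message_queue_len, :memory])
#=> [message_queue_len: ..., memory: ...]
```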

(Incidentally, aborting the download on the client-side (such as Ctrl-C on curl) will eventually cause the server-side to clean itself up and the memory usage quickly returns to normal levels.)

The root problem appears to be that Plug.Cowboy.Conn.chunk/2 calls :cowboy_req.stream_body/3, which immediately sends a message to its stream handler process. Since there is no backpressure mechanism, the Stream being chunked back to the client is fully "realized" almost immediately, so memory usage spikes as the entire stream is held in memory. It also means that the Cowboy stream handler process gets overloaded with messages in its process mailbox, which in turn causes the extreme slowdown mentioned above.
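For context, the adapter side is essentially a one-line pass-through (paraphrased from Plug.Cowboy.Conn; the exact code may differ by version):

```elixir
# Paraphrased sketch of the adapter callback: the chunk is handed
# straight to :cowboy_req.stream_body/3, which fires a message at the
# connection process and returns without waiting for it to be consumed.
def chunk(req, body) do
  :cowboy_req.stream_body(body, :nofin, req)
end
```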

I cannot share our real codebase that demonstrates this issue, but I have put together a demo project that simulates the scenario and reproduces the behavior.

In this demo project, there is a controller with a flood action that generates a large stream of CSV data and streams it to the client: https://github.com/pro-football-focus/plug_chunk_backpressure_demo/blob/25f93204024e479a965cda61ddb401ff792de8e6/lib/chunk_backpressure_demo_web/controllers/demo_controller.ex#L6

This flood action demonstrates the behavior described above, where the cowboy_clear:connection_process mailbox gets overloaded and the streaming response slows to a crawl.

In that same controller, there is another action called backpressure, which streams the same large CSV response to the client, but in which I've implemented a rudimentary backpressure mechanism: https://github.com/pro-football-focus/plug_chunk_backpressure_demo/blob/25f93204024e479a965cda61ddb401ff792de8e6/lib/chunk_backpressure_demo_web/controllers/demo_controller.ex#L12

The backpressure action checks the size of the process mailbox for the cowboy stream handler, and if it goes over a certain threshold (arbitrarily set at 500 in the demo), then it waits until the mailbox size goes back down below that threshold before sending the chunk to cowboy. (This is done using the wait_for_it library, but that is an incidental detail and not important to demonstrate the issue.)
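In spirit, the mechanism boils down to something like the following polling sketch (simplified; the names are illustrative, and the actual demo uses wait_for_it rather than a sleep loop):

```elixir
# Simplified sketch of the demo's idea. Names and the sleep-based
# polling are illustrative; the demo itself uses wait_for_it.
defp chunk_with_backpressure(conn, data, threshold \\ 500) do
  # The cowboy connection process pid is reachable through the adapter.
  {Plug.Cowboy.Conn, req} = conn.adapter
  await_mailbox_drain(req.pid, threshold)
  Plug.Conn.chunk(conn, data)
end

defp await_mailbox_drain(pid, threshold) do
  case Process.info(pid, :message_queue_len) do
    {:message_queue_len, len} when len > threshold ->
      Process.sleep(10)
      await_mailbox_drain(pid, threshold)

    _ ->
      :ok
  end
end
```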

This rudimentary backpressure mechanism not only keeps memory usage low and prevents the overload of the stream handler process mailbox, it also speeds up the response dramatically! The CSV data produced in the demo is 300 MB in size, and downloading it with curl took nearly 6 minutes using the flood action (with no backpressure) but only 3.5 seconds with the backpressure action:

```
$ time curl -o flood.csv http://localhost:4000/flood
curl -o flood.csv http://localhost:4000/flood  2.88s user 27.72s system 8% cpu 5:50.33 total

$ time curl -o backpressure.csv http://localhost:4000/backpressure
curl -o backpressure.csv http://localhost:4000/backpressure  0.34s user 2.95s system 93% cpu 3.518 total
```

I think this convincingly demonstrates the need for a good backpressure mechanism in Plug.Cowboy.Conn.chunk/2.

I don't intend to suggest that inspecting the process mailbox size like I've done in the demo should be the basis for such a backpressure mechanism, though. In fact, it is quite possible that there is something like a quantum "observer effect" happening here: the very act of checking the size of the mailbox slows things down enough to "accidentally" provide backpressure. That said, I'm hoping that someone on the plug_cowboy team can come up with a production-ready backpressure mechanism that allows for streaming large responses efficiently.

From an initial look, my suggestion would be to use some sort of ping operation on the cowboy stream process. For example, we could chunk 5 entries, send a ping to cowboy, chunk 5 more, and then wait for a pong. But this would have to be coordinated with Cowboy and added there first. :) Thoughts?
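Purely as a thought experiment, it might look like the sketch below. To be clear, none of this exists in Cowboy today, so the :ping/:pong messages, the batch size, and the strict batch-by-batch alternation are all invented for illustration:

```elixir
# Hypothetical only: Cowboy has no such ping API at this point, so the
# {:ping, pid}/:pong messages and the batch size are made up.
def stream_with_ping(conn, stream, batch_size \\ 5) do
  {Plug.Cowboy.Conn, req} = conn.adapter

  stream
  |> Stream.chunk_every(batch_size)
  |> Enum.reduce_while(conn, fn batch, conn ->
    conn =
      Enum.reduce(batch, conn, fn data, conn ->
        {:ok, conn} = Plug.Conn.chunk(conn, data)
        conn
      end)

    # Imaginary synchronization point with the connection process.
    send(req.pid, {:ping, self()})

    receive do
      :pong -> {:cont, conn}
    after
      5_000 -> {:halt, conn}
    end
  end)
end
```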

Yeah, I think that makes sense. I'll create an issue report on the Cowboy GitHub and link back here.

Sounds like Cowboy has implemented something to facilitate this downstream!

Good news! @jvoegele, if you do get it working, can you please post a small write-up here? Thank you!

Updating the cowboy dependency to the latest master seems to have fixed this problem for me. I didn't even need to set the max_stream_buffer_size option, so I would guess that it is set to some reasonable default value.
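For anyone else hitting this, overriding the transitive dependency in mix.exs was all it took (the version requirement below is an assumption; use whichever Cowboy release or git ref actually contains the fix):

```elixir
# In mix.exs. `override: true` is required because plug_cowboy declares
# its own cowboy requirement. The "~> 2.8" bound is an assumption standing
# in for "whatever release includes the stream_body backpressure fix".
defp deps do
  [
    {:plug_cowboy, "~> 2.0"},
    {:cowboy, "~> 2.8", override: true}
  ]
end
```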

As far as I'm concerned, this issue can be closed once plug_cowboy is updated to depend on the fixed version of cowboy.

Thanks!

Thanks @jvoegele! If it just works on new Cowboy versions, then I will go ahead and close this. People can always upgrade Cowboy on their own, and I wouldn't necessarily want to force everyone to upgrade, especially if they don't need this fix.