Memory usage: Daphne loading all the file in memory (POST request)

Question

cpina opened this issue 10 months ago · comments

Carles Pina Estany commented 10 months ago

I am using the Debian-packaged versions (Debian 12 Bookworm):

daphne 4.0.0
django 3.2.19
python 3.11.2

The problem happens when using Django with daphne (debugging with runserver or in production with Nginx in front of it).

For testing purposes I have a form in the application accepting a 4 GB file (with Nginx accepting this file size). This is to make it more visible.

The request reaches daphne here (from what I can see, Twisted and Python's cgi.py do not read it into memory: they use a temporary file or pass it on to daphne without a full read):
https://github.com/django/daphne/blob/4.0.0/daphne/http_protocol.py#L188

Daphne then keeps reading the whole 4 GB and adding it to the queue in 8 KB chunks (the queue was created without a maxsize, so it is unbounded).
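To illustrate why an unbounded queue matters here, below is a minimal model of the situation, not Daphne's actual code. The names `produce`, `consume` and `run` are hypothetical, and the model uses plain asyncio rather than Twisted. It compares how many chunks sit in the queue at once with and without a size limit.

```python
import asyncio

CHUNK = 8 * 1024  # 8 KB, the chunk size mentioned above


async def produce(queue: asyncio.Queue, total_chunks: int) -> int:
    """Push body chunks as fast as the queue allows; return the peak queue depth."""
    peak = 0
    for i in range(total_chunks):
        message = {
            "type": "http.request",
            "body": b"x" * CHUNK,
            "more_body": i < total_chunks - 1,
        }
        # Only waits when the queue is bounded and currently full.
        await queue.put(message)
        peak = max(peak, queue.qsize())
    return peak


async def consume(queue: asyncio.Queue) -> int:
    """Drain messages until more_body is False; return the number of bytes seen."""
    received = 0
    while True:
        message = await queue.get()
        received += len(message["body"])
        if not message["more_body"]:
            return received


async def run(maxsize: int, total_chunks: int = 256):
    """maxsize=0 means unbounded, which is asyncio.Queue's default."""
    queue = asyncio.Queue(maxsize=maxsize)
    return await asyncio.gather(produce(queue, total_chunks), consume(queue))


if __name__ == "__main__":
    for maxsize in (0, 4):
        peak, received = asyncio.run(run(maxsize))
        print(f"maxsize={maxsize}: peak depth {peak}, bytes received {received}")
```

In this model the unbounded producer never has to wait, so the peak depth grows with the size of the upload, while the bounded queue is capped at maxsize and the producer is paused until the consumer catches up.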

If I use uvicorn (or gunicorn with uvicorn workers) the problem does not happen: no memory change POSTing a 4 GB file. If I use runserver without Daphne's application it does not happen either.

Carlton Gibson · Answer 1 · Wed Sep 27 2023 00:08:20 GMT+0800 (China Standard Time)

This has been this way since day 1. Can you find the corresponding bit in uvicorn? (The key here is that the protocol needs to pass the more_body flag...)

Carles Pina Estany · Answer 2 · Wed Sep 27 2023 06:12:25 GMT+0800 (China Standard Time)

Sure!

I'll add links to the relevant places. I think I see the problem (but not the solution in Daphne). Hopefully this will at least help anyone who wants to add breakpoints in a few places and POST a file :-)

If using daphne:

In ASGIHandler.read_body() (https://github.com/django/django/blob/stable/3.2.x/django/core/handlers/asgi.py#L175), receive() calls asyncio.Queue.get(). In my original comment I linked to a few lines before the call to self.application_queue.put_nowait() (see https://github.com/django/daphne/blob/4.0.0/daphne/http_protocol.py#L196). So daphne seems to read the whole body and put all of it into the queue, chunk by chunk. Then read_body() dequeues it from memory.

If using uvicorn:

In ASGIHandler.read_body() (https://github.com/django/django/blob/stable/3.2.x/django/core/handlers/asgi.py#L175), receive() calls RequestResponseCycle.receive() (https://github.com/encode/uvicorn/blob/0.17.0/uvicorn/protocols/http/h11_impl.py#L492).

I haven't properly understood this yet, but on each call to RequestResponseCycle.receive() there is also a call to H11Protocol.data_received() (https://github.com/encode/uvicorn/blob/0.17.0/uvicorn/protocols/http/h11_impl.py#L127; note that H11Protocol instantiates RequestResponseCycle). That call is driven by run_forever and actually comes from _SelectorSocketTransport._read_ready() in asyncio's selector_events. So it seems that the data is read on demand by Django as it keeps arriving.
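The pull-based pattern described above can be sketched as follows. This is a simplified illustration loosely modelled on what an ASGI application does with `http.request` messages, not a copy of Django's or uvicorn's code. `make_receive` is a hypothetical stand-in for the server's `receive()` callable, and the 2,621,440-byte threshold is Django's default FILE_UPLOAD_MAX_MEMORY_SIZE, used here as an assumption.

```python
import asyncio
import tempfile


async def read_body(receive, max_memory: int = 2_621_440):
    """Pull http.request messages one at a time until more_body is False.

    The body goes into a SpooledTemporaryFile, which stays in memory up to
    max_memory bytes and rolls over to a real temporary file after that.
    """
    body_file = tempfile.SpooledTemporaryFile(max_size=max_memory, mode="w+b")
    while True:
        message = await receive()
        if message["type"] == "http.disconnect":
            body_file.close()
            raise ConnectionError("client disconnected before the body was complete")
        body_file.write(message.get("body", b""))
        if not message.get("more_body", False):
            break
    body_file.seek(0)
    return body_file


def make_receive(chunks):
    """Hypothetical stand-in for a server's receive(): one chunk per call."""
    pending = list(chunks)

    async def receive():
        chunk = pending.pop(0)
        return {"type": "http.request", "body": chunk, "more_body": bool(pending)}

    return receive


if __name__ == "__main__":
    body = asyncio.run(read_body(make_receive([b"ab", b"cd", b"ef"])))
    print(body.read())
```

The application only ever holds one chunk at a time here. Whether the server's memory stays flat depends on what it does between calls to `receive()`: reading from the socket as chunks are requested, or buffering everything up front.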

I don't think I have enough bandwidth at the moment to get familiar enough with the Daphne code to fix this properly (unless I'm wrong, this looks like it might need quite a lot of changes in Daphne. Do you think so?). Am I missing something obvious in Daphne that could provide a fix?

What I could maybe do in the next few days or over the weekend is write a very simple Django app (perhaps inspired by https://adamj.eu/tech/2020/10/15/a-single-file-rest-api-in-django/) that helps reproduce the problem, if that would help.

Carlton Gibson · Answer 3 · Wed Sep 27 2023 14:43:01 GMT+0800 (China Standard Time)

What I'm not clear on is how the protocol server is meant to pass the file to the application without reading it. Both have to do that it seems to me. (Make sure that you have Django set to spool to disk, but I assume you're using the same Django settings for both servers.)

Happy to look at what you discover.
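For reference, the Django settings that govern spooling uploads to disk are sketched below with their documented default values. Whether the project discussed in this thread overrides any of them is not stated, so treat this as a checklist rather than the reporter's configuration.

```python
# settings.py (fragment) -- Django's documented defaults, shown for reference.

# Uploaded files larger than this many bytes are streamed to a temporary
# file on disk instead of being held in memory (2.5 MB).
FILE_UPLOAD_MAX_MEMORY_SIZE = 2_621_440

# Maximum size of a request body, excluding file upload data, before Django
# raises an error when the body is accessed (2.5 MB).
DATA_UPLOAD_MAX_MEMORY_SIZE = 2_621_440

# None means the operating system's standard temporary directory is used.
FILE_UPLOAD_TEMP_DIR = None

# Small files go to memory; larger ones go to a temporary file.
FILE_UPLOAD_HANDLERS = [
    "django.core.files.uploadhandler.MemoryFileUploadHandler",
    "django.core.files.uploadhandler.TemporaryFileUploadHandler",
]
```

These settings only affect what Django does once it receives the body, so they cannot limit memory used by a protocol server that has already buffered the request.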

Carlton Gibson · Answer 4 · Wed Sep 27 2023 14:43:47 GMT+0800 (China Standard Time)

Also I'd update Django.

Carles Pina Estany · Answer 5 · Wed Sep 27 2023 15:21:56 GMT+0800 (China Standard Time)

> What I'm not clear on is how the protocol server is meant to pass the file to the application without reading it. Both have to do that it seems to me. (Make sure that you have Django set to spool to disk, but I assume you're using the same Django settings for both servers.)

I haven't understood the code well yet (I will try in the next days or weeks; I need to do some other things first). I think that last night I saw that uvicorn uses some async methods, so it reads only what is passed to the application instead of reading everything.

When I have more (or better) findings I'll post them here.

> Also I'd update Django.

From what I saw, I'm pretty sure that the daphne code reads everything (and holds it in memory) before Django is involved. Only then does Django process it.

Same Django settings in both cases (just launching Django differently).

Benjamin XT · Answer 6 · Tue Apr 30 2024 17:32:54 GMT+0800 (China Standard Time)

@carltongibson I am running into the same problem. My English is not great, but I hope I can describe the problem clearly.

In http_protocol.py, daphne's WebRequest class initialises http.Request, and http.Request calls the cgi.parse_multipart method, which loads the whole file into memory. The resulting args do not seem to be used: ASGIHandler reads the content again and parses the body itself. So it may be worth overriding this behaviour of http.Request in the WebRequest class.

Carlton Gibson · Answer 7 · Tue Apr 30 2024 17:39:06 GMT+0800 (China Standard Time)

As per this comment on the asgiref repo, I think the behaviour here is just required by the spec.

django/asgiref#66 (comment)