django / daphne

Django Channels HTTP/WebSocket server

Memory usage: Daphne loading all the file in memory (POST request)

cpina opened this issue · comments

I am using the Debian-packaged versions (Debian 12 Bookworm):

  • daphne 4.0.0
  • django 3.2.19
  • python 3.11.2

The problem happens when using Django with daphne (debugging with runserver or in production with Nginx in front of it).

For testing purposes I have a form in the application accepting a 4 GB file (with Nginx accepting this file size). This is to make it more visible.

When the request reaches daphne (as far as I can see, Twisted and Python's cgi.py do not read it into memory; they either use a temporary file or pass it on to daphne without a full read):
https://github.com/django/daphne/blob/4.0.0/daphne/http_protocol.py#L188

Daphne keeps reading the 4 GB body and adding it to the queue in 8 KB chunks (the queue was created without any maxsize).
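
A minimal sketch of why the missing queue limit matters. This is not Daphne's code; `peak_queued_chunks` and the 8 MiB stand-in body are hypothetical, and only the 8 KB chunk size comes from the description above. A producer that calls put_nowait() on an asyncio.Queue without a maxsize never yields to the consumer, so the whole body is queued before the first chunk is dequeued:

```python
import asyncio

CHUNK = 8 * 1024  # chunk size mentioned above; everything else is illustrative


async def peak_queued_chunks(total_chunks: int) -> int:
    """Enqueue every chunk with put_nowait(), then drain; return the peak queue depth."""
    queue: asyncio.Queue = asyncio.Queue()  # no maxsize, so put_nowait() never fails
    peak = 0
    for _ in range(total_chunks):
        queue.put_nowait(b"\0" * CHUNK)  # the producer never awaits
        peak = max(peak, queue.qsize())
    while not queue.empty():
        await queue.get()  # the consumer only starts once everything is queued
    return peak


if __name__ == "__main__":
    chunks = 1024  # 8 MiB stand-in for the 4 GB upload
    peak = asyncio.run(peak_queued_chunks(chunks))
    print(f"peak queued: {peak} chunks ({peak * CHUNK} bytes)")
```

The peak depth equals the number of chunks, so memory use grows with the size of the upload.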

If I use uvicorn (or gunicorn with uvicorn workers) the problem does not happen: no memory change POSTing a 4 GB file. If I use runserver without Daphne's application it does not happen either.

This has been this way since day 1. Can you find the corresponding bit in uvicorn? (The key here is that the protocol server needs to pass the more_body flag...)
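
For reference, a minimal sketch of how an ASGI application consumes a body using the more_body flag. In the ASGI HTTP spec, each "http.request" message carries a "body" chunk and a "more_body" boolean. The helper names `count_body_bytes` and `make_receive` are hypothetical, and `make_receive` is only a local stand-in for the receive() callable a real server would provide:

```python
import asyncio


async def count_body_bytes(receive) -> int:
    """Consume an ASGI HTTP request body chunk by chunk, keeping only a byte count."""
    total = 0
    while True:
        message = await receive()
        if message["type"] == "http.disconnect":
            break
        total += len(message.get("body", b""))
        if not message.get("more_body", False):
            break  # last chunk of the body
    return total


def make_receive(chunks):
    """Hypothetical stand-in for a server's receive() callable, for local testing."""
    pending = list(chunks)

    async def receive():
        body = pending.pop(0)
        return {"type": "http.request", "body": body, "more_body": bool(pending)}

    return receive


if __name__ == "__main__":
    receive = make_receive([b"a" * 8192] * 4)
    print(asyncio.run(count_body_bytes(receive)))
```

The application only ever holds one chunk at a time; whether the server holds more than that depends on how it buffers between the socket and receive().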

Sure!

I'll add links to the relevant places. I think I see the problem (but not the solution in Daphne). Hopefully this will at least help to add some breakpoints in some places and POST a file :-)

If using daphne:

In ASGIHandler.read_body() (https://github.com/django/django/blob/stable/3.2.x/django/core/handlers/asgi.py#L175), receive() calls asyncio.Queue.get(). In my original comment I linked to a few lines before the self.application_queue.put_nowait() call (see https://github.com/django/daphne/blob/4.0.0/daphne/http_protocol.py#L196). So daphne seems to read the whole body and add all of it to the queue in chunks. Then read_body() dequeues it from memory.

If using uvicorn:

In ASGIHandler.read_body() (https://github.com/django/django/blob/stable/3.2.x/django/core/handlers/asgi.py#L175), receive() calls RequestResponseCycle.receive() (https://github.com/encode/uvicorn/blob/0.17.0/uvicorn/protocols/http/h11_impl.py#L492).

I haven't properly understood this yet, but on each call of RequestResponseCycle.receive() there is also a call to H11Protocol.data_received() (https://github.com/encode/uvicorn/blob/0.17.0/uvicorn/protocols/http/h11_impl.py#L127; note that H11Protocol instantiates RequestResponseCycle). This is done via a run_forever and actually comes from asyncio/selector_events/_SelectorSocketTransport._read_ready(). So it seems that data is read on demand from Django as it keeps arriving.
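
One way a server can read on demand is to pause the transport while the application has not yet consumed what is buffered. The following is a minimal sketch of that idea, not uvicorn's actual code. asyncio transports do provide pause_reading() and resume_reading(); `FakeTransport`, `BoundedBody`, and the `high_water` limit are hypothetical names used only for illustration:

```python
import asyncio


class FakeTransport:
    """Hypothetical stand-in for an asyncio transport; records pause/resume calls."""

    def __init__(self):
        self.paused = False

    def pause_reading(self):
        self.paused = True

    def resume_reading(self):
        self.paused = False


class BoundedBody:
    """Buffer incoming data, pausing the transport once high_water bytes are pending."""

    def __init__(self, transport, high_water=64 * 1024):
        self.transport = transport
        self.high_water = high_water
        self.buffer = bytearray()
        self.has_data = asyncio.Event()

    def data_received(self, data: bytes) -> None:
        self.buffer += data
        if len(self.buffer) >= self.high_water:
            self.transport.pause_reading()  # stop pulling bytes off the socket
        self.has_data.set()

    async def receive(self) -> bytes:
        while not self.buffer:
            self.has_data.clear()
            await self.has_data.wait()
        chunk = bytes(self.buffer)
        self.buffer.clear()
        self.transport.resume_reading()  # the application caught up, read again
        return chunk


async def main():
    transport = FakeTransport()
    body = BoundedBody(transport, high_water=16)
    body.data_received(b"x" * 16)  # buffer reaches the limit
    print("paused after data_received:", transport.paused)
    await body.receive()  # application consumes the buffered data
    print("paused after receive:", transport.paused)


if __name__ == "__main__":
    asyncio.run(main())
```

With this pattern the amount of body data held in memory is bounded by the limit plus one read, regardless of the upload size.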

I don't think I have enough bandwidth at the moment to get familiar enough with the Daphne code and fix it properly (unless I'm wrong, this looks like it might need quite a lot of Daphne code changes; do you think so?). Am I missing something obvious in Daphne that could provide a fix?

What I could maybe do in the next few days or over the weekend is write a very simple Django app (perhaps inspired by https://adamj.eu/tech/2020/10/15/a-single-file-rest-api-in-django/) that helps reproduce the problem, if this would help.

What I'm not clear on is how the protocol server is meant to pass the file to the application without reading it. Both have to do that it seems to me. (Make sure that you have Django set to spool to disk, but I assume you're using the same Django settings for both servers.)
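
As a sketch of what "spool to disk" means here: as far as I can tell, Django's ASGI handler writes the body into a tempfile.SpooledTemporaryFile whose threshold is the FILE_UPLOAD_MAX_MEMORY_SIZE setting (2.5 MB by default). The helper `spool_body` below is hypothetical and only illustrates the stdlib behaviour, not Django's exact code:

```python
import tempfile

FILE_UPLOAD_MAX_MEMORY_SIZE = 2621440  # Django's documented default (2.5 MB)


def spool_body(chunks, max_size=FILE_UPLOAD_MAX_MEMORY_SIZE):
    """Write chunks into a file that stays in RAM up to max_size, then moves to disk."""
    body_file = tempfile.SpooledTemporaryFile(max_size=max_size, mode="w+b")
    for chunk in chunks:
        body_file.write(chunk)  # rolls over to a real temp file past max_size
    body_file.seek(0)
    return body_file


if __name__ == "__main__":
    chunk = b"\0" * 8192
    # 8 MiB of data, above the threshold, fed one chunk at a time.
    with spool_body(chunk for _ in range(1024)) as body:
        print(len(body.read()))
```

Spooling only bounds memory if the chunks arrive one at a time; if the server has already queued the whole body in memory, the spool file does not help.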

Happy to look at what you discover.

Also I'd update Django.

> What I'm not clear on is how the protocol server is meant to pass the file to the application without reading it. Both have to do that it seems to me. (Make sure that you have Django set to spool to disk, but I assume you're using the same Django settings for both servers.)

I haven't understood the code well yet (I will try in the next days/weeks; I need to do some other things first). I think that last night I saw that uvicorn uses some async methods so that it reads only what is passed to the application instead of reading everything.

When I have more (or better) findings I'll write them here.

> Also I'd update Django.

From what I saw, I'm pretty sure that the daphne code reads everything (and holds it in memory) before Django is involved. Then Django processes it.

Same Django settings in both cases (just launching Django differently).