django / channels

Developer-friendly asynchrony for Django

Home Page: https://channels.readthedocs.io

Django Channels Memory Leak on every message or connection

cacosandon opened this issue

I'm having a memory leak in Django Channels using uvicorn.

Every "memory crash" is a restart/deploy 👇

Screenshot 2024-04-30 at 08 44 06

This happens not just in my project, but also with the basic chat example from the tutorial.

Here is the repository with that minimal example and memory profiling: https://github.com/cacosandon/django-channels-memory-leak

This happens locally and on the server, with or without DEBUG, just by reconnecting or sending messages (in the example I've added large messages so the leak is easy to notice).

The memory is never released, even after the user disconnects.
Screenshot 2024-04-30 at 08 55 10
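For reference, the reproduction is essentially the tutorial chat consumer bouncing large payloads through the channel layer; roughly this shape (a sketch only, the actual code in the linked repo may differ):

  # consumers.py (sketch of the tutorial-style consumer used for the repro)
  import json

  from channels.generic.websocket import AsyncWebsocketConsumer


  class ChatConsumer(AsyncWebsocketConsumer):
      async def connect(self):
          self.group_name = "chat_lobby"
          await self.channel_layer.group_add(self.group_name, self.channel_name)
          await self.accept()

      async def disconnect(self, close_code):
          await self.channel_layer.group_discard(self.group_name, self.channel_name)

      async def receive(self, text_data=None, bytes_data=None):
          # Fan the (potentially large, >0.5 MiB) message back out through the
          # channel layer; this round trip is where the growth shows up.
          message = json.loads(text_data)["message"]
          await self.channel_layer.group_send(
              self.group_name, {"type": "chat.message", "message": message}
          )

      async def chat_message(self, event):
          await self.send(text_data=json.dumps({"message": event["message"]}))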

I've verified it with memory-profiler and memray (both commands are in the README so you can reproduce it).

Dependencies:

Django==5.0.4
channels==4.0.0
channels-redis==4.2.0
uvicorn[standard]==0.29.0

# Profiling memory
memory-profiler==0.61.0
memray==1.12.0

I think I have tried everything: deleting objects, manual garbage collection, etc. — roughly like the sketch below. Nothing prevents the memory from increasing, and it is never released back. Any insights? 🙏
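The kind of cleanup I tried (a sketch extending the consumer above; exact placement varied, and none of it stops the overall growth):

  import gc

  from channels.generic.websocket import AsyncWebsocketConsumer


  class ChatConsumer(AsyncWebsocketConsumer):
      async def receive(self, text_data=None, bytes_data=None):
          payload = text_data or bytes_data
          await self.channel_layer.group_send(
              self.group_name, {"type": "chat.message", "message": payload}
          )
          # Drop local references explicitly and force a collection pass.
          del payload
          gc.collect()

      async def disconnect(self, close_code):
          await self.channel_layer.group_discard(self.group_name, self.channel_name)
          gc.collect()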

Does the same thing happen with other protocol servers, such as hypercorn and Daphne?

I've tested daphne and hypercorn alongside uvicorn. All three show a similar pattern: memory climbs steadily to around 160 MiB and then keeps growing indefinitely, as monitored by memory-profiler.

The interesting thing is that, while uvicorn shows a continuous rise in memory usage on the memray graph, the graphs for daphne and hypercorn stay flat at around 80 MiB. This discrepancy makes it unclear which tool provides more reliable data.

Here are the commands I used for each:

  • Uvicorn: memray run --force -o output.bin -m uvicorn core.asgi:application
  • Daphne: python -m memray run -o output.bin --force ./manage.py runserver
  • Hypercorn: memray run --force -o output.bin -m hypercorn core.asgi:application

And can you see from any of the tools, memray perhaps, which objects are consuming the memory?

(I'd expect a gc.collect() to help here TBH)

@cacosandon Also, can you try with the PubSub layer, and see if the results are different there? Thanks.

Sure! I'll try to find time today to prepare a report on memray --leaks for each protocol server and test the PubSub layer. I'll get back to you soon, thanks!
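For these runs, switching layers is just a change to the CHANNEL_LAYERS backend in settings; roughly (a sketch, host details are the tutorial defaults):

  # settings.py (sketch of the backend switch used for the different runs)
  CHANNEL_LAYERS = {
      "default": {
          # Regular Redis channel layer:
          "BACKEND": "channels_redis.core.RedisChannelLayer",
          # PubSub variant (swap in for the PubSub runs):
          # "BACKEND": "channels_redis.pubsub.RedisPubSubChannelLayer",
          # In-memory layer, for comparison (drop CONFIG when using it):
          # "BACKEND": "channels.layers.InMemoryChannelLayer",
          "CONFIG": {"hosts": [("127.0.0.1", 6379)]},
      },
  }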

So I tried multiple combinations. All HTML reports from memray are here:
reports.zip

But below there are screenshots from them.

First, I tried with the regular Redis channel layer (not PubSub) to capture the memory leaks.

With uvicorn

PYTHONMALLOC=malloc memray run --force -o output.bin -m uvicorn core.asgi:application
+
memray flamegraph output.bin --force --leaks

The leaks report includes memory that was never released, but I don't know how to interpret it correctly. It seemed like AuthMiddleware was leaking, but after removing it the results are almost the same.

redis-channels-uvicorn-leaks.html
Screenshot 2024-05-01 at 08 46 10

Here is the screenshot of the uvicorn + leaks without AuthMiddleware:

redis-channels-uvicorn-without-authmiddleware-leaks.html
Screenshot 2024-05-01 at 08 52 05
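"Removing it" here means dropping AuthMiddlewareStack from the ASGI application, roughly like this (a sketch based on the tutorial layout; module names may differ):

  # core/asgi.py (sketch of the variant without AuthMiddlewareStack)
  import os

  from django.core.asgi import get_asgi_application

  os.environ.setdefault("DJANGO_SETTINGS_MODULE", "core.settings")
  django_asgi_app = get_asgi_application()

  from channels.routing import ProtocolTypeRouter, URLRouter  # noqa: E402

  import chat.routing  # noqa: E402

  application = ProtocolTypeRouter(
      {
          "http": django_asgi_app,
          # Previously: "websocket": AuthMiddlewareStack(URLRouter(...)),
          "websocket": URLRouter(chat.routing.websocket_urlpatterns),
      }
  )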

Then tried with daphne

PYTHONMALLOC=malloc memray run --force -o output.bin -m daphne core.asgi:application
+
memray flamegraph output.bin --force --leaks

redis-channels-daphne-leaks.html
Screenshot 2024-05-01 at 08 50 50

The interesting part is that hypercorn showed no memory leaks (or maybe memray just isn't working here?).

PYTHONMALLOC=malloc memray run --force -o output.bin -m hypercorn core.asgi:application
+
memray flamegraph output.bin --force --leaks

redis-channels-hypercorn-leaks.html
Screenshot 2024-05-01 at 08 54 14


Then I tried with explicit garbage collection (gc.collect()) for uvicorn and daphne. Same story for both.

memray run --force -o output.bin -m uvicorn core.asgi:application
+
memray flamegraph output.bin --force

redis-channels-uvicorn-gccollect.html
Screenshot 2024-05-01 at 08 55 31

memray run --force -o output.bin -m daphne core.asgi:application
+
memray flamegraph output.bin --force

redis-channels-daphne-gccollect.html
Screenshot 2024-05-01 at 08 56 03


And finally, I tried with PubSub for uvicorn and daphne.

memray run --force -o output.bin -m uvicorn core.asgi:application
+
memray flamegraph output.bin --force

redis-pubsub-uvicorn.html
Screenshot 2024-05-01 at 08 57 34

memray run --force -o output.bin -m daphne core.asgi:application
+
memray flamegraph output.bin --force

redis-pubsub-daphne.html
Screenshot 2024-05-01 at 08 57 56


Just in case, I also removed all the @profile decorators from the functions, so the measurements were not affected by the memory-profiler library itself.

I hope all these reports help in understanding the constant memory increase.

Right now I am trying to move my application to hypercorn so I can test it on staging, but websocket messages arrive empty 🤔. If I manage to solve it, I'll post the results here!

I've managed to make hypercorn work!

For some reason, bytes-only websocket messages were delivered as {"text": None, "bytes": ... } only under hypercorn, so AsyncWebsocketConsumer's websocket_receive always dispatched to the text handler.

Added a PR for that: #2097

  async def websocket_receive(self, message):
      """
      Called when a WebSocket frame is received. Decodes it and passes it
      to receive().
      """
-     if "text" in message:
+     if "text" in message and message["text"] is not None:
          await self.receive(text_data=message["text"])
      else:
          await self.receive(bytes_data=message["bytes"])
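Until that fix is released, a possible consumer-level workaround is to override websocket_receive in my own consumer (a sketch):

  from channels.generic.websocket import AsyncWebsocketConsumer


  class ChatConsumer(AsyncWebsocketConsumer):
      async def websocket_receive(self, message):
          # hypercorn can deliver binary frames as {"text": None, "bytes": b"..."},
          # so only dispatch to the text handler when "text" is not None.
          if message.get("text") is not None:
              await self.receive(text_data=message["text"])
          else:
              await self.receive(bytes_data=message["bytes"])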

Testing now in staging 🤞

There is still a memory leak in my application with hypercorn 😕

It seems that memray just doesn't work with it, because memory-profiler does show a constant, non-stop increase with every protocol server.

Screenshot 2024-05-01 at 22 40 21

Hi @cacosandon

Looking at the uploaded reports, e.g. redis-pubsub-daphne, the memory usage rises and then stabilises:

Screenshot 2024-05-02 at 08 21 09

The redis-channels-uvicorn-leaks report peaks at 168 MB and then falls to 151 MB.

Hey @carltongibson, thank you for taking a look.

Yep, but if you zoom in on redis-pubsub-daphne, it shows the memory increase only decelerating (click on the graph). I think the first rise is just normal memory usage, and after that you see the leak.

Screenshot 2024-05-02 at 07 28 12

On the other hand, redis-channels-uvicorn-leaks shows memory drops at intervals due to the PYTHONMALLOC=malloc flag; however, the overall memory usage continues to increase. If you examine each drop, you'll notice that the level after each fall is higher than the one before, and it never stops climbing.

Screenshot 2024-05-02 at 07 33 33

@carltongibson, do you have any clue about what's happening? Or what else can I try? I'm willing to try anything!

@cacosandon Given that you report it happening with the pub sub layer and different servers, not really. You need to identify where the leak is happening. Then it's possible to say something.

@carltongibson all my samples use RedisChannelLayer or RedisPubSubLayer, with uvicorn, daphne or hypercorn, on the tutorial example. My app has the problem too, but I think it's a general problem.

Some things I've noticed:

  • Memory increases constantly when there are large messages (>0.5 MiB)
  • Memory increases constantly when there are multiple connects/disconnects (every handshake adds memory)
  • The memory leak is not present when using InMemoryChannelLayer
  • Using explicit del and gc.collect() slows the memory increase... but the leak is still present
  • Creating large objects in Django views does not leak memory (each request seems to clean up after itself)

I don't know why nobody else is having this problem. Maybe they just don't send large messages 🤔

Hi @cacosandon — are you able to identify where the leak is happening? Alas, I haven't had time to dig into this further for you. Without that it's difficult to say too much.

If you can identify a more concrete issue, there's a good chance we can resolve it.

@carltongibson no :( that's actually the thing I'm struggling with: finding the memory leak 😓

I've really tried every tool to detect it, but there's nothing noticeable or strange in the reports.

I don't know why nobody else is having this problem. Maybe they just don't send large messages

I wouldn't assume that. 😉 I've been silently watching and hoping you find more than I did when I looked. We had some success changing servers from daphne to uvicorn. We're still seeing some leakiness, but have resorted to using tools to monitor memory and restart services.

Here are some other things I've watched:

@mitgr81 what tools do you use to monitor and restart? For now I would love to implement that.

Will take a look at those resources!

@mitgr81 what tools do you use to monitor and restart? For now I would love to implement that.

We're rocking a bespoke monitor for Docker containers. It's pretty simple: essentially we label each container with a valid restart time and a memory limit (among other rules), and the "container keeper" enforces them.
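Roughly the shape of it (a sketch with made-up label names, not our actual code):

  # keeper.py -- sketch of a "container keeper" using the Docker SDK for Python
  import datetime
  import time

  import docker

  MEMORY_LABEL = "keeper.max-memory-mb"            # label names here are invented
  RESTART_AFTER_HOUR_LABEL = "keeper.restart-after-hour"


  def over_limit(container, limit_mb):
      stats = container.stats(stream=False)
      usage_mb = stats["memory_stats"].get("usage", 0) / (1024 * 1024)
      return usage_mb > limit_mb


  def keeper_pass(client):
      now = datetime.datetime.now()
      for container in client.containers.list():
          labels = container.labels
          if MEMORY_LABEL not in labels:
              continue
          limit_mb = float(labels[MEMORY_LABEL])
          earliest_hour = int(labels.get(RESTART_AFTER_HOUR_LABEL, 0))
          # Only restart inside the allowed window and when over the memory limit.
          if now.hour >= earliest_hour and over_limit(container, limit_mb):
              print(f"restarting {container.name}: over {limit_mb} MB")
              container.restart()


  if __name__ == "__main__":
      client = docker.from_env()
      while True:
          keeper_pass(client)
          time.sleep(60)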