django / channels

Developer-friendly asynchrony for Django

Home Page: https://channels.readthedocs.io

Users erratically getting CancelledError with Django ASGI

olha-trokhymchuk opened this issue

Our users are erratically getting CancelledError on pages throughout our system. The only pattern we've observed is that it happens more often on pages that take longer to load during normal operation. But it is by no means limited to such pages; it can happen anywhere in our system, e.g. the login page. None of the affected pages use any async code or channels; they are standard Django views working in the request/response model (we migrated to ASGI only recently, and we have only a single page that uses channels, which works just fine). We cannot reproduce it consistently.

What we see in sentry.io:

CancelledError: null
  File "channels/http.py", line 198, in __call__
    await self.handle(scope, async_to_sync(send), body_stream)
  File "asgiref/sync.py", line 435, in __call__
    ret = await asyncio.wait_for(future, timeout=None)
  File "asyncio/tasks.py", line 414, in wait_for
    return await fut
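
To illustrate where the traceback ends, here is a minimal stdlib-only sketch of how cancelling a task that is parked in `asyncio.wait_for(future, timeout=None)` surfaces as a CancelledError at that frame. It is not the channels or asgiref code: `slow_view` is a hypothetical stand-in for a request handler waiting on its worker thread, and the assumption that the server cancels the task from outside is ours, based on the "was killed" warning in the Daphne log.

```python
import asyncio


async def slow_view(started: asyncio.Event) -> str:
    # Hypothetical stand-in for a handler waiting on a sync view's result.
    loop = asyncio.get_running_loop()
    future = loop.create_future()  # never resolved in this sketch
    started.set()
    # Same call shape as the asgiref frame in the traceback above.
    return await asyncio.wait_for(future, timeout=None)


async def main() -> str:
    started = asyncio.Event()
    task = asyncio.ensure_future(slow_view(started))
    await started.wait()  # make sure the task is parked inside wait_for
    # Assumption: roughly what a server does when it gives up on an
    # application instance.
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        return "cancelled"
    return "finished"


result = asyncio.run(main())
print(result)
```

If this picture is right, the CancelledError is a symptom of the task being cancelled from outside rather than an error raised by the view itself.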

Locally and in the Daphne logs it looks like this:

2022-10-12 20:00:00,000 WARNING Application instance <Task pending coro=<ProtocolTypeRouter.__call__() running at /home/deploy/.virtualenvs/…/lib/python3.7/site-packages/channels/routing.py:71> wait_for=<Future pending cb=[_chain_future.<locals>._call_check_cancel() at /usr/lib/python3.7/asyncio/futures.py:348, <TaskWakeupMethWrapper object at 0x7f1adcbf9610>()]>> for connection <WebRequest at 0x7f1adcc6bb50 method=POST uri=/dajaxice/operations.views.calculate_cost_view/ clientproto=HTTP/1.0> took too long to shut down and was killed.
2022-10-12 20:00:00,000 WARNING Application timed out while sending response

From the user’s POV, the page simply fails to load and they have to re-click a button or refresh the page.

Libraries we use:

python = 3.7
Django = 2.2.12
channels = 3.0.5
channels-redis = 3.4.1

On the server we use: Nginx, Supervisor, Daphne.

For all requests (HTTP and websockets) we use ASGI.

Our command for running daphne: daphne -t 300 project.asgi:application

What we already tried to do:

  1. Adding a timeout to Daphne (as you can see above)
  2. Updating the channels library from 3.0.4 to 3.0.5 (because we found information suggesting that asgiref 3.3.1, which is used with channels 3.0.4, could be the culprit for this issue: https://lightrun.com/answers/django-channels-warning---server---application-instance-took-too-long-to-shut-down-and-was-killed)
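
Since upgrading channels may not change the installed asgiref, a quick check of what is actually present in the virtualenv could help rule that out. The sketch below is only an illustration: `installed_versions` is a hypothetical helper name, and on Python 3.7 it assumes the `importlib_metadata` backport is installed.

```python
try:
    # Standard library on Python 3.8+
    from importlib.metadata import PackageNotFoundError, version
except ImportError:
    # Assumption: the importlib_metadata backport is available on Python 3.7
    from importlib_metadata import PackageNotFoundError, version


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    packages = ["Django", "channels", "channels-redis", "asgiref", "daphne"]
    for name, ver in installed_versions(packages).items():
        print(f"{name} = {ver}")
```

Running it with the same interpreter that Supervisor uses to start Daphne shows the versions the server process really sees.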

Any idea what this is caused by or how to troubleshoot it?