django / channels_redis

Redis channel layer backend for Django Channels

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

redis-py >=4.4.0 breaking changes

bbrowning918 opened this issue · comments

This issue is to help with visibility and discussion:

redis-py >=4.4.0 introduces two breaking issues for the test suite and real usage on

  1. test_receive_cancel in both core and sentinel will fail with "TypeError: int() argument must be a string, a bytes-like object or a number, not 'NoneType'" on the response to ZREMRANGEBYSCORE. In usage it has been noted in redis-py on redis/redis-py#2593 affecting normal looking send calls.
  2. test_message_expiry__group_send__one_channel_expires_message can hang indefinitely in the test suite

These affect both unrelated changes/PRs and 4.0.0 in actual usage. The change log for redis-py 4.4.0 does not seem to have a smoking gun and pinning it to 4.3.5 has the test suite passing better.

Thanks for writing this up @bbrowning918! 🏅

commented

I believe I tracked down and fixed the first issue you described. I have a PR ready in redis-py repo redis/redis-py#2607
The biggest change from 4.3.5 to 4.4.0 was that they got rid of second buffer SocketBuffer and started using asyncio.StreamReader directly. Which resulted in that odd behaviour when canceling a task executing redis commands

Hi @jajce — I had a quick look at your branch in #353. The GHA workflows didn't complete.

Locally it's hanging in test_core.py:

task: <Task pending name='Task-1037' coro=<test_message_expiry__group_send__one_channel_expires_message() done, defined at /Users/carlton/Projects/Django/channels_redis/tests/test_core.py:554> wait_for=<Future pending cb=[Task.task_wakeup()]>>

I haven't had a chance to dig more deeply yet, so I appreciate this is quite superficial — but maybe it gives you a breadcrumb… Thanks!

commented

@carltongibson yeah, unfortunately my PR in redis-py is supposed to fix only test_receive_cancel :(. I am still looking into what is happening with test_message_expiry__group_send__one_channel_expires_message

commented

Ok, investigated a little bit more and will share what I've found so far. Now I am pretty much sure that both of the issues are related to the removal of SocketBuffer redis/redis-py#2418 . The test_message_expiry__group_send__one_channel_expires_message issue is very well described in this ticket redis/redis-py#2579 .
In our test the issue is that even though we time out our receive here:

with pytest.raises(asyncio.TimeoutError):
async with async_timeout.timeout(1):
await channel_layer.receive(channel_1)

our next receive, which is here
message = await channel_layer.receive(channel_2)

will have to wait for 5 seconds
brpop_timeout = 5

content = await self._brpop_with_clean(
index, channel_key, timeout=self.brpop_timeout
)

to pass till we are able to read response from our next redis command. In our case the first command we are sending to redis is our cleanup EVAL script and when we are getting to read response StreamReader waits for 5 seconds to pass till it actually reads response for EVAL command.

So on one hand it is very different behaviour compared to redis v4.3.5 which is an issue, but on the other hand according to the Redis documentation regarding blocking operations it is an intended behaviour and redis client connection is supposed to be blocked till timeout passes or till another client writes something to the list we are trying to pop from.

I still need to understand how SocketBuffer prevented whole connection from being blocked in 4.3.5, which hopefully will lead to a fix.

OK, I've marked the failing tests xfail for now in #356.

This allows updating redis-py for 4.5.3, which is a security release.

The current Python 3.11 failures are pending the next redis-py release redis/redis-py@4802530 (4.5.4 or greater.) Error in asyncio.timeout before Python 3.11.3.

We have questions here:

  • Are the test errors more issues in redis-py?
  • Are they that we need to rewrite our tests for the new ways things work?
  • Are there actual issues in channels_redis? Can folks provide minimal reproduces (test cases) if so?

I'll leave this issue to track.

As per #358, we're going to skip PY311 tests until the next redis-py release.

@carltongibson after upgrading to 4.1.0 I got troubles when testing channel layers.

I am using django-channels (specifically the async consumers) and I notice that when I run multiple tests that are using the channel_layer I keep getting TimeoutError randomly. Fun fact: if I run each test one at a time no error occurs.

I suspected there was something that is not flushing properly between tests. Indeed, calling app.channel_layer.flush() seem to solve the problem. Here's a toy test:

@pytest.mark.asyncio
@pytest.mark.django_db(transaction=True)
async def test_consumer1():
    app = TestConsumer()
    communicator = WebsocketCommunicator(app, "")
    connected, _ = await communicator.connect()
    assert connected
    await app.channel_layer.flush()

I am trying to reproduce the issue with something simpler but it looks harder than expected. I struggle to understand what is going on under the hood when mixing up sync_to_async, pytest.mark.django_db(transaction=True) and channel layers.