Persistence across pytest cases: difference between 2.21.1 and 2.21.2

Question

LasseGravesenSaxo opened this issue 4 months ago

LasseGravesenSaxo commented 4 months ago

Describe the bug
There's been a change in how persistence works between 2.21.1 and 2.21.2.
Specifically, in 2.21.2, fakeredis seems to persist state across pytest test cases that use a pytest fixture to provide the fakeredis client. This behavior is new; 2.21.1 did not do this.

To Reproduce
Steps to reproduce the behavior:

Make a virtualenv to isolate the environment. I used pipenv.
Install fakeredis==2.21.1 and pytest==8.1.1 (using pip install fakeredis==2.21.1 and pip install pytest==8.1.1)
Create a file called test_main.py
Add the following contents to test_main.py:

import pytest
import fakeredis

@pytest.fixture()
def fake_redis_client():
    """Create a FakeRedis client."""
    return fakeredis.FakeRedis()

def test_a(fake_redis_client):
    print(fake_redis_client.keys("*"))
    fake_redis_client.set("a", "1")
    print(fake_redis_client.keys("*"))


def test_b(fake_redis_client):
    print(fake_redis_client.keys("*"))
    fake_redis_client.set("b", "1")
    print(fake_redis_client.keys("*"))

Run command: pytest -xsvvv and observe output.
Install fakeredis==2.21.2 (Using command: pip install fakeredis==2.21.2).
Run command again: pytest -xsvvv and observe output.

Expected behavior
I expected the following output from pytest using both 2.21.1 and 2.21.2:

...
collected 2 items

test_main.py::test_a []
[b'a']
PASSED
test_main.py::test_b []
[b'b']
PASSED
...

i.e. between test cases test_a and test_b the state is not persisted in the fakeredis instance.

Actual behavior
What actually happens is that only 2.21.1 produces the expected behavior, whereas 2.21.2 produces this output instead:

...
collected 2 items

test_main.py::test_a []
[b'a']
PASSED
test_main.py::test_b [b'a']
[b'a', b'b']
PASSED
...

It persists the state from test_a into test_b (i.e. key a is still present in the fakeredis instance during test_b, which, as far as I understand, should not be the case).

Desktop:

OS: Ubuntu 22.04.4 LTS (WSL2)
python version: 3.10.12
redis-py version: redis==5.0.3
Full requirements.txt:

async-timeout==4.0.3
exceptiongroup==1.2.0
fakeredis==2.21.2
iniconfig==2.0.0
packaging==24.0
pluggy==1.4.0
pytest==8.1.1
redis==5.0.3
sortedcontainers==2.4.0
tomli==2.0.1

Howard Smith · Answer 1 · Mon Mar 11 2024 23:52:01 GMT+0800 (China Standard Time)

I'm also experiencing this. From a quick look, it seems to have been introduced by the changes made to address #297.

Howard Smith · Answer 2 · Tue Mar 12 2024 00:02:26 GMT+0800 (China Standard Time)

OK, I had a quick play with the previous version and the snippet in #297, and we can achieve the same result with the previous version just by sharing a FakeServer instance.

import asyncio

import fakeredis

server = fakeredis.FakeServer()

async def amain():
    client_1 = fakeredis.FakeAsyncRedis(server=server)
    await client_1.set("async_key", "async_value")

    client_2 = fakeredis.FakeAsyncRedis(server=server)
    print(f"async_client/async_key {await client_2.get('async_key')}")


def main():
    client_1 = fakeredis.FakeRedis(server=server)
    client_1.set("sync_key", "sync_value")
    print(f"original client/sync_key {client_1.get('sync_key')}")

    client_2 = fakeredis.FakeRedis(server=server)
    print(f"sync client/async_key {client_2.get('async_key')}")
    print(f"sync client/sync_key {client_2.get('sync_key')}")


if __name__ == "__main__":
    loop = asyncio.get_event_loop()
    loop.run_until_complete(amain())
    main()

steve-mavens · Answer 3 · Tue Mar 12 2024 00:57:21 GMT+0800 (China Standard Time)

I raised #297, and the issue was to make the sync and async clients consistent. The async clients already shared data, and that was what was intended, but the sync clients generated a random connection string if nothing was supplied, meaning they didn't share data when created with no arguments. You can get the old behaviour back by passing in a random string (like a uuid4) as the host.
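To make the two defaults concrete, here is a toy model in plain Python. It is not fakeredis's actual implementation: `ToyServer`, `connect` and the `random_default` flag are hypothetical names. The only idea carried over from this thread is that clients are mapped to servers by their connection parameters (host and port), and that a client created with no host either gets a random host (the old sync behaviour) or falls back to `localhost` (the 2.21.2 behaviour).

```python
import uuid
from typing import Dict, Optional


class ToyServer:
    """Hypothetical stand-in for a fake Redis server: just a key/value dict."""

    def __init__(self) -> None:
        self.data: Dict[str, str] = {}


# Process-wide registry: one ToyServer per "host:port" key.
_servers: Dict[str, ToyServer] = {}


def connect(host: Optional[str] = None, port: int = 6379,
            random_default: bool = True) -> ToyServer:
    """Return the toy server for host:port, creating it on first use.

    random_default=True models the old sync behaviour (no host -> random
    host -> fresh server); False models real Redis (no host -> localhost).
    """
    if host is None:
        host = uuid.uuid4().hex if random_default else "localhost"
    key = f"{host}:{port}"
    if key not in _servers:
        _servers[key] = ToyServer()
    return _servers[key]


# Old sync default: two no-argument connections see separate data.
a, b = connect(), connect()
a.data["k"] = "v"
print("k" in b.data)  # False

# Real-Redis-like default: both resolve to localhost:6379 and share data.
c, d = connect(random_default=False), connect(random_default=False)
c.data["k"] = "v"
print("k" in d.data)  # True
```

In both modes an explicit, identical host always resolves to the same server; only the no-argument case differs.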

I think either behaviour makes sense, and people will write tests that rely on it either way. But the intention is that fakeredis should imitate real redis in the sense that if you connect to "the same" server by specifying the same parameters, then those connections should see each other's changes. In #293 someone raised this same issue for the async client, and it was declined. So the change is good news if your code uses multiple connection objects to talk to the same Redis instance, and bad news if your tests were relying on creating a new object as a means of getting a fresh database.

FWIW my test fixtures call flushdb after creating the connection, to handle the async behaviour, and again at the start of each test for the tests that use hypothesis, but that's a separate issue with fixture scopes and hypothesis cases.

Daniel M · Answer 4 · Tue Mar 12 2024 03:27:46 GMT+0800 (China Standard Time)

FakeRedis aims to closely emulate the behavior of the authentic Redis system.

In this case, when no connection parameters are supplied, a real Redis client connects to the default server (localhost:6379) and therefore sees the same state as every other client using the default configuration, so FakeRedis should do the same. When one needs a fresh server, they can either run flushdb, as suggested by @steve-mavens, or connect to a different "host" (using host=str(uuid.uuid4()), for example).

Considering that 3 issues were opened on the day the version was released, I am wondering whether this adjustment qualifies as a breaking change. The previous functionality deviated from the authentic Redis behavior. Moreover, the asynchronous FakeRedis behaved differently from the synchronous FakeRedis.

I would be happy to hear what people think. If the consensus is that it is a breaking change, I am happy to revert the changes in #297 and create a new major version instead.

FYI, as suggested in #298, I did include a note in the changelog about it.

Atheuz · Answer 5 · Tue Mar 12 2024 05:30:10 GMT+0800 (China Standard Time)

I don't think this belongs in a patch release. I can accept the change, but I was treating every FakeRedis I created as a new, empty instance (like a dict: when you create dicts like a = dict() and b = dict(), you don't expect a and b to share state), and this really does seem to break that. I can accept that this way it acts more like a real Redis server, but it does seem like a breaking change to me.

Is this behaviour documented? I was reading the readthedocs and came across this, which I interpreted as meaning "a new server is created for every redis".

I would prefer the default behaviour to be creating a new server instance that does not share state; if you need shared state, you can pass the same server instance to each client.

Otherwise I really appreciate this library, very useful!

Howard Smith · Answer 6 · Tue Mar 12 2024 05:50:44 GMT+0800 (China Standard Time)

Completely understand and appreciate that FakeRedis was deviating from what real Redis does; however, IMO there was good reason for this, given that the typical (only?) use-case for this package is testing code that uses real Redis without those tests requiring an actual Redis server. For this use-case, it's far more practical for each FakeRedis instance to have its own state by default.

Take the code snippet in the OP here, for example; that's pretty typical usage AFAIA. I completely get that the fixture can be modified to restore the previous behaviour, but this is definitely a breaking change in that regard. That aside, this change makes fakeredis less simple to work with. For example, prior to this change, if one required fresh state for each test, they might have a test fixture that simply looks like this:

@pytest.fixture()
def redis():
    return fakeredis.FakeRedis()

With this change, this now needs an additional step:

@pytest.fixture()
def redis():
    # A brand-new FakeServer has no keys, so each test starts from empty state
    return fakeredis.FakeRedis(
        server=fakeredis.FakeServer()
    )

OK, this isn't gunna set the world alight but still, it is additional maintenance effort for developers. I also appreciate that prior to this change, users that want to share state had to explicitly pass the same FakeServer instance to their FakeRedis instances anyway, but again, given the use-case I would imagine it's far more typical to not want to share state, and this is the way that this package has worked for a long time now - and is how the first page of the docs says to use the package.

If the general consensus is to stick with FakeRedis instances using the same connection by default, then I would argue strongly for reverting and creating a new major version (and updating the docs). If it were up to me though, I'd revert the changes and then make the async client work in the same way as the sync client (this is the way the docs suggest this package works, after all). I've submitted #303 to cover this should this end up being the way to go (I hope this doesn't come across as pushy - just trying to contribute and reduce @cunla's workload! 😅)

I hope that all makes sense and sounds reasonable! Also big thanks to @cunla for creating this package in the first place and maintaining it - I only wish there were similar packages for other infra stuff! (fakekafka, anyone? 😆)

steve-mavens · Answer 7 · Tue Mar 12 2024 18:41:25 GMT+0800 (China Standard Time)

prior to this change, users that want to share state had to explicitly pass the same FakeServer instance to their FakeRedis instances anyway

Not necessarily. If you specify the same host each time, then you get shared state, both before and after this change. It's only when you specify no host parameter that you got the random (and hence fresh) server. So if you know what you want and don't want to rely on what happens with no args, it's easy either way: pass 'localhost' to share data, or pass a random string to not share data. Agreed that it's worth documenting how to do both.

In neither case do you need to create a FakeServer object, or even read the docs sufficiently to know what FakeServer is. If your real code just passes a host/port into redis.Redis, then your test code can do the same with FakeRedis. If it uses from_url then there's fakeredis.FakeRedis.from_url.

So it's just about what people should get if they choose not to specify anything at all. cunla wants to emulate Redis by connecting to the same (but fake) server on localhost:6379. Clearly, those who have already written tests that rely on getting a random server don't want to change their existing test code!

I suspect that the people who want sharing are much more likely to be passing a host string to FakeRedis in their test code already. They're patching some function used by their real code, or mocking some factory used by the real code, and the real code always passes arguments because it gets the Redis connection details from some parameter/config/whatever of its own. But that's just a guess, based on the fact that nobody noticed for so long that sync and async were inconsistent.

steve-mavens · Answer 8 · Tue Mar 12 2024 18:52:18 GMT+0800 (China Standard Time)

@cunla as for whether it's a breaking change: I think it depends how much the docs need to change to accurately describe the behaviour. The part Atheuz quoted does seem, to me, to be quite close to guaranteeing the old behaviour. There comes a point where if a behaviour is unintended by the author, but guaranteed by the docs, then changing both the code and the docs at the same time is more than a bugfix :-)

Daniel M · Answer 9 · Tue Mar 12 2024 20:09:45 GMT+0800 (China Standard Time)

ok, since the documentation states "a new instance is automatically created for you", I merged the PR and will publish a fix.

Note that when supplying the same connection params, the server will be shared still - all that changed is the default behavior.

steve-mavens · Answer 10 · Tue Mar 12 2024 20:57:14 GMT+0800 (China Standard Time)

when you create dicts like a = dict() and b = dict() you don't expect a and b to share state

But when you create httpx clients like a = httpx.Client() and b = httpx.Client(), you do expect them to connect to the same internet (and to the same state of respx mocks) ;-) I think because of the way it's implemented, and because of the use case of testing code that creates multiple connection objects, it's natural for cunla to think of fakeredis.FakeRedis as a client for accessing a shared world of fake servers. Whereas people writing tests of code that only uses one Redis object don't care about that; they naturally just want as much test isolation as possible.