tobgu / pyrsistent

Persistent/Immutable/Functional data structures for Python

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

PMap.__getitem__ gets the wrong value

brunonicko opened this issue · comments

PMap.__getitem__ will return the wrong value if the object used for the key defines a unique __hash__ and a custom __eq__ method.

Repro (tested with python 3.8 and pyrsistent 0.17.3):

from pyrsistent import pmap

class CustomObject(object):
    __slots__ = ()

    def __hash__(self):
        return object.__hash__(self)

    def __eq__(self, other):  # Having __eq__ declared causes the failure
        return True  # Changing this to False will fail with a different error

keys = []
py_dict = {}
for i in range(10000):
    key = CustomObject()
    keys.append(key)
    py_dict[key] = i

p_map = pmap(py_dict)
assert len(py_dict) == len(p_map)
assert set(py_dict.keys()) == set(p_map.keys())

for i, key in enumerate(keys):
    assert py_dict[key] == i
    assert p_map[key] == i  # Fails here when `i` is a big number, __getitem__ gets the wrong value

Sorry, I didn't introduce myself. My name is Bruno and I work with python development for visual effects pipelines.
I really like pyrsistent and I think it's an awesome tool for developing modern Python applications. I came across this bug recently, and I am trying to figure out why it's behaving that way. I also noticed this is not a problem when running Python 2.7 and pyrsistent 0.16.1, so maybe it's because of some behaviour introduced in newer python versions that I'm not aware of. I'll keep investigating to see if I can find more information about what's going on.
Cheers.

Thinking more about it, I guess my example breaks the rules of how hashing and equality should work in good python code in the first place. So even though pmap seems to behave differently than a regular dict in this case, maybe it doesn't matter since this particular example is bad and should not be implemented that way anyway.
Feel free to just reject the request, and sorry for the confusion :)

Hi Bruno!

OK, no worries. Having two object that compare equal but result in different hashes seems odd to me. I'm a bit surprised about the difference between Python 2 and 3 though. I need to do some investigation. Hope to find some time in the next days.

This difference comes from an implementation detail in the python dict c-code. Infact, the code in the example for the ordinary dict only passes because the keys are enumerated in the order they were added to the dict. If you include the lines
import random; random.shuffle(keys)
before the for loop, the assert will (usually) fail.

As this behaviour is not in the spec for dict, it may change in the future.

just a suggestion, but eq returning True with hashcode returning nonequal values could throw an exception at runtime? to me that would feel the right thing.