wyfo / apischema

JSON (de)serialization, GraphQL and JSON schema generation using Python typing.

Home Page: https://wyfo.github.io/apischema/

Regression deserializing a subclassed source

pchanial opened this issue

The following code works correctly when cythonization is disabled:

from dataclasses import dataclass

from apischema import deserialize

@dataclass
class Foo:
    bar: str

class MyDict(dict):
    pass

data = MyDict({'bar': 'baz'})
expected = Foo(bar='baz')
actual = deserialize(Foo, data)

With Cythonization enabled, the same code violates the Liskov substitution principle (LSP): the last line fails with the following error:

_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
apischema/utils.py:400: in wrapper
    return wrapped(*args, **kwargs)
apischema/deserialization/__init__.py:896: in deserialize
    return deserialization_method(
apischema/deserialization/methods.pyx:463: in apischema.deserialization.methods.SimpleObjectMethod.deserialize
    cpdef deserialize(self, object data):
apischema/deserialization/methods.pyx:464: in apischema.deserialization.methods.SimpleObjectMethod.deserialize
    return SimpleObjectMethod_deserialize(self, data)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   data2: dict = data
E   TypeError: Expected dict, got MyDict

apischema/deserialization/methods.pyx:912: TypeError
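
The traceback points at the typed assignment `data2: dict = data`, which in the compiled module rejects anything that is not exactly a dict. A minimal pure-Python sketch of the difference between an exact type check and a subclass-friendly one follows; the function names `exact_dict_check` and `subclass_friendly_check` are hypothetical and are not part of apischema:

```python
class MyDict(dict):
    pass


def exact_dict_check(data):
    # Mimics the behaviour seen in the traceback: only a plain dict passes.
    if type(data) is not dict:
        raise TypeError(f"Expected dict, got {type(data).__name__}")
    return data


def subclass_friendly_check(data):
    # isinstance accepts dict and any of its subclasses.
    if not isinstance(data, dict):
        raise TypeError(f"Expected dict, got {type(data).__name__}")
    return data


data = MyDict({"bar": "baz"})
accepted = subclass_friendly_check(data)  # the subclass is accepted

try:
    exact_dict_check(data)
    error_message = None
except TypeError as err:
    error_message = str(err)  # same wording as the error in the traceback
```

The second form is what callers relying on substitutability of dict subclasses would expect.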

I did indeed use a Cython cast to a builtin type in order to improve performance, since operations on builtin types can be optimized. For example, the in operator used for object deserialization can call PyDict_Contains directly instead of PySequence_Contains (which itself calls PyDict_Contains through a function pointer).

However, following your issue, I ran a quick benchmark, and the result was surprising: adding the cast (with an additional variable) is in fact slower for small lists, though faster, as expected, for bigger ones. That's why I ended up removing the cast.

@pchanial By the way, could you give more details about your use case with builtin subtypes?

Thanks for the quick fix. We were just emulating PEP 584 in some unit tests in a Python 3.8 environment, but there could be other, less trivial use cases, such as relying on defaultdict, Counter or OrderedDict.
Btw, it would be great to make public some benchmarks internal to apischema so that contributors could check performance regressions.
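
For illustration, the PEP 584 emulation mentioned above could look roughly like the sketch below. This is an assumption about what such a test helper might contain, not the code actually used in those unit tests; the class body is hypothetical:

```python
class MyDict(dict):
    """Hypothetical dict subclass emulating the PEP 584 merge operators."""

    def __or__(self, other):
        # d1 | d2 returns a new mapping; the right operand wins on conflicts.
        if not isinstance(other, dict):
            return NotImplemented
        merged = MyDict(self)
        merged.update(other)
        return merged

    def __ior__(self, other):
        # d1 |= d2 updates in place.
        self.update(other)
        return self


base = MyDict({"bar": "baz"})
merged = base | {"qux": "quux"}
```

Here `merged` is still a `MyDict`, which is exactly the kind of object that ends up being passed to `deserialize`.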

I don't think defaultdict would be a good idea, because deserialization uses the in operator, but PEP 584 emulation is indeed an understandable use case.
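
The defaultdict concern can be checked in plain Python, independently of apischema: a membership test with in does not call default_factory, so a field looked up that way would be seen as missing even though item access would produce a value. A small sketch:

```python
from collections import defaultdict

data = defaultdict(lambda: "default")

# Membership test: does not call default_factory, the key is simply absent.
present_before = "bar" in data

# Item access: triggers __missing__, which inserts and returns the default.
value = data["bar"]

# Only after the item access does the key exist in the mapping.
present_after = "bar" in data
```

So the defaults of a defaultdict would never be picked up by code that tests for field presence with in before reading the value.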

> Btw, it would be great to make public some benchmarks internal to apischema so that contributors could check performance regressions.

Benchmarking is not a trivial thing, and apischema's benchmark is only there to give a rough estimate of its performance relative to alternatives. It's really poor and not reliable if you want precise results, for example to track performance regressions.
I don't have any other benchmark available for the moment, but I don't think it matters a lot. If a performance regression is introduced, I think it will be visible directly in the code, as apischema's performance mostly comes from algorithmic optimization.

Out of curiosity, why does the deserializer use the in operator?