wyfo / apischema

JSON (de)serialization, GraphQL and JSON schema generation using Python typing.

Home Page:https://wyfo.github.io/apischema/

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Adjust ConversionUnionMethod Algorithm to Try to Find Exact Matches for Source Type First

whatamithinking opened this issue · comments

Fantastic library! Also, I kinda love your conversion system. It's fallback mechanism when the input data is not the same type as the source of the converter function is really beautiful.

I have just started using your library in a larger project and ran into a problem with recursion and unintuitive (for me) behavior. This library has extensive documentation, so it is possible I missed something though.

I need to support deserializing a bunch of scalar JSON types to other scalar JSON types which do not seem supported by apischema out of the box, so I registered some conversions and tested it out.

import apischema
from dataclasses import dataclass

apischema.conversions.reset_deserializers(float)
apischema.deserializer(
    apischema.conversions.Conversion(apischema.identity, float, float)
)
apischema.deserializer(apischema.conversions.Conversion(float, str, float))
apischema.deserializer(apischema.conversions.Conversion(float, bool, float))
apischema.deserializer(apischema.conversions.Conversion(float, int, float))

apischema.conversions.reset_deserializers(int)
apischema.deserializer(apischema.conversions.Conversion(apischema.identity, int, int))
apischema.deserializer(apischema.conversions.Conversion(int, str, int))
apischema.deserializer(apischema.conversions.Conversion(int, bool, int))

apischema.conversions.reset_deserializers(str)
apischema.deserializer(apischema.conversions.Conversion(apischema.identity, str, str))
apischema.deserializer(apischema.conversions.Conversion(str, int, str))
apischema.deserializer(apischema.conversions.Conversion(str, float, str))

@dataclass
class Test:
    my_field: str

print(apischema.deserialize(Test, {"my_field": 1.2}))

Output: RecursionError: maximum recursion depth exceeded

So what seems to happen currently is that this will go through the conversions with str as target and hit (int -> str) and then jump to the conversions for target type of int and hit (str -> int) and then we keep looping.

The solution I am currently using is to monkeypatch the ConversionUnionMethod.deserialize method so it tries to find Conversion alternatives which are exact matches for the type of the given data before falling back to your current algorithm. So something like the following...

_old_conversion_union_method_deserialize = ConversionUnionMethod.deserialize
_method_types = {
    StrMethod: str,
    ConstrainedStrMethod: str,
    BoolMethod: bool,
    IntMethod: int,
    ConstrainedIntMethod: int,
    FloatMethod: float,
    ConstrainedFloatMethod: float,
    NoneMethod: None,
}

def _conversion_union_method_deserialize(self, data: Any) -> Any:
    # try to find exact match for a conversion source type matching the given data type
    # if found go with that instead of trying one at a time and trying to convert
    # the data to whatever source type there is for each conversion
    # we fallback to the builtin method which just tries one at a time if no exact match
    error = None
    for alternative in self.alternatives:
        method_class = alternative.method.__class__
        if alternative.method.__class__ is ConversionUnionMethod:
            method_class = alternative.method.alternatives[0].method.__class__
        try:
            type_ = _method_types[method_class]
        except KeyError:
            continue
        if not isinstance(data, type_):
            continue
        try:
            value = alternative.method.deserialize(data)
        except ValidationError as err:
            error = merge_errors(error, err)
            continue
        try:
            return alternative.converter(value)
        except ValidationError as err:
            error = merge_errors(error, err)
        except ValueError as err:
            if not alternative.value_error:
                raise
            error = merge_errors(error, ValidationError(str(err)))
    return _old_conversion_union_method_deserialize(self, data)

ConversionUnionMethod.deserialize = _conversion_union_method_deserialize

This is not exactly elegant, but it seems to work. The source types are not available in the ConversionAlternative objects here, so a hack is to use the method and map that to a type.

I don't have the time to create a pull request but i thought this might help someone and might be something you would be interested in adding. Alternatively, if you have a better approach to solve the same problem, I would be interested.

Thank you for this report. This seems indeed to be a good optimization to add. In fact, this optimization is already implemented for unions, but not for those resulting of multiple deserialization.

However, it may not solve your issue, because you may have fixed the deserialization, but I'm not sure you would be able to generate a JSON schema without encountering this recursion error. Fortunately, the feature you may be looking for is called coercion.

Why is there two overlapping features? Because coercion doesn't impact JSON schema, so you can keep a clean schema, displaying its intent without the noise, while having a more lenient API. It's slightly less optimized, but nobody should care. And its simpler than conversion for simple use case, e.g. stringified values.

Coercion should definitely be mentioned at the beginning of conversion documentation.