lovasoa / marshmallow_dataclass

Automatic generation of marshmallow schemas from dataclasses.

Home Page: https://lovasoa.github.io/marshmallow_dataclass/html/marshmallow_dataclass.html


Union field is broken: The order of Union arguments is indeterminate

dairiki opened this issue

Semantically, the order of the arguments of a union type is insignificant. Union[int, str] and Union[str, int] refer to the same type. When we ask a typing.Union instance for its arguments, it is under no obligation to give them to us in the same order as was used to create the instance.

Our Union marshmallow field, however, does attach significance to the order of the arguments. When de/serializing a value, it tries each type in the union in order, using a field appropriate for that type. E.g. for a (dataclass) field with type Union[int, str], we first try to deserialize as an int, falling back to str only if the input cannot be parsed as an integer.
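To make the order dependence concrete, here is a minimal sketch of first-match-wins deserialization. It is not the actual marshmallow_dataclass implementation: `load_union` is a hypothetical helper, and plain callables such as `int` and `str` stand in for marshmallow fields.

```python
from typing import Any, Callable, Sequence


def load_union(value: Any, converters: Sequence[Callable[[Any], Any]]) -> Any:
    """Hypothetical helper: return the result of the first converter that accepts `value`."""
    for convert in converters:
        try:
            return convert(value)
        except (TypeError, ValueError):
            # This candidate rejected the value; fall through to the next one.
            continue
    raise ValueError(f"no converter accepted {value!r}")


# Same input, same set of candidate types, different order.
as_int_first = load_union("42", [int, str])
as_str_first = load_union("42", [str, int])

print(repr(as_int_first))  # 42
print(repr(as_str_first))  # '42'
```

With `int` first the string is parsed as a number; with `str` first it is returned unchanged, because `str` accepts any input and `int` is never tried.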

So the whole design of our system for handling union fields is flawed.

Most of the time, it happens to work. Python's typing does have a type cache. When a generic alias (e.g. a Union) is instantiated, a previously created instance may be returned, if one has already been created with the same arguments.

The type cache system (currently) does not ignore argument order, a fact which mostly saves us:

>>> from typing import *
>>> Union[int, str] is Union[int, str]
True
>>> Union[int, str] is Union[str, int]
False
>>> Union[int, str].__args__
(<class 'int'>, <class 'str'>)
>>> Union[str, int].__args__
(<class 'str'>, <class 'int'>)

Equality comparison between Unions, however, does ignore the argument order:

>>> Union[int, str] == Union[str, int]
True

When arguments to a union type are themselves unions, this can start to cause trouble.

>>> Union[Union[int, str]] is Union[Union[str, int]]
True
>>> Union[Union[int, str]]
typing.Union[int, str]
>>> Union[Union[str, int]]
typing.Union[int, str]

Because the argument lists passed to the two outer calls to Union compare equal (the inner unions are equal regardless of argument order), the cache lookup hits and we get the same cached instance back. One of the two results will have the "wrong" argument order.
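The mechanism can be reproduced without relying on typing internals. The sketch below assumes only that the cache behaves like `functools.lru_cache`, i.e. it is keyed on argument equality and hash. `FakeUnion` and `wrap` are hypothetical stand-ins for a union type and for the cached outer `Union[...]` call.

```python
from functools import lru_cache


class FakeUnion:
    """Hypothetical stand-in for a union type: equality and hash ignore argument order."""

    def __init__(self, *args):
        self.args = args  # the original order is still stored on the instance

    def __eq__(self, other):
        return isinstance(other, FakeUnion) and frozenset(self.args) == frozenset(other.args)

    def __hash__(self):
        return hash(frozenset(self.args))


@lru_cache(maxsize=None)
def wrap(inner):
    # Stand-in for the cached outer call; the result records the argument order it saw.
    return ("wrapped", inner.args)


first = wrap(FakeUnion(str, int))   # cache miss: computed with (str, int)
second = wrap(FakeUnion(int, str))  # cache hit: the key compares equal to the first one

print(second is first)
print(second[1])
```

The second call asked for `(int, str)` but receives the object built for `(str, int)`, which is the situation described above for nested unions.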


Here's a simple script that exercises the problem.

from typing import Optional, Union
from marshmallow_dataclass import dataclass

# Comment out the next line and the assert will pass, otherwise it will fail
Optional[Union[str, int]]

@dataclass
class Test:
    x: Optional[Union[int, str]]

assert Test.Schema().load({"x": "42"}) == Test(x=42)

The failure of the test in question (this time) was caused by the addition of an unrelated function annotation in pytest-mypy-plugins 1.11.0.
(The annotation involves Optional[Union[str, int]]. When our test constructs an Optional[Union[int, str]] we get the cached first instance that has the arguments in the other order.)
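The argument order that a schema generator would see can be inspected directly with `typing.get_args`, without involving marshmallow at all. This is only a sketch: the order printed depends on the Python version and on which union types were created earlier in the same process, so no expected output is given.

```python
from typing import Optional, Union, get_args

# Stand-in for an unrelated annotation evaluated earlier, e.g. in a third-party package.
Optional[Union[str, int]]

# The type a dataclass field might be annotated with afterwards.
field_type = Optional[Union[int, str]]

# Order of the members as seen by any code that introspects the annotation.
print(get_args(field_type))
```

If `str` is listed before `int`, an order-sensitive Union field will try `str` first, and the input "42" will be loaded as a string rather than as the integer 42.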


Related

This issue surfaced and was discussed in PR #246.