lovasoa / marshmallow_dataclass

Automatic generation of marshmallow schemas from dataclasses.

Home Page:https://lovasoa.github.io/marshmallow_dataclass/html/marshmallow_dataclass.html

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

The type constraint of `errors` is broken

uncle-lv opened this issue · comments

errors should be dict[str, list[str]] in marshmallow.

But when union fields raise errors, marshmallow_dataclass adds the error objects into errors directly. The error message generated by marshmallow_dataclass will be List[TypeCheckError | ValidationError].

It breaks the type constraint of errors.

This bug is found in this issue.

I started to look into fixing this, but then got confused.

If we have a union of simple types:

from dataclasses import dataclass
from typing import Union

from marshmallow_dataclass import class_schema

@dataclass
class IntOrStr:
    value: Union[int, str]

schema = class_schema(IntOrStr)()

Then, we expect schema.load({"value": object()}) to raise something like

ValidationError({"value": ["Not a valid int.", "Not a valid string."]})

(Currently, it raises ValidationError({"value": [ValidationError("Not a valid int."), ValidationError("Not a valid string.")]}), which as you point out is pretty clearly not normal marshmallow usage.)

But what if we have a union of compound types?
As an example:

@dataclass
class IntStrOrStrInt:
    value: Union[Tuple[int, str], Tuple[str, int]]

schema2 = class_schema(IntStrOrStrInt)()

What should schema2.load({"value": (1,1)}) raise?

Currently, it raises

ValidationError({"value": [
    ValidationError({1: ["Not a valid string."]}),  # fails parsing as Tuple[int, str], because the second value is not a str
    ValidationError({0: ["Not a valid string."]}),  # fails parsing as Tuple[str, int] because the first value is not a str
]})

We could merge these in the same manner that marshmallow’s merge_errors does. That would result in

ValidationError({"value": {
    0: ["Not a valid string."],
    1: ["Not a valid string."]
}})

I'm not convinced that that makes sense (but I'm at a loss for better ideas).

As an alternative, I considered treating the Union (for error reporting purposes) essentially like a sequence of the constituent types.
Then the resulting errors dict would then be one level deeper:

ValidationError({"value": {
    0: {1: ["Not a valid string."]},  # deserialization of the second element of the first type (Tuple[int, str]) failed
    1: {0: ["Not a valid string."]},  # deserialization of the first element of the second type (Tuple[str, int]) failed
}})

I'm pretty sure from poking through marshmallow docs that the keys in the messages dict are supposed to correspond to keys/indexes in the input data. So that would rule out this approach.


Then there's the issue that fixing this will break any existing code that depends on the current (admittedly broken) behavior.

I started to look into fixing this, but then got confused.

If we have a union of simple types:

from dataclasses import dataclass
from typing import Union

from marshmallow_dataclass import class_schema

@dataclass
class IntOrStr:
    value: Union[int, str]

schema = class_schema(IntOrStr)()

Then, we expect schema.load({"value": object()}) to raise something like

ValidationError({"value": ["Not a valid int.", "Not a valid string."]})

(Currently, it raises ValidationError({"value": [ValidationError("Not a valid int."), ValidationError("Not a valid string.")]}), which as you point out is pretty clearly not normal marshmallow usage.)

But what if we have a union of compound types? As an example:

@dataclass
class IntStrOrStrInt:
    value: Union[Tuple[int, str], Tuple[str, int]]

schema2 = class_schema(IntStrOrStrInt)()

What should schema2.load({"value": (1,1)}) raise?

Currently, it raises

ValidationError({"value": [
    ValidationError({1: ["Not a valid string."]}),  # fails parsing as Tuple[int, str], because the second value is not a str
    ValidationError({0: ["Not a valid string."]}),  # fails parsing as Tuple[str, int] because the first value is not a str
]})

We could merge these in the same manner that marshmallow’s merge_errors does. That would result in

ValidationError({"value": {
    0: ["Not a valid string."],
    1: ["Not a valid string."]
}})

I'm not convinced that that makes sense (but I'm at a loss for better ideas).

As an alternative, I considered treating the Union (for error reporting purposes) essentially like a sequence of the constituent types. Then the resulting errors dict would then be one level deeper:

ValidationError({"value": {
    0: {1: ["Not a valid string."]},  # deserialization of the second element of the first type (Tuple[int, str]) failed
    1: {0: ["Not a valid string."]},  # deserialization of the first element of the second type (Tuple[str, int]) failed
}})

I'm pretty sure from poking through marshmallow docs that the keys in the messages dict are supposed to correspond to keys/indexes in the input data. So that would rule out this approach.

Then there's the issue that fixing this will break any existing code that depends on the current (admittedly broken) behavior.

I found this bug for the users of apiflask. But I hardly use union type in dataclasses. Maybe we should discuss it with people who use it regularly.