wyfo / apischema

JSON (de)serialization, GraphQL and JSON schema generation using Python typing.

Home Page:https://wyfo.github.io/apischema/

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Not informative exception

justdominator opened this issue · comments

Hi @wyfo ,

First of all, the project is impressive! We get a boost of 25% compared to well-tuned Pydantic serialization, which was like the following one when all non-BaseModel nested attributes were on the shoulders of orjson's SIMD.:

def default(obj: Any) -> Any:
    if isinstance(obj, BaseModel):
        return dict(obj)
    raise TypeError

orjson.dumps(content, default=default)

I get these results on not-Cythonized version. So the main benchmark will occur shortly after integration, and I strongly believe that I'll get superior results.

But lets' get closer to the issue topic

from typing import Optional, List
import dataclasses
from apischema import serialize


@dataclasses.dataclass
class Item:
    name: str
    price: Optional[float] = None
    owner_ids: Optional[List[int]] = None


value = Item(name="coerce", price="1.0")
serialize(Item, value, check_type=True)

I'm getting the following exception.

Traceback (most recent call last):
  File "C:\Users\sdubr\PycharmProjects\squall\report.py", line 14, in <module>
    serialize(Item, value, check_type=True)
  File "C:\Users\sdubr\PycharmProjects\squall\venv\lib\site-packages\apischema\utils.py", line 424, in wrapper
    return wrapped(*args, **kwargs)
  File "C:\Users\sdubr\PycharmProjects\squall\venv\lib\site-packages\apischema\serialization\__init__.py", line 577, in serialize
    return serialization_method(
  File "C:\Users\sdubr\PycharmProjects\squall\venv\lib\site-packages\apischema\serialization\__init__.py", line 185, in wrapper
    return fallback(obj)
  File "C:\Users\sdubr\PycharmProjects\squall\venv\lib\site-packages\apischema\serialization\__init__.py", line 169, in method
    raise TypeError(f"Expected {tp}, found {obj.__class__}")
TypeError: Expected <class '__main__.Item'>, found <class '__main__.Item'>

The fact that exception has arrived is proper behavior that I expect. But, the exception doesn't provide any useful information.
It will be great to see something similar to exceptions we can get from the deserialization phase.

Actually, this bug has already been fixed in the upcoming v0.17, but I agree the exception is odd and should be fixed in v0.16.
I will include the bug fix in v0.16.2.

However, the error after the bug fix will only be TypeError: Expected <class 'float'>, found <class 'str'>. There is no error location tracking like in deserialization, and no plan to add it for now. In fact, serialization type checking is considered as an optional assertion (that's why it's disabled by default), as the code is supposed to be statically type-checked. In your example, mypy will in fact already raise an error about Item(name="coerce", price="1.0") because of price type mismatch.

Tracking error location has a cost, for runtime performance — i mean it would also impact normal unchecked serialization — but also in code complexity, cost that I don't want to pay for an assertion. Of course, my opinion can change, but for now with static type checking, I don't really see the interest. Actually, the serialization type checking feature was not even planed at first ; I've added it because others do, and because it was easy to add (Union already used type checking), without impacting the rest of serialization.

Let me highlight the use case. That code that I put earlier in the issue is just a super short example of trace and nothing more.
Simple real-world scenario:
We have SQLAlchemy with data classes mapping. Query changes, DB migrations, some wrong queries, etc can cause type changes.
As an API provider, I want to be sure, that response has proper types, for handling such issues, that's exactly why I'm using type checking and validation, not only for input data but also for output in reason of meet SLA.

So we have really broken data and from this point how quickly we can mitigate depends only on informativity about the issue nature. Such an amount of exception information can be enough only for that person who writes the code. If the issue escalated on SRE, support, etc. it's not.

There are two different speeds here:

  • Speed of an application
  • Speed of development and mitigation. This totally depends on UX (Let's say DX(Dev Experience) or SX(Support Experience) )

BTW, I'm pretty sure that no reason for collecting all errors here. Trace and path to the first invalid item are totally enough.

In fact, I did not take in account complex things like SQLAlchemy in my statement about static type checking. It's indeed an argument to consider about adding error location tracing. However, I think indeed that collecting all errors will never be done, as there would be surely too much impact on unchecked code.

I propose you to keep this issue for the bug fix only, and to create another one about the error location tracing. I think it will have to wait v0.17, because the code has been completely changed since with cythonization.