lovasoa / marshmallow_dataclass

Automatic generation of marshmallow schemas from dataclasses.

Home Page: https://lovasoa.github.io/marshmallow_dataclass/html/marshmallow_dataclass.html

Type stub-only objects not allowed in dataclass fields

engnatha opened this issue · comments

When using Python 3.9 and above, users are not able to use type stub-only fields. The docs indicate that one may use marshmallow_field in the metadata to control which marshmallow field a dataclass field maps to in the generated schema. However,

type_hints = get_type_hints(
    clazz, localns=clazz_frame.f_locals if clazz_frame else None
)

indicates that typing.get_type_hints is called prior to inspecting for a custom marshmallow_field definition.

Below is an example that minimally reproduces this behavior with version 8.6.0.

$ ipython
Python 3.9.16 (main, Oct 18 2023, 12:04:09) 
Type 'copyright', 'credits' or 'license' for more information
IPython 7.34.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import marshmallow_dataclass

In [2]: import dataclasses

In [3]: import marshmallow

In [4]: @dataclasses.dataclass
   ...: class Demo:
   ...:     a: 'NotAType' = dataclasses.field(metadata={'marshmallow_field': marshmallow.fields.Integer(load_default=0)})
   ...: 

In [5]: schema = marshmallow_dataclass.class_schema(Demo)()
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-5-eaed5a25227f> in <cell line: 1>()
----> 1 schema = marshmallow_dataclass.class_schema(Demo)()

~/devel/monorepo/dist/export/python/virtualenvs/python-default/3.9.16/lib/python3.9/site-packages/marshmallow_dataclass/__init__.py in class_schema(clazz, base_schema, clazz_frame)
    384     _RECURSION_GUARD.seen_classes = {}
    385     try:
--> 386         return _internal_class_schema(clazz, base_schema, clazz_frame)
    387     finally:
    388         _RECURSION_GUARD.seen_classes.clear()

~/devel/monorepo/dist/export/python/virtualenvs/python-default/3.9.16/lib/python3.9/site-packages/marshmallow_dataclass/__init__.py in _internal_class_schema(clazz, base_schema, clazz_frame)
    430 
    431     # Update the schema members to contain marshmallow fields instead of dataclass fields
--> 432     type_hints = get_type_hints(
    433         clazz, localns=clazz_frame.f_locals if clazz_frame else None
    434     )

~/.cache/pants/named_caches/pyenv/44136fa355b3678a1146ad16f7e8649e94fb4fc21fe77e8310c060f61caaff8a/versions/3.9.16/lib/python3.9/typing.py in get_type_hints(obj, globalns, localns, include_extras)
   1457                 if isinstance(value, str):
   1458                     value = ForwardRef(value, is_argument=False, is_class=True)
-> 1459                 value = _eval_type(value, base_globals, localns)
   1460                 hints[name] = value
   1461         return hints if include_extras else {k: _strip_annotations(t) for k, t in hints.items()}

~/.cache/pants/named_caches/pyenv/44136fa355b3678a1146ad16f7e8649e94fb4fc21fe77e8310c060f61caaff8a/versions/3.9.16/lib/python3.9/typing.py in _eval_type(t, globalns, localns, recursive_guard)
    290     """
    291     if isinstance(t, ForwardRef):
--> 292         return t._evaluate(globalns, localns, recursive_guard)
    293     if isinstance(t, (_GenericAlias, GenericAlias)):
    294         ev_args = tuple(_eval_type(a, globalns, localns, recursive_guard) for a in t.__args__)

~/.cache/pants/named_caches/pyenv/44136fa355b3678a1146ad16f7e8649e94fb4fc21fe77e8310c060f61caaff8a/versions/3.9.16/lib/python3.9/typing.py in _evaluate(self, globalns, localns, recursive_guard)
    552                 )
    553             type_ = _type_check(
--> 554                 eval(self.__forward_code__, globalns, localns),
    555                 "Forward references must evaluate to types.",
    556                 is_argument=self.__forward_is_argument__,

<string> in <module>

NameError: name 'NotAType' is not defined

Instead of getting this error, I would expect the tooling to gracefully skip over the missing object definition (as would be the case in a stub-only object) in favor of the existing override.

What is going on here is that if the type annotation for a field references an undefined type, an exception will be thrown. While the exception message may be a bit opaque, the exception itself does not seem unreasonable to me in this case. The type annotation does, in fact, reference an undefined type — this is, arguably, a syntax error.

Were the annotation to reference any resolvable type, using metadata["marshmallow_field"] to specify the schema field would work.
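The behavior described above can be reproduced with the standard library alone. This is a minimal sketch, not marshmallow_dataclass code: the class names Unresolvable and Resolvable and the helper try_hints are made up for illustration.

```python
import dataclasses
from typing import Any, get_type_hints


@dataclasses.dataclass
class Unresolvable:
    # "NotAType" is deliberately never defined anywhere at run time.
    a: "NotAType" = 0


@dataclasses.dataclass
class Resolvable:
    a: Any = 0


def try_hints(cls):
    """Return the resolved hints, or the NameError that resolution raised."""
    try:
        return get_type_hints(cls)
    except NameError as exc:
        return exc


print(type(try_hints(Unresolvable)).__name__)
print(try_hints(Resolvable))
```

Creating the dataclass itself succeeds in both cases; only the later call to get_type_hints fails, and it fails regardless of what the field metadata contains.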

I'm inclined to say the described behavior is expected and not a bug.


This may be a stupid question, but: What is the purpose of using a string literal that references an undefined type in the field's type annotation?

I.e. what benefit results from writing

from dataclasses import dataclass, field

@dataclass
class Demo:
    a: "NotAType" = field(metadata={"marshmallow_field": ...})

rather than, e.g.

from dataclasses import dataclass, field
from typing import Any

@dataclass
class Demo:
    a: Any = field(metadata={"marshmallow_field": ...})

That's an excellent explanation of what's going on and was the conclusion that I came to. The benefit is that there are certain annotations that only exist in type stubs. The example I wrote was contrived, but my specific use case involved using mypy-protobuf. This generates type annotations for protobuf files that let you do really helpful things like autocomplete in editors.

The case where this came up was using their enum implementation. You get to annotate things like my_var: 'my_pb2.MyEnum.V' which means my_var should only take on values found in MyEnum. At run time, MyEnum values are just integers so type hinting as int isn't very useful. Tools like mypy and pyright (which we use) do know how to interpret these annotations, correlate them to the stub file, and do better static analysis than if we had just said my_var: int.

So, I'm on board with you that we can't find a type at run time. After all, that's exactly how type stubs work. The part that I think would be useful to change is that it raises an error even if the author has supplied the marshmallow information in the field metadata section. It seems that's all marshmallow_dataclass needs to know. I was hoping by supplying that metadata information, we could get around the specific type of the field not being able to be determined since it doesn't really need to be determined at that point.

@engnatha Thank you! I think I get it now. If I understand correctly, a workaround would be something like:

from dataclasses import dataclass, field
from typing import TYPE_CHECKING, Any

if TYPE_CHECKING:
    from my_stubs import MyType
else:
    MyType = Any  # or some other more specific concrete type

@dataclass
class Demo:
    a: MyType = field(metadata={"marshmallow_field": ...})
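A quick way to check that this kind of workaround keeps typing.get_type_hints happy at run time is sketched below, using only the standard library. The module my_stubs is hypothetical and never imported at run time, and PLACEHOLDER_FIELD merely stands in for a real marshmallow field instance.

```python
import dataclasses
from typing import TYPE_CHECKING, Any, get_type_hints

if TYPE_CHECKING:
    # Hypothetical stub-only module; this branch never runs at run time.
    from my_stubs import MyType
else:
    # At run time the name resolves to a real object, so hints can be evaluated.
    MyType = Any

# Stand-in for something like marshmallow.fields.Integer(load_default=0).
PLACEHOLDER_FIELD = object()


@dataclasses.dataclass
class Demo:
    a: MyType = dataclasses.field(
        default=0, metadata={"marshmallow_field": PLACEHOLDER_FIELD}
    )


# No NameError here: the annotation is the runtime alias, not a dangling name.
resolved = get_type_hints(Demo)
print(resolved)
```

Static checkers still see the stub type through the TYPE_CHECKING branch, while the runtime only ever sees Any.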

I think you're right: if there is a custom marshmallow_field it would be possible to skip getting the type hint for that field.

In the general case, however, we may still need type hints for other fields in the dataclass. Is it possible to get those without computing (as we do now) the type hints for all the fields in one go?

I expect that will work out. I'll give it a try. The structure of the code I linked made it seem like it would be possible to do some inspection before to see if there's a need for fetching the type of the field. The order seemed to be

  1. Get fields defined by the dataclass types
  2. Override any fields that have their metadata set

Do you think it's feasible to do

  1. Get all the fields that have custom metadata
  2. Infer fields for the remaining types

If there are no issues with the remainder of the marshmallow code (since dataclasses on their own work happily with type stub annotations), then I could see that change being more extensible for users.

On Python 3.8 (since we haven't upgraded yet), I had to do the following:

import typing

from typing_extensions import TypeAlias

if typing.TYPE_CHECKING:
    _MyType: TypeAlias = 'my_pb2.MyEnum.V'
else:
    _MyType: TypeAlias = 'Any'

Without the TypeAlias, pyright complains that we're trying to use a variable as a type annotation.

> 2. Infer fields for the remaining types

The issue here is that, currently, we get the annotations for all of the fields in the dataclass using typing.get_type_hints in one fell swoop. (And get_type_hints raises the exception in question if any single field of the dataclass has an unresolvable type annotation.)

I'm fairly certain that get_type_hints cannot be applied to individual class attributes, only to an entire class, in bulk. (This is, at least in part, because the annotations for class attributes are stored in a separate __annotations__ attribute of the class, not on the individual attributes.)
If there is some other clean way to resolve the type hints for individual fields (or a subset of fields) of a dataclass — and there may well be — I don't currently know what it is.
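One possible approach, offered only as a sketch and not as anything marshmallow_dataclass does, is to copy the annotations that still need resolving onto a throwaway class and call get_type_hints on that instead. The helper name hints_for_unoverridden_fields and the probe class are assumptions for illustration. The sketch also ignores the clazz_frame/localns handling and resolves every name against the module of the class passed in, so inherited fields declared in another module may not resolve.

```python
import dataclasses
import sys
from typing import Any, Dict, List, get_type_hints


def hints_for_unoverridden_fields(cls: type) -> Dict[str, Any]:
    """Resolve hints only for fields that lack a marshmallow_field override."""
    wanted = {
        f.name: f.type
        for f in dataclasses.fields(cls)
        if "marshmallow_field" not in f.metadata
    }
    # Throwaway class carrying just the annotations that still need resolving.
    probe = type("_Probe", (), {"__annotations__": wanted})
    module = sys.modules.get(cls.__module__)
    globalns = vars(module) if module is not None else {}
    return get_type_hints(probe, globalns=globalns)


@dataclasses.dataclass
class Demo:
    # Stub-only style annotation; object() stands in for a marshmallow field.
    a: "NotAType" = dataclasses.field(
        default=0, metadata={"marshmallow_field": object()}
    )
    b: "int" = 0
    c: List[str] = dataclasses.field(default_factory=list)


print(hints_for_unoverridden_fields(Demo))
```

Whether this is clean enough to adopt is a separate question: it changes which namespace each annotation is evaluated in, and any field without an override that has an unresolvable annotation would still raise NameError.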

We could, I suppose, skip the call to get_type_hints in the special case that all the fields have an explicit marshmallow_field set. But that's a pretty limited use case. Doing so would likely cause more confusion than it solves.

Sorry for the delay on this. I agree with your take on the last point; I don't want the alternative to be custom defining all the types as soon as we have to define one.

I think it's a fair assessment that this is a limitation of the type hinting library. The method above is an acceptable workaround when this pops up, so I'm going to close. Maybe the future of get_type_hints will be more flexible.