Type stub-only objects not allowed in dataclass fields
engnatha opened this issue
When using Python 3.9 and above, users are not able to use type stub-only fields. The docs indicate that one may set marshmallow_field in the field metadata to control which marshmallow field type that field gets in the schema. However, in marshmallow_dataclass/marshmallow_dataclass/__init__.py (lines 433 to 434 in d6396c1), typing.get_type_hints is called before the code checks for a custom marshmallow_field definition.
Below is an example that minimally reproduces this behavior with version 8.6.0.
$ ipython
Python 3.9.16 (main, Oct 18 2023, 12:04:09)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.34.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: import marshmallow_dataclass
In [2]: import dataclasses
In [3]: import marshmallow
In [4]: @dataclasses.dataclass
...: class Demo:
...:     a: 'NotAType' = dataclasses.field(metadata={'marshmallow_field': marshmallow.fields.Integer(load_default=0)})
...:
In [5]: schema = marshmallow_dataclass.class_schema(Demo)()
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-5-eaed5a25227f> in <cell line: 1>()
----> 1 schema = marshmallow_dataclass.class_schema(Demo)()
~/devel/monorepo/dist/export/python/virtualenvs/python-default/3.9.16/lib/python3.9/site-packages/marshmallow_dataclass/__init__.py in class_schema(clazz, base_schema, clazz_frame)
384 _RECURSION_GUARD.seen_classes = {}
385 try:
--> 386 return _internal_class_schema(clazz, base_schema, clazz_frame)
387 finally:
388 _RECURSION_GUARD.seen_classes.clear()
~/devel/monorepo/dist/export/python/virtualenvs/python-default/3.9.16/lib/python3.9/site-packages/marshmallow_dataclass/__init__.py in _internal_class_schema(clazz, base_schema, clazz_frame)
430
431 # Update the schema members to contain marshmallow fields instead of dataclass fields
--> 432 type_hints = get_type_hints(
433 clazz, localns=clazz_frame.f_locals if clazz_frame else None
434 )
~/.cache/pants/named_caches/pyenv/44136fa355b3678a1146ad16f7e8649e94fb4fc21fe77e8310c060f61caaff8a/versions/3.9.16/lib/python3.9/typing.py in get_type_hints(obj, globalns, localns, include_extras)
1457 if isinstance(value, str):
1458 value = ForwardRef(value, is_argument=False, is_class=True)
-> 1459 value = _eval_type(value, base_globals, localns)
1460 hints[name] = value
1461 return hints if include_extras else {k: _strip_annotations(t) for k, t in hints.items()}
~/.cache/pants/named_caches/pyenv/44136fa355b3678a1146ad16f7e8649e94fb4fc21fe77e8310c060f61caaff8a/versions/3.9.16/lib/python3.9/typing.py in _eval_type(t, globalns, localns, recursive_guard)
290 """
291 if isinstance(t, ForwardRef):
--> 292 return t._evaluate(globalns, localns, recursive_guard)
293 if isinstance(t, (_GenericAlias, GenericAlias)):
294 ev_args = tuple(_eval_type(a, globalns, localns, recursive_guard) for a in t.__args__)
~/.cache/pants/named_caches/pyenv/44136fa355b3678a1146ad16f7e8649e94fb4fc21fe77e8310c060f61caaff8a/versions/3.9.16/lib/python3.9/typing.py in _evaluate(self, globalns, localns, recursive_guard)
552 )
553 type_ = _type_check(
--> 554 eval(self.__forward_code__, globalns, localns),
555 "Forward references must evaluate to types.",
556 is_argument=self.__forward_is_argument__,
<string> in <module>
NameError: name 'NotAType' is not defined
Instead of getting this error, I would expect the tooling to gracefully skip over the missing object definition (as would be the case for a stub-only type) in favor of the existing override.
What is going on here is that if the type annotation for a field references an undefined type, an exception is raised. While the exception message may be a bit opaque, the exception itself does not seem unreasonable to me in this case. The type annotation does, in fact, reference an undefined type; this is, arguably, a syntax error.
Were the annotation to reference any resolvable type, using metadata["marshmallow_field"] to specify the schema field would work.
I'm inclined to say the described behavior is expected and not a bug.
This may be a stupid question, but: What is the purpose of using a string literal that references an undefined type in the field's type annotation?
I.e. what benefit results from writing
@dataclass
class Demo:
    a: "NotAType" = field(metadata={"marshmallow_field": ...})
rather than, e.g.
from typing import Any
@dataclass
class Demo:
    a: Any = field(metadata={"marshmallow_field": ...})
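To see where the error actually comes from, here is a minimal stdlib-only sketch with marshmallow_dataclass taken out of the picture (the class names Demo and DemoAny are illustrative). dataclasses accepts the unresolvable string annotation without complaint; only typing.get_type_hints fails, while the Any variant resolves cleanly.

```python
import dataclasses
import typing
from typing import Any, Optional


@dataclasses.dataclass
class Demo:
    # dataclasses never evaluates this string, so class creation succeeds
    a: "NotAType" = 0


@dataclasses.dataclass
class DemoAny:
    a: Any = 0


demo_error: Optional[NameError] = None
try:
    typing.get_type_hints(Demo)  # evaluates "NotAType" in the module namespace
except NameError as exc:
    demo_error = exc  # the same kind of NameError as in the traceback above

print("Demo:", demo_error)
print("DemoAny:", typing.get_type_hints(DemoAny))  # DemoAny: {'a': typing.Any}
```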
That's an excellent explanation of what's going on and was the conclusion that I came to. The benefit is that there are certain annotations that only exist in type stubs. The example I wrote was contrived, but my specific use case involved using mypy-protobuf. This generates type annotations for protobuf files that let you do really helpful things like autocomplete in editors.
The case where this came up was their enum implementation. You can annotate things like my_var: 'my_pb2.MyEnum.V', which means my_var should only take on values found in MyEnum. At run time, MyEnum values are just integers, so type hinting as int isn't very useful. Tools like mypy and pyright (which we use) know how to interpret these annotations, correlate them with the stub file, and do better static analysis than if we had just said my_var: int.
So, I'm on board with you that we can't find the type at run time; after all, that's exactly how type stubs work. The part I think would be useful to change is that an error is raised even when the author has supplied the marshmallow information in the field's metadata. That seems to be all marshmallow_dataclass needs to know. I was hoping that by supplying that metadata, we could sidestep the fact that the field's type can't be determined, since it doesn't really need to be determined at that point.
@engnatha Thank you! I think I get it now. If I understand correctly, a workaround would be something like:
from dataclasses import dataclass, field
from typing import TYPE_CHECKING, Any
if TYPE_CHECKING:
    from my_stubs import MyType
else:
    MyType = Any  # or some other more specific concrete type
@dataclass
class Demo:
    a: MyType = field(metadata={"marshmallow_field": ...})
I think you're right: if there is a custom marshmallow_field, it would be possible to skip getting the type hint for that field.
In the general case, however, we may still need type hints for other fields in the dataclass. Is it possible to get those without computing (as we do now) the type hints for all the fields in one go?
I expect that will work out. I'll give it a try. The structure of the code I linked made it seem possible to do some inspection beforehand to see whether the field's type needs to be fetched at all. The order seemed to be:
- Get fields defined by the dataclass types
- Override any fields that have their metadata set
Do you think it's feasible to do this instead:
1. Get all the fields that have custom metadata
2. Infer fields for the remaining types
If there are no issues with the rest of the marshmallow code (since dataclasses on their own work happily with type stub annotations), then I could see that change making things more flexible for users.
On Python 3.8 (since we haven't upgraded yet), I had to do the following:
import typing
from typing_extensions import TypeAlias
if typing.TYPE_CHECKING:
    _MyType: TypeAlias = 'my_pb2.MyEnum.V'
else:
    _MyType: TypeAlias = 'Any'
Without the TypeAlias, pyright complains that we're trying to use a variable as a type annotation.
> 2. Infer fields for the remaining types
The issue here is that, currently, we get the annotations for all of the fields in the dataclass in one fell swoop using typing.get_type_hints. (And get_type_hints raises the exception in question if any single field of the dataclass has an unresolvable type annotation.)
I'm fairly certain that get_type_hints cannot be applied to individual class attributes, only to an entire class in bulk. (This is, at least in part, because the annotations for class attributes are stored in the class's __annotations__ dict, not on the individual attributes.)
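A quick way to see where annotations live, as a small illustrative sketch with a hypothetical class: every field's raw annotation sits in one __annotations__ dict on the class (string annotations stay unevaluated strings), the default values bound to the attributes carry no annotations of their own, and dataclasses.fields() exposes the same raw annotation per field as Field.type.

```python
import dataclasses


@dataclasses.dataclass
class Demo:
    a: "NotAType" = 0
    b: int = 0


# One dict on the class holds every field's annotation, unevaluated.
print(Demo.__annotations__)  # {'a': 'NotAType', 'b': <class 'int'>}

# The attribute values themselves (here the default 0) carry no annotations.
print(hasattr(Demo.a, "__annotations__"))  # False

# dataclasses keeps a per-field copy of the raw annotation on Field.type.
print([f.type for f in dataclasses.fields(Demo)])  # ['NotAType', <class 'int'>]
```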
If there is some other clean way to resolve the type hints for individual fields (or a subset of fields) of a dataclass — and there may well be — I don't currently know what it is.
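One possible route, offered as a sketch under stated assumptions and not as anything marshmallow_dataclass does, is to read each field's raw annotation from dataclasses.fields() and hand it to typing.get_type_hints on a throwaway class that holds only that one annotation. The helper name resolve_field_hints and the None placeholder standing in for a real marshmallow field are hypothetical. The sketch also assumes every annotation resolves in the dataclass's own module (plus an optional localns, similar to the clazz_frame.f_locals passed in the traceback above), so inherited fields declared in other modules are not handled.

```python
import dataclasses
import typing
from typing import Any, Dict, Optional


def resolve_field_hints(
    clazz: type, localns: Optional[Dict[str, Any]] = None
) -> Dict[str, Any]:
    """Hypothetical helper: resolve each dataclass field's annotation on its
    own, skipping fields whose metadata supplies "marshmallow_field".

    Assumes all annotations resolve in clazz's module (plus localns).
    """
    hints: Dict[str, Any] = {}
    for f in dataclasses.fields(clazz):
        if "marshmallow_field" in f.metadata:
            continue  # schema field given explicitly; its type is never needed
        # A throwaway class carrying only this annotation lets get_type_hints
        # evaluate it in isolation, so a bad annotation elsewhere can't raise.
        probe = type(
            "_Probe",
            (),
            {"__annotations__": {f.name: f.type}, "__module__": clazz.__module__},
        )
        hints[f.name] = typing.get_type_hints(probe, localns=localns)[f.name]
    return hints


@dataclasses.dataclass
class Mixed:
    # None stands in for a real marshmallow field instance
    a: "NotAType" = dataclasses.field(default=0, metadata={"marshmallow_field": None})
    b: "int" = 0


print(resolve_field_hints(Mixed))  # {'b': <class 'int'>}
```

Fields without an override still raise if their annotation cannot be resolved, so genuinely broken annotations keep failing loudly.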
We could, I suppose, skip the call to get_type_hints in the special case that all the fields have an explicit marshmallow_field set. But that's a pretty limited use case. Doing so would likely cause more confusion than it solves.
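For comparison, that special case would amount to a guard like the following hypothetical helper (the names and the None placeholder are illustrative only). It shows why the benefit is narrow: a single field without an override brings back the bulk get_type_hints call and, with it, the NameError.

```python
import dataclasses
import typing
from typing import Any, Dict


def hints_unless_all_overridden(clazz: type) -> Dict[str, Any]:
    # Skip type-hint resolution only when *every* field names its own schema field.
    if all("marshmallow_field" in f.metadata for f in dataclasses.fields(clazz)):
        return {}
    # Otherwise resolve all annotations at once, exactly as before.
    return typing.get_type_hints(clazz)


@dataclasses.dataclass
class AllOverridden:
    # None stands in for a real marshmallow field instance
    a: "NotAType" = dataclasses.field(default=0, metadata={"marshmallow_field": None})


print(hints_unless_all_overridden(AllOverridden))  # {}
```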
Sorry for the delay on this. I agree with your take on the last point; I don't want the alternative to be having to custom-define every field's type as soon as we need to define one.
I think it's a fair assessment that this is a limitation of the type hinting library. The method above is an acceptable workaround when this comes up, so I'm going to close this. Maybe a future get_type_hints will be more flexible.