rnag / dataclass-wizard

A simple, yet elegant, set of wizarding tools for interacting with Python dataclasses.

Question: how to deserialise a string reference to an object

adamcunnington-mlg opened this issue · comments

A usage question if I may that I didn't see covered in the docs.

Imagine I have this pseudo code:

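from dataclasses import dataclass

from dataclass_wizard import YAMLWizard

@dataclass
class Foo:
    name: str

@dataclass
class Bar:
    name: str
    foos: list[Foo]  # in the YAML, these are given as Foo names only

@dataclass
class Doc(YAMLWizard):
    foos: list[Foo]
    bars: list[Bar]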

So, imagine I am deserialising some YAML and I want my Bar objects to contain a list of Foo objects but just based off the name. E.g.:

foos:
- name: x
- name: y
- name: z

bars:
- name: my_bar
  foos:
  - x
  - y

So in this case, when the Bar objects are deserialised, I want their foos property to be populated with a list of Foo objects where the Foo object is located by name.

My sense is that I need to use type hooks to do this, but I've only seen examples where type hooks are defined on the class that YAMLWizard is applied to (e.g. Doc), whereas in my case they would need to be on the Bar dataclass. Can I still subclass from the DumpMixin for a downstream object?

Hi @adamcunnington-mlg! This is indeed an interesting use case. Unfortunately, I don't think this is currently possible with dataclasses, at least not in the way this library handles them: it always calls fromdict when de-serializing to a data class, and the implicit expectation there is that the JSON (or YAML) data is an object or dict type.

If the shape of the input data were not an issue, this would be supported natively by the dataclasses module alone. For example, Foo(name='x'), Foo(**{'name': 'x'}), Foo('x'), and Foo(*('x', )) all work fine and produce the same result.
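As a quick illustration of that point, here is a small check that uses only the standard library. The Foo class mirrors the one in the question; the `instances` and `all_equal` names are just for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Foo:
    name: str


# Four ways of constructing the same dataclass instance.
instances = [
    Foo(name='x'),
    Foo(**{'name': 'x'}),
    Foo('x'),
    Foo(*('x',)),
]

# Dataclasses compare by field values, so all four are equal.
all_equal = all(foo == instances[0] for foo in instances)
print(all_equal)  # True
```

So a bare string is enough to build a Foo by hand; the difficulty is only in how the library maps the parsed input onto the class.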

As I said, it's the library's implementation, specifically how a dict data source is de-serialized to a data class instance, that might need to be tweaked for this use case. In fact, I'm not sure type hooks alone would let you achieve this, because from_dict would probably error anyway on the unexpected input type: str instead of dict. Fixing that might need a feature request, if this use case is still desirable with data classes.

That said, I do believe I was able to find a sort of workaround. It simply involves defining Foo as a NamedTuple type instead of as a dataclass.

I added an example below to illustrate - please let me know if this is acceptable for your use case.

from __future__ import annotations
from dataclasses import dataclass
from typing import NamedTuple

from dataclass_wizard import YAMLWizard


@dataclass
class Doc(YAMLWizard):
    foos: list[Foo]
    bars: list[Bar]


class Foo(NamedTuple):
    name: str


@dataclass
class Bar:
    name: str
    foos: list[Foo]


yaml_data = """
foos:
- name: x
- name: y
- name: z

bars:
- name: my_bar
  foos:
  - x
  - y
"""

c = Doc.from_yaml(yaml_data)

print(c)
# prints:
#   Doc(foos=[Foo(name='x'), Foo(name='y'), Foo(name='z')], bars=[Bar(name='my_bar', foos=[Foo(name='x'), Foo(name='y')])])
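
If you need each Bar to hold the very same Foo objects that appear in Doc.foos, looked up by name, another option is to resolve the names yourself after parsing. Below is a minimal sketch of that idea using only the standard library. Note that `build_doc` and `by_name` are hypothetical names of my own and not part of dataclass-wizard, and the `raw` dict is an assumed stand-in for whatever a YAML parser would return for the document above.

```python
from dataclasses import dataclass


@dataclass
class Foo:
    name: str


@dataclass
class Bar:
    name: str
    foos: list[Foo]


@dataclass
class Doc:
    foos: list[Foo]
    bars: list[Bar]


def build_doc(raw: dict) -> Doc:
    """Build a Doc, resolving each Bar's foo names to the shared Foo objects."""
    foos = [Foo(**item) for item in raw["foos"]]
    # Index the Foo objects by name so that references can be looked up.
    by_name = {foo.name: foo for foo in foos}
    bars = [
        # An unknown name raises KeyError here rather than passing silently.
        Bar(name=item["name"], foos=[by_name[n] for n in item["foos"]])
        for item in raw["bars"]
    ]
    return Doc(foos=foos, bars=bars)


# Assumed stand-in for the parsed YAML document.
raw = {
    "foos": [{"name": "x"}, {"name": "y"}, {"name": "z"}],
    "bars": [{"name": "my_bar", "foos": ["x", "y"]}],
}

doc = build_doc(raw)
# The Bar refers to the same Foo instance that Doc.foos holds, not a copy.
print(doc.bars[0].foos[0] is doc.foos[0])  # True
```

The trade-off is that this bypasses the library's own loading for Bar, so any key transforms or type hooks you rely on would need to be applied separately.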

@rnag nice workaround, thank you, much appreciated.