jcrist / msgspec

A fast serialization and validation library, with builtin support for JSON, MessagePack, YAML, and TOML

Home Page: https://jcristharif.com/msgspec/

Converting dicts into list with key-reuse

hynek opened this issue

Question

As briefly discussed on Mastodon, here’s a more verbose version:

I’ve got JSON that looks vaguely like this:

{
  "server": "mb10",
  "mailboxes": {
    "user1@example.com": {
      "size": 22304
    }
  }
}

and I would like to get something like this:

from attrs import define

@define
class Mailbox:
    name: str  # <- this is where the key goes
    size: int

@define
class Server:
    server: str
    mailboxes: list[Mailbox]

Can this be achieved with msgspec?

There's no native way to spell this cleanly right now (I'm not even sure what an API for that might look like), but this pattern can be supported in a few different ways through extensions. Do you also need to reserialize the data in the same format? Or is this for decoding only?

Hope it's alright to chime in here, I saw the discussion on Mastodon. It sort of reminds me of relationship attributes in SQLModel/SQLAlchemy, and somehow the type annotation would need to carry a coupling between the encoded types and the more ergonomic decoded struct.

A pseudo-Python API example (black magic omitted) might involve explicitly defining the encoded representation, then the more ergonomic, decoded struct "reshapes" that data on decode. And it would shape it back again on encode, I guess? Key uniqueness wouldn't be preserved in the decoded structs, and probably only re-validated at the edge? In the following example, underscore-suffixed classes represent encoded structures, and the non-suffixed classes represent decoded structures. Also, I've omitted @define or struct subclassing in this pseudo-API.

# Need black magic here...
MailboxSize = int
MailboxName = str
ServerName = str

class Mailbox_:
    size: MailboxSize

class Server_:
    server: ServerName
    mailboxes: dict[MailboxName, Mailbox_]

class Mailbox:
    name: MailboxName
    size: MailboxSize

class Server:
    name: ServerName  # renamed this attribute from the OP example
    mailboxes: list[Mailbox]

This is the most explicit API representation I could imagine, and does entail duplication, but such an API would give lots of expressivity in "transformations" of structs, not just in flattening mappings into lists of instances with keys as instance attributes. Though a more focused implementation might hack Annotated types or use a magic Field type that does linking to encoded representations, with less duplication but less expressivity, maybe?

There's no native way to spell this cleanly right now (I'm not even sure what an API for that might look like), but this pattern can be supported in a few different ways through extensions. Do you also need to reserialize the data in the same format? Or is this for decoding only?

I do not, but you also don't have to add any functionality just for me. :) I can achieve that using an extra transformation step as outlined by Blake. I was just surprised that I couldn't find a straightforward solution to an ostensibly common problem in any serialization package, so I thought I'd ask around in case I'm missing something. :)

Thanks for humoring me!
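For anyone landing here with the decode-only case, the extra transformation step can be as small as one list comprehension over the parsed JSON. This is a rough, package-agnostic sketch using only the standard library; it assumes stdlib dataclasses in place of the attrs classes from the question, the function name load_server is made up for illustration, and it does no type validation of the values.

```python
import json
from dataclasses import dataclass


@dataclass
class Mailbox:
    name: str  # filled from the dict key
    size: int


@dataclass
class Server:
    server: str
    mailboxes: list[Mailbox]


def load_server(raw: str) -> Server:
    """Hypothetical helper: parse JSON, then move each key into its value."""
    doc = json.loads(raw)
    mailboxes = [
        Mailbox(name=name, size=fields["size"])
        for name, fields in doc["mailboxes"].items()
    ]
    return Server(server=doc["server"], mailboxes=mailboxes)
```

The same reshaping could be applied to the plain dict before handing it to a validating library, so that the library only ever sees the list-shaped data.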