jcrist / msgspec

A fast serialization and validation library, with builtin support for JSON, MessagePack, YAML, and TOML

Home Page: https://jcristharif.com/msgspec/

Typed encoding

fungs opened this issue · comments

Description

This is half a question. When working with type definitions (aka schemas) and typed objects, we usually keep them logically separate, in particular if the serialization protocol is not schema-less (i.e. not self-describing). The schema is explicitly needed for deserialization, but for serialization it is often taken from the instance object itself.

Now, in Python in particular, the dynamic nature of the language offers many ways to tamper with runtime objects, and creating truly frozen objects is not easy. I don't know the msgspec implementation in detail, but I would feel safer if serialization (for instance msgspec.json.encode() or msgspec.to_builtins()) took the schema as a second argument and used that type definition to traverse the object, instead of the one coupled to the object. In a sender/receiver scenario where both parties share the schema beforehand as a data contract, I need to be sure that the encoder enforces the same structure for a stream of objects.
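To make the concern concrete, here is a minimal sketch of schema-driven encoding without msgspec itself, using only the stdlib. The helper encode_with_schema() is hypothetical (not part of any library): it serializes exactly the fields declared on the shared schema class, so anything a subclass or runtime tampering adds to the instance is ignored.

```python
import dataclasses
import json


@dataclasses.dataclass
class Point:  # the shared data contract
    x: int
    y: int


def encode_with_schema(obj, schema):
    """Hypothetical helper: encode only the fields declared on `schema`,
    ignoring any extra attributes the instance may carry."""
    if not isinstance(obj, schema):
        raise TypeError(f"expected {schema.__name__}, got {type(obj).__name__}")
    return json.dumps(
        {f.name: getattr(obj, f.name) for f in dataclasses.fields(schema)}
    )


@dataclasses.dataclass
class TaggedPoint(Point):  # a derived class with an extra field
    tag: str = "extra"


# The instance carries a `tag` field, but the external schema wins:
encode_with_schema(TaggedPoint(1, 2), Point)  # → '{"x": 1, "y": 2}'
```

An encoder that instead reads the type from the instance would emit the extra tag field, silently breaking the contract with the receiver.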

In essence, this boils down to the question: do these two classes always behave the same?

class SafeSender:  # using an external schema
    def __init__(self, schema):
        self._schema = schema

    def send(self, msg):
        # hypothetical API: encode() takes the schema as a second
        # argument and traverses msg according to that type definition
        sendoff(msgspec.json.encode(msg, self._schema))


class Sender:  # using the schema coupled to the instance
    def __init__(self, schema):
        self._schema = schema

    def send(self, msg):
        assert type(msg) is self._schema
        sendoff(msgspec.json.encode(msg))

Anyway, the change would be minor, and the second argument could be optional so as to preserve the current behavior.

A little more thought is required for derived classes.
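The open question there is whether the check against the external schema should be strict (reject subclass instances outright) or permissive (accept them, but encode only the base schema's fields). A sketch of the two options, using plain Python classes for illustration:

```python
class Base:
    """Stands in for the shared schema type."""


class Derived(Base):
    """A subclass that may add fields the receiver does not expect."""


msg = Derived()

# Strict: identity check rejects any subclass instance.
strict_ok = type(msg) is Base  # False

# Permissive: subclasses pass, so the external schema must then
# decide which fields actually get encoded.
permissive_ok = isinstance(msg, Base)  # True
```

The Sender class above takes the strict route; the permissive route only stays safe if encoding is driven by the external schema, as proposed.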