jcrist / msgspec

A fast serialization and validation library, with builtin support for JSON, MessagePack, YAML, and TOML

Home Page: https://jcristharif.com/msgspec/

Add either `init_omit_defaults` or `omit_none`

HansBrende opened this issue · comments

Description

Problem:
I want to be able to omit None fields from the JSON serialization, to make the output more compact.
Currently, I can do this by setting omit_defaults to True, and adding a default of None to each field. However, this has the downside that when I am programmatically constructing an instance of this struct, it will no longer fail fast if I've forgotten one of the arguments. I want to be sure that as new fields are added to this struct, I'm not forgetting to add them in my code in other places.

There are two ways I see that this could be accomplished, off the top of my head:

  1. Add an init_omit_defaults option (similar to repr_omit_defaults) where, if set to True, the default for each field will not be added to the __init__ method (only added during deserialization).

  2. Add an omit_none option (similar to FastAPI's response_model_exclude_none option) where the serialization automatically excludes None, regardless of whether it is the default.

Both of these would accomplish exactly what I need.

Other approaches I've considered:

I could do random_field: Union[FieldType, UnsetType], with no default, which would accomplish almost exactly what I'm after. This has two very annoying downsides, however:

  1. When I'm programmatically constructing my struct, I have to convert None to UNSET for each field
  2. When I'm using my struct in other places, I have to convert UNSET back to None for each field. This would probably lead to more potential logic errors than I was originally trying to avoid, as it is used all over the place and all the existing code assumes that None is the "not present" value.

> However, this has the downside that when I am programmatically constructing an instance of this struct, it will no longer fail fast if I've forgotten one of the arguments. I want to be sure that as new fields are added to this struct, I'm not forgetting to add them in my code in other places.

That's an interesting use case. What about using a custom classmethod as a constructor in the places where you always want to explicitly set each field? Something like:

from __future__ import annotations

import msgspec


class Demo(msgspec.Struct):
    field_one: int | None = None
    field_two: int | None = None
    field_three: int | None = None

    @classmethod
    def new(cls, *, field_one: int, field_two: int, field_three: int) -> Demo:
        return cls(field_one, field_two, field_three)


# elsewhere in your code...
demo = Demo.new(field_one=1, field_two=2, field_three=3)
print(demo)
#> Demo(field_one=1, field_two=2, field_three=3)

# any locations where you forgot to add `field_three` would then error
# (raises TypeError: `field_three` is a required keyword-only argument)
Demo.new(field_one=1, field_two=2)

When adding a new field to the struct you'd need to remember to also add it to the classmethod, but the close proximity of the two should help you remember. Heck, you could even enforce that the two stay aligned with a check at import time via an __init_subclass__ hook if you wanted to (I'm happy to provide an example if this interests you). IMO this is a nice low-tech solution to a code hygiene problem.


> Add an omit_none option

This also might make sense, but would obviously take more work on my end.

As a meta conversation, I'm now wondering if options like this or omit_defaults should be set per-call to encode (or on the Encoder once) rather than on the type. The logic being that sometimes you might want to encode the full model and sometimes you might want a more compact representation - but these attributes are more specific to the call site than to the type being represented?

@jcrist actually, I already have the class method you speak of, and by "other places in my code" I was referring to this one class method 😆

The __init_subclass__ hook you mention would be interesting indeed, would love to see that example! Provided it adds minimal overhead (I'm instantiating millions of these quite often) I think that could work. All I need is a basic fail-fast sanity check to make sure I'm populating all fields (previously accomplished very smoothly simply by not having defaults set).

I agree with your meta comment... it seems like that would provide more flexibility. Although, for my own use-cases (currently) I personally do not need multiple flavors of serialization. I could see how it would be annoying though if I at some point in the future needed to serialize two different ways.

One possible downside I could see is that to generate the "schema" correctly you'd need to also supply the encoder you use to serialize... as the schema might also depend on encoder arguments. But on the other hand, maybe that is not a downside at all. I think something similar happened in pydantic recently (as far as needing additional arguments to generate schema properly), because FastAPI now generates potentially two different schemas... one "deserialization" schema, and one "serialization" schema... as what is required vs. not changes depending on whether reading or writing.

@jcrist

> As a meta conversation, I'm now wondering if options like this or omit_defaults should be set per-call to encode (or on the Encoder once) rather than on the type

As in #549 (my reply here), I'd say the ideal is for the class-based parameter to define a default, with a way to override it in the encoder. But if that is too much work, the encoder seems like the best option, as it has no downsides and gives finer control.