jcrist / msgspec

A fast serialization and validation library, with builtin support for JSON, MessagePack, YAML, and TOML

Home Page: https://jcristharif.com/msgspec/

Coerce a `None` value to the default

hoopes opened this issue

Question

Hi, I was hoping for a flag or some other way to interpret a None value in JSON as the field's default value, so that the field would still conform to its declared type (and mypy would stop complaining about the field being optional).

For example:

import msgspec

class ForceList(msgspec.Struct):
    data: list[int] = msgspec.field(default_factory=list)

# The explicit null does not match list[int], so decoding raises
x = msgspec.json.Decoder(ForceList).decode('{"data":null}')

results in

msgspec.ValidationError: Expected `array`, got `null` - at `$.data`

Which is true! But it would be cool to detect that null value, and force it to be an empty list. The alternative is to do something like:

class ForceList(msgspec.Struct):
    data: list[int] | None = msgspec.field(default_factory=list)

    def __post_init__(self):
        if self.data is None:
            self.data = []

But then the type of the data field is an optional list, and mypy tells me that I need to sprinkle assertions everywhere to be sure that it's actually a list before I read it. It would also be great to then not serialize that back to JSON, but that would be a bonus.

Thanks so much for the library! You would not believe how much time it is saving me for very large JSON files (upwards of 200MB).

One thing you could do now is to have a "private" field typed as an optional list, plus a property that isn't optional.

That's actually the direction I went, but the setters and append were all a little bit beyond the level of hackery I was willing to commit. If it was a read-only field, the private optional field and non-optional @property would work great. I was going to try to write some sort of helper class that would act like a list and support append, but eh. A bridge too far. Thanks for taking the time to reply though, I appreciate it!

No prob. I am a big "immutables" fan, so I wasn't thinking about the need for mutable fields.

Another idea for you then: you can have phantom types, used only for deserialization, which have optional fields and `__post_init__` logic. You then convert from them to the final type used in the code. This can be easily automated, either via code generation (the explicit approach), or you could even play around with generating the phantom types from the "true" types on the fly.

On the topic, I think all the implicit logic, like treating nulls as missing values, is more headache than gain in the long run. pydantic was (or maybe still is) so confusing with None/Optional handling. As PEP 20 says: explicit is better than implicit.

I don't disagree - big fan of immutable data structures, and explicit > implicit is true for sure. However - that's the JSON I got! msgspec is so much faster than pydantic that I gotta ask, at least.