jcrist / msgspec

A fast serialization and validation library, with builtin support for JSON, MessagePack, YAML, and TOML

Home Page: https://jcristharif.com/msgspec/

Callbacks to `Encoder`/`Decoder` are not respected in `datetime` objects

TheMythologist opened this issue

Description

Neither the dec_hook nor the enc_hook argument is respected by the encoders and decoders (tested with JSON and YAML) when datetime objects are used. In the script below, the print calls in both hooks never run, and the variable buf contains an ISO 8601 duration string instead of the number that enc_hook would return.

Attached is a sample script to show that custom decoding of datetime.timedelta objects is not supported. It also doesn't work for datetime.datetime objects.

import msgspec
from typing import Any, Type
from datetime import timedelta


def enc_hook(obj: Any) -> Any:
    print("Encoding")
    if isinstance(obj, timedelta):
        # convert the timedelta to a number
        return obj.total_seconds()
    else:
        # Raise a NotImplementedError for other types
        raise NotImplementedError(f"Objects of type {type(obj)} are not supported")


def dec_hook(type: Type, obj: Any) -> Any:
    print("Decoding", type)
    # `type` here is the value of the custom type annotation being decoded.
    if type is timedelta:
        # Convert ``obj`` (which should be a ``number``) to a timedelta
        return timedelta(seconds=obj)
    else:
        # Raise a NotImplementedError for other types
        raise NotImplementedError(f"Objects of type {type} are not supported")


class MyMessage(msgspec.Struct):
    field_1: str
    field_2: timedelta


enc = msgspec.json.Encoder(enc_hook=enc_hook)
dec = msgspec.json.Decoder(MyMessage, dec_hook=dec_hook)

msg = MyMessage("some string", timedelta(seconds=5))

# Doesn't work for JSON decoder
buf = enc.encode(msg)
print(buf)
a = dec.decode(buf)
print(a)

# Doesn't work for YAML decoders either
buf = msgspec.yaml.encode(msg, enc_hook=enc_hook)
print(buf)
a = msgspec.yaml.decode(buf, type=MyMessage, dec_hook=dec_hook)
print(a)

Update: This was broken sometime between version 0.16.0 and version 0.17.0.

Update: It was this specific commit that broke the hook for datetime.timedelta objects: 2b72ebb

Update: It seems that hooks for datetime.datetime objects have been broken from the start.

Under the hood, the .encode and .decode methods call the msgspec.to_builtins and msgspec.convert functions, respectively.

Both functions take a builtin_types parameter, which disables msgspec's own processing of the specified builtin types. However, it does not pass those types to the *_hook functions; only non-builtin types are passed to the hooks.

Whether this is a bug or by design - only @jcrist can tell (no pun intended :-)
But it definitely feels like a bug.

The above can be illustrated with:

import datetime as dt
from typing import Any

import msgspec as ms

def enc_hook(obj: Any) -> Any:
    print("Encoding")
    if isinstance(obj, T):
        return obj.name
    if isinstance(obj, dt.timedelta):
        # convert the timedelta to a number
        return obj.total_seconds()
    else:
        # Raise a NotImplementedError for other types
        raise NotImplementedError(f"Objects of type {type(obj)} are not supported")


class T:

    def __init__(self, name='some name'):
        self.name = name


class MyMessage(ms.Struct):
    field_1: T
    field_2: dt.timedelta


msg = MyMessage(T(), dt.timedelta(seconds=5))

msg_encoded = ms.to_builtins(
    msg,
    builtin_types=(dt.timedelta,),
    enc_hook=enc_hook,
)

print(msg_encoded)

The above outputs:

Encoding
{'field_1': 'some name', 'field_2': datetime.timedelta(seconds=5)}

I can see 2 ways to overcome this behaviour until (if ever) it gets changed:

  1. Implement your own encode/decode functions, so that you control what happens to the dict produced by msgspec before it is sent to the encoders/decoders.
  2. Wrap the builtin type in a custom type, so that it is handled by the *_hook functions.
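
A minimal sketch of the first workaround, using only the standard library so it runs without msgspec installed. The helper name timedeltas_to_seconds is hypothetical, json stands in for whichever encoder is actually used, and the input dict simply copies the shape of the to_builtins output shown above. Treat it as an illustration of the idea, not as how msgspec behaves internally.

```python
import datetime as dt
import json
from typing import Any


def timedeltas_to_seconds(obj: Any) -> Any:
    """Recursively replace timedelta values with a float number of seconds.

    Hypothetical helper: it post-processes the builtin structure before it
    is handed to the real encoder.
    """
    if isinstance(obj, dt.timedelta):
        return obj.total_seconds()
    if isinstance(obj, dict):
        return {key: timedeltas_to_seconds(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [timedeltas_to_seconds(item) for item in obj]
    return obj


# Assumption: same shape as the dict printed by the to_builtins example above,
# where the timedelta was passed through untouched.
msg_builtins = {"field_1": "some name", "field_2": dt.timedelta(seconds=5)}

# Encode: convert the timedelta ourselves, then call the encoder.
payload = json.dumps(timedeltas_to_seconds(msg_builtins))
print(payload)  # {"field_1": "some name", "field_2": 5.0}

# Decode: parse first, then restore the timedelta by hand for the known field.
raw = json.loads(payload)
raw["field_2"] = dt.timedelta(seconds=raw["field_2"])
print(raw)
```

The decode side has to know which fields hold durations, since a bare number in the payload carries no type information. That is the main cost of this approach compared with wrapping the value in a custom type.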