jcrist / msgspec

A fast serialization and validation library, with builtin support for JSON, MessagePack, YAML, and TOML

Home Page: https://jcristharif.com/msgspec/

Cannot set `gc=False` on Generic structs?

HansBrende opened this issue

Question

If I do

class MsgspecEntity(msgspec.Struct, Generic[P], gc=False):
    ...

I get the following error:

ValueError: Cannot set gc=False when inheriting from non-struct types

I suppose I can work around this by dynamically redefining the struct for each possible P type, to avoid using Generic, but is this expected? It would be easier if Generic were excluded from the above restriction.

Thanks for opening this! We were originally being a bit stricter than necessary here. The real limitation is that types with gc=False must be __slots__ classes, so any mixin type (like Generic) must also define __slots__ = (). With #635 you should be able to set gc=False on generic structs as well.

In [1]: from typing import Generic, TypeVar

In [2]: from msgspec import Struct

In [3]: P = TypeVar("P")

In [4]: class Demo(Struct, Generic[P], gc=False):
   ...:     x: P
   ...:     y: P
   ...: 

In [5]: d = Demo(1, 1)

In [6]: import gc

In [7]: gc.is_tracked(d)
Out[7]: False

Standard note - messing with the gc kwarg is considered "advanced usage", I trust you've read all the warnings in the docs before using it :).

@jcrist thanks for the fix!

I have read the documentation on that, however, I'm confused on one point:

Why would any struct that participates in deserialization not be a good candidate for gc=False? When deserializing JSON to a normal dict, it is impossible for that JSON to be self-referencing. I.e., you can't have a thing inside itself, simply because that is impossible to represent as JSON! So any reference cycles among the objects participating in deserialization would by nature have to be created manually, in post_init or at some later stage. So as long as I am not "adding a thing to itself" post-init, and these structs originate from JSON, I should be totally safe with gc=False.

Or am I missing something?

No, that's accurate. Custom types supported by dec_hook could result in cyclic behavior, but in general it's unlikely for the result of a decode call to have any cycles. But code constructing these objects outside of decode could still result in a cycle. The warnings are mostly to let users know "here be dragons" and to deter them from mucking with the gc unless a benchmark shows it matters. That you can properly reason about cyclic object structures and python's GC implementation means you are probably capable of judging whether disabling it on these has consequences for your code :).

@jcrist awesome! One thing I did notice during my benchmarks is that gc=False is somewhat undermined by the presence of UUID fields... for some reason python thinks it should track UUIDs even though they are immutable and only contain a couple underlying primitives. I tried to find something on how to "untrack" UUIDs... but was unsuccessful... so I ended up just disabling garbage collection altogether until all my objects are destroyed anyways by refcounting.
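As a rough illustration of the "disable garbage collection until the objects are gone" approach described above, the following sketch pauses the cyclic collector around a decode call and restores its previous state afterwards. The helper name `gc_paused` is hypothetical, and `json.loads` is only a stand-in so the sketch runs with the standard library alone; in practice the call inside the block would be something like `msgspec.json.decode`.

```python
import gc
import json
from contextlib import contextmanager


@contextmanager
def gc_paused():
    """Disable the cyclic collector inside the block, then restore its prior state."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        # Only re-enable if the collector was on when we entered.
        if was_enabled:
            gc.enable()


payload = b'[{"id": 1}, {"id": 2}, {"id": 3}]'
enabled_before = gc.isenabled()

with gc_paused():
    enabled_inside = gc.isenabled()
    # Stand-in for the real decode call (assumption: msgspec.json.decode in practice).
    records = json.loads(payload)

enabled_after = gc.isenabled()
```

Reference counting still frees the decoded objects as soon as nothing refers to them; only cycle detection is suspended while the block runs.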

> python thinks it should track UUIDs even though they are immutable and only contain a couple underlying primitives

In CPython, any type implemented in pure Python is a GC type. Since uuid.UUID isn't an extension type (i.e. it's implemented in Python), it's automatically a GC type. If uuid.UUID were implemented as an extension type then you're correct, it wouldn't need to be a GC type.
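The difference can be checked directly with `gc.is_tracked` from the standard library. This is a small sketch rather than anything taken from msgspec; the UUID value is an arbitrary example.

```python
import gc
import uuid

u = uuid.UUID("12345678-1234-5678-1234-567812345678")
s = str(u)

# uuid.UUID is defined in Python source, so its instances are tracked by the collector.
uuid_tracked = gc.is_tracked(u)

# str is an atomic built-in type; its instances are never tracked.
str_tracked = gc.is_tracked(s)

print(f"UUID tracked: {uuid_tracked}, str tracked: {str_tracked}")
```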

One option if you'd rather disable GC on the type instead of globally: if you never manipulate the UUIDs as uuid.UUID objects, you might try annotating those fields as str instead (possibly with a pattern regex for matching UUIDs if you're concerned about invalid UUIDs getting in). Strings are immutable non-GC types. That said, for large payloads the overhead of turning the GC off and on per decode call should be minimal.