wyfo / apischema

JSON (de)serialization, GraphQL and JSON schema generation using Python typing.

Home Page: https://wyfo.github.io/apischema/

Benchmarks against Pydantic v2

Poaz opened this issue · comments

Hello!

How does apischema fare against the new Pydantic version, speed-wise?

Have any new benchmarks been done?

I've migrated typedload's testsuite to test against pydantic2.

They're not the same tests as on apischema's website, but perhaps it can give you an idea.

https://ltworf.github.io/typedload/performance.html

@ltworf Thank you for your link!

Actually, I was quite surprised by the results of apischema in your benchmarks, so I looked at the code and noticed several things:

  • You enable serialization type checking; as stated in the documentation, type checking at runtime is redundant with static typing annotations (which should be the normal use case of apischema), which is why it's not enabled by default. I don't really see why it's enabled in this benchmark, maybe for feature parity, but it doesn't really matter here as dump performance is still fine.
  • I disagree with your comment "apischema will return a pointer to the same list, which is a bug that can lead to data corruption, but makes it very fast so level the field by copying the list". This is not a bug, and it is even a very important feature performance-wise. 99% of the time (at least in my experience), serialized data is used immediately (for JSON/binary encoding, printing, etc.) while deserialized data is temporary (e.g. a deserialized JSON payload). There is no point in copying temporary data, so apischema adapts to this main use case and avoids unnecessary copies to offer the best performance (that's the goal of the default settings). However, depending on your workflow, you may want to blindly copy all the data, and apischema enables that with a setting (see the sketch after this list). Using this setting should be a lot faster than adding d = copy.deepcopy(r) after deserialization. Currently, the benchmark doesn't treat apischema fairly, but I assume you didn't know about this feature, so I won't blame you. Could I kindly ask you to fix your benchmark in your next release? :)
  • Regarding the "realistic union of objects as namedtuple" benchmark, you use a discriminator/internally tagged representation. I assume typedload is optimized for this pattern, while apischema is not. I had planned to add this pattern recognition to apischema, but I haven't found the time since, and actually, I don't like adding a Literal field to my dataclasses at all. Instead, apischema has a discriminator feature that doesn't require the field to be present in the types and automatically adds it back at serialization (and I really prefer this additive way of doing things; I don't like modifying my types just for serialization purposes). You know about this feature as it's present in your benchmark, but I think you expose the (a lot slower) non-discriminator case in your chart. I don't find this really fair, as benchmarking should not use an unoptimized case, but it highlights your good handling of this pattern, so fair enough I suppose?
  • Benchmarking error handling is quite dependent on library behavior. apischema (and pydantic) do a full validation of the data, collecting all the errors, while typedload seems to fail fast at the first error. That being said, I don't understand the results of the error handling benchmark, see the next sentence.
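To illustrate the second bullet, here is a rough sketch of the two options; the exact setting path is an assumption based on the no_copy name used later in this thread, so check the apischema documentation:

    import copy
    import apischema

    # What the benchmark reportedly does today: copy by hand after deserializing.
    # result = apischema.deserialize(SomeType, payload)
    # result = copy.deepcopy(result)

    # What is proposed instead: let apischema copy the data itself by disabling
    # the no-copy optimization (assumed setting path; see the apischema docs).
    apischema.settings.deserialization.no_copy = False
    # result = apischema.deserialize(SomeType, payload)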

I have executed some of your benchmarks on my computer, with the no_copy = False fix of course, and apischema is still substantially faster than typedload or pydantic (all the benchmarks I've executed give -1 for apischema, and numbers like 0.59/0.98/etc. for the other libraries).

@Poaz Thank you for your interest in apischema. I haven't updated the benchmark page yet, but of course I ran my benchmarks right after the pydantic release. Here are the results: pydantic is still (a lot) slower, except for complex deserialization, where it can achieve similar performance. But progress has been made, from x10/x20 for deserialization/serialization down to x2/x3 ("x2" meaning "two times slower than apischema", which is still a lot). I want to quote @ltworf here: "years of work to rewrite it in Rust, still managing to lose some benchmarks 😅". pydantic is indeed still not optimized enough to compete with apischema (and still lacks conversions, proper case handling, GraphQL, smart validators, etc.). And pydantic even manages to lose against pure Python (no Cython) apischema...

I haven't been able to find the time to work on apischema for a long while, and I don't know when I will resume work on it. I barely use Python at work these days as I code in Rust 99% of the time, and my company uses pydantic (yes, I'm sad, but it's also kind of funny to see how they fight against it). I have several unpublished features/optimizations though, and I will have to do something with them. Python 3.12, whose release is coming soon, may be the trigger. I will close this issue with the benchmark update of the next apischema release.

Anyway, apischema still rules.

Yes, dump type checking is enabled to make sure it's actually dumping; is there another setting I should be using there?

Returning a pointer to the same mutable object is very error prone. I've seen people complain about this, so it's possibly not completely harmless.

Yes, typedload automatically uses literal fields to resolve unions quickly. It is not mandatory to add literal fields, but since in my experience most datasets use them, why not use them?
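To give a rough idea of the pattern, here is a sketch with hypothetical Message/BotMessage classes (not the actual benchmark code): each class carries a Literal tag field, so typedload can pick the right union member from the tag instead of trying them all.

    from dataclasses import dataclass
    from typing import Literal, Union

    import typedload

    @dataclass
    class Message:
        type: Literal["message"]
        text: str

    @dataclass
    class BotMessage:
        type: Literal["bot_message"]
        bot_id: str

    # The "type" tag tells typedload which class to build, without trial and error.
    event = typedload.load({"type": "bot_message", "bot_id": "B1"}, Union[Message, BotMessage])
    assert isinstance(event, BotMessage)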

Yeah, that is just to show how typedload improved. Anyway, in the benchmark the error is at the end of the data; otherwise typedload would be much faster, of course.

Well it's substantially faster because you compare a .so to a .py? :)

My company uses both pydantic and typedload… but for me the main issue with pydantic is how, even with the plugin installed, it makes mypy completely useless, and it's somewhat of a return to the old days of untyped Python, when you only got the errors at runtime (possibly in production).

Yes, dump type checking is enabled to make sure it's actually dumping; is there another setting I should be using there?

Mypy/Pycharm/etc. should complain if you pass a wrong type, so I've only added this setting for esoteric use cases. For testing, dumping can be checked by validating the output.
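For instance, something like this (a sketch with a hypothetical Item dataclass): instead of enabling runtime type checking on dump, round-trip the serialized output through deserialize to validate it.

    from dataclasses import dataclass

    import apischema

    @dataclass
    class Item:
        name: str
        price: float

    obj = Item("pen", 1.5)
    dumped = apischema.serialize(Item, obj)
    # Validating the output checks that the dump actually produced correct data,
    # without paying for type checking inside serialize itself.
    assert apischema.deserialize(Item, dumped) == obj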

Returning a pointer to the same mutable object is very error prone. I've seen people complain about this, so it's possibly not completely harmless.

We don't have the same experience here, and I'm interested in hearing about yours. Could you share some complaint examples (e.g. a GitHub issue)? I mean, there is #566, but after discussion, that's more of a documentation issue, as the default behavior was considered to be ok.
Again, it's the default setting for the default use case, as I don't want to penalize people doing orjson.dumps(apischema.serialize(MyType, obj)), which should be the large majority of apischema users. I've actually never seen a use case where data mutability was involved. Do you have some examples in mind?
That being said, if you really care about performance, you should adapt your workflow to get the best out of the library you use, and benchmarking falls into this category imo. And talking about performance, apischema.deserialization_method/apischema.serialization_method should be used (and I strongly advise you to add an equivalent in typedload, as it's a cheap but important optimization, as well as using functools.lru_cache for your caching system, since it's built in).
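A rough sketch of what I mean, with a hypothetical Point dataclass; only the single-type-argument form of deserialization_method/serialization_method is shown here, other parameters are in the docs.

    from dataclasses import dataclass
    from functools import lru_cache

    import apischema

    @dataclass
    class Point:
        x: int
        y: int

    # Build the optimized methods once per type and cache them with the builtin
    # functools.lru_cache, instead of paying the setup cost on every call.
    @lru_cache(maxsize=None)
    def loader(tp):
        return apischema.deserialization_method(tp)

    @lru_cache(maxsize=None)
    def dumper(tp):
        return apischema.serialization_method(tp)

    point = loader(Point)({"x": 1, "y": 2})
    data = dumper(Point)(point)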

Yes, typedload automatically uses literal fields to resolve unions quickly. It is not mandatory to add literal fields, but since in my experience most datasets use them, why not use them?

As I said, I think it's a good feature, and I'm glad you've implemented it. However, I dislike adding noise to data structures when it's only used for (de)serialization. I would have preferred the addition of a decorator (actually, apischema.discriminator can be used as a decorator, but it was meant to be used with an algebraic data type pattern, not a raw union) rather than an additional field like type: Literal["foo"] = dataclasses.field(default="foo", init=False, repr=False, compare=False), formatted by black on 3 lines. But that's a matter of taste. I think I need to rework tagged union handling in apischema a bit, maybe merging apischema.TaggedUnion and discriminator into something like Rust's serde, where you can choose between internally/externally/adjacently tagged, but I would deliberately not support this Literal pattern unless I'm asked for it.
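For reference, here is that field as black lays it out inside a hypothetical Foo dataclass, i.e. the three lines of noise I'm talking about:

    import dataclasses
    from typing import Literal

    @dataclasses.dataclass
    class Foo:
        # The tag exists only for (de)serialization; it is excluded from __init__,
        # __repr__ and comparisons.
        type: Literal["foo"] = dataclasses.field(
            default="foo", init=False, repr=False, compare=False
        )
        value: int = 0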

Well it's substantially faster because you compare a .so to a .py? :)

Indeed, but not only. Actually, apischema's Python code was faster before, but also a lot more complex. I then decided to use Cython, but in a cool way: the optimized Cython code is auto-generated from the Python code, so I can keep my Python code simple enough and easier to maintain – no more need for things like function(map(f, repeat(l), compress(value, ctr), repeat(t))) ;) – and the library can still work in pure Python mode, with decent performance (not at the same level as CPython-optimized libraries like mashumaro or typedload, but enough for the user who explicitly asked me to keep a pure Python version). Back in the day, I considered that shipping a binary was cheating, but now I'm way happier to easily support dozens of features (additional properties, aliases, coercion, validators, etc.) and optimizations (no-copy, passthrough, etc.).
That's the second point: apischema can be substantially faster because of algorithmic optimizations, no-copy being a good example. An inverse example is pydantic, rewritten in Rust, but still... you know what I mean. And I have to thank you, because your benchmark of list[int|float] deserialization showed me that apischema is not optimized enough for this particular use case, as typedload performs better than a .so :) I will fix that for the next release.

but for me the main issue with pydantic is how, even with the plugin installed, it makes mypy completely useless

That was also one of my main reasons to develop apischema at the time, the other one being conversions. By the way, I'm glad you mention apischema as a viable alternative to typedload. I just hope you will update its paragraph based on this discussion, regarding the no-copy setting and performance, and I invite you to highlight the fact that it is a .so ;) Also, about settings, they can be global for convenience, but they can also be passed directly as parameters to (de)serialization methods, overriding the global defaults.

Well your user found out that it was passing lists instead of copying, probably the hard way.

You might have a data structure that you save to a file on a signal, to keep the state; if that procedure does pop() or similar on the data while writing, it will also live-modify your supposedly unchanged state data structure.

It is completely weird to do this in Python or any other language. If you mix copy and reference, you must really make sure it's plastered all over your documentation, because nobody is going to expect it, and they will hate you for it when they take days to debug why their data occasionally gets corrupted.
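As a sketch of what I mean (hypothetical State class, assuming the default no-copy serialization discussed above):

    from dataclasses import dataclass, field

    import apischema

    @dataclass
    class State:
        items: list[int] = field(default_factory=list)

    state = State(items=[1, 2, 3])
    dumped = apischema.serialize(State, state)
    # With no-copy serialization, dumped["items"] may be the very same list object
    # as state.items, so "cleaning up" the dump mutates the live state too.
    dumped["items"].pop()
    print(state.items)  # may print [1, 2] instead of the expected [1, 2, 3]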

You might not like unions and literals, but the reality is that they are there. For example localslackirc sits in a loop receiving json and deciding which class it should be. https://github.com/ltworf/localslackirc/blob/master/slack.py#L310C3-L310C3

I developed typedload because there was nothing else at the time: no pydantic, and certainly no apischema. Typedload worked on 3.6, when typing was still a provisional feature of the language and was changing between minor releases; from the beginning, dealing with unions was a major concern, and back then there was no Literal type to help :D

You might have a data structure that you save to a file on a signal, to keep the state; if that procedure does pop() or similar on the data while writing, it will also live-modify your supposedly unchanged state data structure.

If the state is saved in a file, it has to be serialized to binary, and that's where the copy happens, the apischema serialization result being a temporary value. However, if the binary serialization happens in a thread, there could be an issue, you're right.

It is completely weird to do this in Python or any other language

I don't think it's that weird, because apischema serialization is not a true serialization; it's meant to be a temporary result before the real binary serialization. Writing this last sentence, I realize that maybe (de)serialization is not the right name for apischema's methods. I'm still convinced this is the right behavior 95% of the time, but you're starting to convince me that it would be better as an optional optimization, which should be "plastered all over your documentation", because every user should know about it in order to use it.

You might not like unions and literals

I don't dislike them; it's just that I didn't really like having a field in my class that will never be used anywhere in my code. Your linked code illustrates this when you write MessageBot(type='message', subtype='bot_message',...); it can add unnecessary noise to code unrelated to serialization. By the way, Literal fields force you to use your own defined classes; they cannot extend arbitrary ones (but I assume this use case should be quite rare, and you can still use flattened fields). But you are starting to convince me (again) about this use of Literal fields, so I think I will have to add it to apischema with my discriminator rework. Your implementation is especially elegant with the sub-discriminator pattern, congrats.

Typedload worked on 3.6

The dark ages of Python ... :D I remember discovering the behavior of get_origin when I added support for 3.6. Anyway, I know you're more experienced, and I like this discussion because it makes me realize a bunch of mistakes I've made, like global settings. I only used my own library for a few months, two years ago (and never used the GraphQL part myself), so I did most of the development with the sole motivation of giving happiness to developers frustrated by pydantic; that's also why I don't have enough use cases in mind.

I had more signals in mind, but yeah, threads are kind of the same in the end. The same problem could also arise with async tasks. I agree it's very rare, but it's the kind of thing that is very unexpected, and with interleaving it becomes very hard to track down.

In theory the data structures should be different enough from each other that there should be no need for a field to say which class it is, but at work, in the original use case, we had ~150 classes that are discriminated via a type field. If you add default fields, it becomes a roulette to get the intended type back :D

I didn't think about performance at all when I started writing typedload; I just thought it'd save me some effort and it was a cool project. I first started to think about performance in version 2, when I had to slightly break the API to disallow changing handlers after the initial setup, because they get memoized. My original design goal was to make it very, very flexible so that I wouldn't need to add a special case for every strange type that people wanted to use.

I had done it for personal use, but I got asked to make an exception and allow LGPL at work so that it could be used in the product proper as well, besides internal stuff.

In the end, making it go fast is more of a fun challenge for me than a need to achieve anything. It doesn't change much, because most downloads are sadly from the veeeeery slow version 1 (https://www.pepy.tech/projects/typedload?versions=2.*&versions=1.*). I don't know who these people are or how to tell them to use a recent version.

Using Cython doesn't seem to be very trivial! At least for me.

like global settings

hehehe, well, if there is a way to set them non-globally, you could add a deprecation and move on with your life, in the end for now apischema

I had a clearer idea with typedload because it's actually the second such library that I implemented, so I had already made some mistakes that I wanted to avoid in the second iteration.

I don't know if I'd keep maintaining it if I weren't using it myself. I'd probably keep at it, but more in maintenance mode or so. At least in Python they seem to have calmed down with all the continuous changes to the types. Now new versions of Python typically just work with no changes.

I use it for localslackirc and vasttrafik-cli, both projects that I use daily myself (and that have absolutely no high-performance requirements).