kriszyp / msgpackr

Ultra-fast MessagePack implementation with extension for record and structural cloning / msgpack.org[JavaScript/NodeJS]

Fast opt-out of shared structures per object?

somebee opened this issue · comments

We are packing tons of objects into a sequential stream. Most objects follow a fairly consistent structure {id,name,...keys}, but nested within them are several objects that I suspect do not benefit much from structure definitions, like {someId:1,someOtherId:2}, where many of the key combinations don't show up more than once among 10k+ objects.

I see the option shouldShareStructures but from reading the source I don't think it fits our need. I'm wondering if it might make sense to be able to pass in a function like shouldUseStructures(value) which can return false if we want to pack an object as a plain old regular object?

Our function would be as simple as (value)=>!!value.id. Any object without an id should skip all the logic that tests for shared structures, key combinations and all that. I can definitely add it, but was wondering if you think it would make a difference at all? I will try to hardcode it locally and do some very informal tests here :)
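
For illustration, here is a minimal sketch of the predicate described above, run against made-up sample objects. The name shouldUseStructures is only the hypothetical option name floated in this thread, and the sample data is invented; nothing here calls into msgpackr itself.

```javascript
// Hypothetical predicate from the discussion: only objects with a truthy `id`
// are treated as candidates for shared structures.
const shouldUseStructures = (value) => !!value.id;

// Invented sample data shaped like the objects described in this thread.
const samples = [
  { id: 1, name: "first" },        // consistent shape, has an id
  { someId: 1, someOtherId: 2 },   // one-off key combination, no id
  { id: 0, name: "zero id" },      // caveat: a falsy id (0, "") also opts out
];

const decisions = samples.map(shouldUseStructures);
console.log(decisions); // [ true, false, false ]
```

One thing the sketch makes visible: !!value.id treats an id of 0 or "" as "no id", so a stricter check such as "id" in value might be preferable if such ids can occur.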

Fwiw, skipping structures for these objects reduced the compressed size of the full stream from 500kb to 470kb, and the uncompressed size by ~200kb. Unpacking performance seems about the same, but it's hard to say since it's so damn fast either way. Intuitively I would think that packing is faster as well, but I haven't made an isolated case to test it.

Made an isolated test with the real-world data we have.

useRecords(fn) – 1.35mb – pack: 10.2498ms unpack: 9.3448ms
useRecords – 1.46mb – pack: 11.4979ms unpack: 15.3999ms

So, in our use case it actually makes quite a substantial difference. I'll submit a PR today where the only public-facing change is that you can supply useRecords as either a function or a boolean. If it is a function, it will essentially call useRecords(value) for each value, and opt out to writePlainObject if it returns false. If you set useRecords to true/false, no code paths will change, so there is no performance impact for any other cases.
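
A rough sketch of the boolean-or-function dispatch described above, as a toy stand-in rather than msgpackr's actual internals. makeObjectWriter and writeRecord are hypothetical names introduced for the example; writePlainObject is the name mentioned above, but its body here is a placeholder.

```javascript
// Toy stand-in for the encoder's object branch; not msgpackr's real code.
function makeObjectWriter(useRecords, writeRecord, writePlainObject) {
  if (typeof useRecords === "function") {
    // Function form: ask the callback for every object being packed.
    return (value) =>
      useRecords(value) ? writeRecord(value) : writePlainObject(value);
  }
  // Boolean form: the writer is chosen once, so no per-object check is added.
  return useRecords ? writeRecord : writePlainObject;
}

// Placeholder writers that only report which path was taken.
const writeRecord = () => "record";
const writePlainObject = () => "plain";

const selective = makeObjectWriter((value) => !!value.id, writeRecord, writePlainObject);
const always = makeObjectWriter(true, writeRecord, writePlainObject);
const never = makeObjectWriter(false, writeRecord, writePlainObject);

console.log(selective({ id: 7, name: "a" }));          // record
console.log(selective({ someId: 1, someOtherId: 2 })); // plain
console.log(always({ someId: 1 }));                    // record
console.log(never({ id: 7 }));                         // plain
```

If the PR lands as described, the call site would presumably look something like new Packr({ useRecords: (value) => !!value.id }), with true/false behaving exactly as before.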

I assume it wouldn't be viable to use a Map for the objects that are... maps :) (i.e. without consistent structure).