msgpack / msgpack

MessagePack is an extremely efficient object serialization library. It's like JSON, but very fast and small.

Home Page: http://msgpack.org/

Records and references

clwi opened this issue · comments

One weakness of MessagePack is that values can only be grouped in arrays and maps. This has been pointed out in several earlier issues, e.g. #253 and #243. Today the different implementations solve this with a (mostly undocumented) layer on top of MessagePack. That could be OK, but there is no common standard for how to do it. The result: no one can easily communicate with another system.

Two data modelling concepts are so strong that they have built-in support in almost all programming languages. They are the concepts of reference (pointer) and record (struct, class). We should have a standard on how to represent those in a MessagePack stream.

To include new concepts in MessagePack, at least 3 requirements must be met:

  • Backward compatible
  • Simple implementation
  • Sufficient concept strength

I consider the last requirement met by references and records. The first two requirements can also be met, as shown by the implementation CWPack. In its Objective-C interface, both concepts are handled with a single fixext item with an integer payload. Details can be found in the document Technique.

My suggestion is that we reserve extension type -2 as a record/reference marker.
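
To make the suggestion concrete, here is a minimal Python sketch of what such a marker could look like on the wire. It assumes the integer payload is an unsigned label stored big-endian in the smallest fixext that fits. That layout, and the names `pack_marker` and `unpack_marker`, are illustrative assumptions and not CWPack's actual encoding, which is described in the Technique document. Only the fixext header bytes (0xd4, 0xd5, 0xd6) are taken from the MessagePack specification.

```python
import struct

# Extension type suggested in this issue; NOT part of the MessagePack spec.
RECORD_REF_EXT_TYPE = -2

# fixext 1 / 2 / 4 header bytes from the MessagePack spec, mapped to the
# struct format of the payload (big-endian unsigned: an assumption).
_FIXEXT_FORMATS = {0xD4: ">B", 0xD5: ">H", 0xD6: ">I"}


def pack_marker(label: int) -> bytes:
    """Pack a hypothetical record/reference label into the smallest fixext."""
    for header, fmt in _FIXEXT_FORMATS.items():
        if 0 <= label < (1 << (8 * struct.calcsize(fmt))):
            # header byte, then the signed 8-bit extension type, then payload
            return struct.pack(">Bb", header, RECORD_REF_EXT_TYPE) + struct.pack(fmt, label)
    raise ValueError("label does not fit in fixext 1, 2 or 4")


def unpack_marker(data: bytes) -> int:
    """Read back the label; reject anything that is not a type -2 fixext."""
    header, ext_type = struct.unpack_from(">Bb", data)
    if header not in _FIXEXT_FORMATS or ext_type != RECORD_REF_EXT_TYPE:
        raise ValueError("not a record/reference marker")
    (label,) = struct.unpack_from(_FIXEXT_FORMATS[header], data, 2)
    return label
```

A decoder that does not know type -2 would still see a well-formed ext item and could skip it, which is the sense in which this kind of marker stays backward compatible.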

I don't think there is much left to gain if you keep the first bullet point. If it has to be backward compatible, then you are adding more metadata instead of removing redundant type data.

I think an object (a representation of a C++ class/struct) should be one of the core MessagePack data formats. Indeed, it is strange that there is no «object format family». MessagePack is described as JSON-like, but one of JSON's core concepts is absent.

Yes, that concept is usually called a "schema". We do not need the inheritance features but just a "packed struct" definition for complex (or rather compound) types, that can be referenced or nested in larger types.
The schema should be able to go into a header or be separated from the message.
It seems this was all outside the scope of the MessagePack design, just as JSON does not have these concepts.
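
As a rough illustration of the "packed struct" idea, the Python sketch below compares a map that repeats its field names with an array whose field names live in a separate schema. Everything here is hypothetical: `POINT_SCHEMA`, `to_record`, `from_record` and the tiny `pack_small` encoder are made up for the example, and `pack_small` only covers positive fixint, fixstr, fixarray and fixmap from the MessagePack spec.

```python
def pack_small(obj) -> bytes:
    """Encode a tiny MessagePack subset: fixint, fixstr, fixarray, fixmap."""
    if isinstance(obj, int) and not isinstance(obj, bool) and 0 <= obj <= 0x7F:
        return bytes([obj])                      # positive fixint
    if isinstance(obj, str):
        raw = obj.encode("utf-8")
        if len(raw) <= 31:
            return bytes([0xA0 | len(raw)]) + raw  # fixstr
    if isinstance(obj, list) and len(obj) <= 15:
        return bytes([0x90 | len(obj)]) + b"".join(pack_small(x) for x in obj)
    if isinstance(obj, dict) and len(obj) <= 15:
        body = b"".join(pack_small(k) + pack_small(v) for k, v in obj.items())
        return bytes([0x80 | len(obj)]) + body   # fixmap
    raise TypeError("outside the subset handled by this sketch")


# Hypothetical schema, kept outside the message (e.g. in a header).
POINT_SCHEMA = ("x", "y", "z")


def to_record(values: dict) -> list:
    """Drop the field names; keep the values in schema order."""
    return [values[name] for name in POINT_SCHEMA]


def from_record(fields: list) -> dict:
    """Reattach the field names using the schema."""
    return dict(zip(POINT_SCHEMA, fields))


point = {"x": 1, "y": 2, "z": 3}
as_map = pack_small(point)                 # field names repeated in the data
as_record = pack_small(to_record(point))   # field names only in the schema
```

For this three-field point the map form takes 10 bytes and the array form 4, but a reader without the schema can no longer tell which value is which. That is the trade-off between removing redundant data and staying backward compatible raised earlier in the thread.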

I'm not sure I 100% understand what is being asked for here, but I think my proposal in #330 may at least partially help with your "record" storage using the pre-defined map extension type I proposed.

Also, the deduplication array/reference extension types would allow storing an object once and then referring to it multiple times from different places.
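
The general idea can be sketched in a few lines of Python. This is not the wire format proposed in #330: the `Ref` class and the `deduplicate` and `resolve` helpers are hypothetical names, and the sketch assumes the values are hashable so that equal objects can be detected with a dictionary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    """Hypothetical stand-in for a reference item: an index into the table."""
    index: int


def deduplicate(values):
    """Store each distinct value once; replace every occurrence with a Ref."""
    table = []      # distinct values, in order of first appearance
    index_of = {}   # value -> position in table
    refs = []
    for value in values:
        if value not in index_of:
            index_of[value] = len(table)
            table.append(value)
        refs.append(Ref(index_of[value]))
    return table, refs


def resolve(table, refs):
    """Expand the references back into the original sequence."""
    return [table[ref.index] for ref in refs]


values = [("alice", 30), ("bob", 25), ("alice", 30)]
table, refs = deduplicate(values)
```

Here `table` holds two entries while `refs` holds three small indices, and `resolve(table, refs)` gives back the original list.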

At the data level, it is backward compatible only in the sense that it makes no changes to the underlying msgpack specification and only defines additional pre-defined types. As was pointed out above, anything that actually removes redundant data instead of adding more metadata cannot be completely backward compatible.

At the API level, it can be made 100% backward compatible: data serialized using my proposed extensions will look identical at the API level to data serialized (less efficiently) without them.