multiple formats for same int value allowed?

Question

hyansuper opened this issue 2 years ago

Should an integer value of 1 be stored as 0x01, 0xcc 0x01, 0xd0 0x01, or even 0xcd 0x00 0x01? Are they all valid?

hyan commented 2 years ago

thanks

Markus Schaber · Answer 1 · Tue Aug 09 2022 21:03:59 GMT+0800 (China Standard Time)

Technically, all of these formats are valid, as are other encodings such as `0xcf 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x01`. A conformant decoder should parse every representation to the same value.
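To illustrate, here is a minimal sketch of a decoder that handles only the MessagePack integer family. The name `decode_int` and the table `_INT_FORMATS` are made up for this example and do not come from any particular library; the type bytes and widths follow the spec's int format family.

```python
# Payload width in bytes and signedness for each integer type byte.
_INT_FORMATS = {
    0xCC: (1, False), 0xCD: (2, False), 0xCE: (4, False), 0xCF: (8, False),
    0xD0: (1, True), 0xD1: (2, True), 0xD2: (4, True), 0xD3: (8, True),
}


def decode_int(data: bytes) -> int:
    """Decode a single MessagePack integer from the start of `data`."""
    if not data:
        raise ValueError("empty input")
    tag = data[0]
    if tag <= 0x7F:          # positive fixint: the byte is the value
        return tag
    if tag >= 0xE0:          # negative fixint: -32..-1
        return tag - 0x100
    if tag not in _INT_FORMATS:
        raise ValueError(f"not an integer format: 0x{tag:02x}")
    width, signed = _INT_FORMATS[tag]
    payload = data[1:1 + width]
    if len(payload) != width:
        raise ValueError("truncated integer payload")
    # MessagePack integers are big-endian.
    return int.from_bytes(payload, "big", signed=signed)


encodings_of_one = [
    bytes([0x01]),
    bytes([0xCC, 0x01]),
    bytes([0xD0, 0x01]),
    bytes([0xCD, 0x00, 0x01]),
    bytes([0xCF, 0, 0, 0, 0, 0, 0, 0, 0x01]),
]
values = [decode_int(e) for e in encodings_of_one]
```

Every entry in `values` comes out as the integer 1, which is the sense in which all representations are parsed equally.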

But the spec says:

If an object can be represented in multiple possible output formats, serializers SHOULD use the format which represents the data in the smallest number of bytes.

This means 0x01 is the recommended encoding for most use cases, as conciseness is a design goal of MessagePack.
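A rough sketch of an encoder that follows this recommendation might look like the following. The function name `encode_int_smallest` is hypothetical, and using the unsigned family for every non-negative value is an assumption of this sketch, not something the spec requires.

```python
def encode_int_smallest(value: int) -> bytes:
    """Encode `value` in the fewest bytes MessagePack allows."""
    if 0 <= value <= 0x7F:
        return bytes([value])            # positive fixint, 1 byte
    if -32 <= value < 0:
        return bytes([value & 0xFF])     # negative fixint, 1 byte
    if value > 0:
        # Assumption: non-negative values use the unsigned family.
        for tag, width in ((0xCC, 1), (0xCD, 2), (0xCE, 4), (0xCF, 8)):
            if value < 1 << (8 * width):
                return bytes([tag]) + value.to_bytes(width, "big")
    else:
        for tag, width in ((0xD0, 1), (0xD1, 2), (0xD2, 4), (0xD3, 8)):
            if value >= -(1 << (8 * width - 1)):
                return bytes([tag]) + value.to_bytes(width, "big", signed=True)
    raise OverflowError(f"{value} does not fit in 64 bits")
```

With this sketch, 1 becomes the single byte 0x01, and the longer forms such as 0xcc 0x01 are never produced.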

But the spec also offers the idea of "Profiles", which are special agreements between sender and receiver about a subset of MessagePack (possibly with added semantics). One example may be resource-restricted environments, where an encoder may be kept simple by always using the same fixed-size encoding, or where the decoder does not support 64-bit values.

A compliant, universal decoder should always be able to decode subsets of MessagePack, but when the receiving end is limited, the sender needs to take special care to choose the format the receiver can interpret.
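As an illustration of such a profile, suppose sender and receiver have agreed that every integer travels as int 32 (type byte 0xd2 plus four big-endian bytes). This profile and the name `encode_int_fixed32` are invented for the example; a sketch of the sender side could be:

```python
def encode_int_fixed32(value: int) -> bytes:
    """Encode every integer as int 32, per a hypothetical fixed-size profile."""
    try:
        # 0xd2 is the int 32 type byte; the payload is big-endian two's complement.
        return b"\xd2" + value.to_bytes(4, "big", signed=True)
    except OverflowError:
        raise OverflowError(f"{value} does not fit the int 32 profile") from None
```

The output is still valid MessagePack, so a universal decoder reads it without knowing about the profile. The cost is five bytes for every integer, including small ones like 1.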

hyan · Answer 2 · Tue Aug 09 2022 21:15:58 GMT+0800 (China Standard Time)

OK, so 256 can be stored as 0xcd 0x01 0x00 or 0xd1 0x01 0x00, and both are the same length. Is one preferred over the other?

Markus Schaber · Answer 3 · Tue Aug 09 2022 21:22:03 GMT+0800 (China Standard Time)

The spec does not define a canonical encoding in those cases.

I'd recommend: if you have signed/unsigned types in your language (like in C, C#, Rust, Go or the IEC 61131-3 family of languages), you should choose the corresponding encoding. (This implies always using the signed types in Java, except for char. 😄)

You could also argue like this: If the value should never be negative due to the semantics (e.g. a length or a page number), prefer 0xcd. If the value can be negative (e.g. a relative position or a coordinate), prefer 0xd1.
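One way to sketch this tie-break in code: pick the shortest encoding as usual, and consult the declared signedness only when the signed and unsigned forms are equally long. The names `encode_int`, `prefer_signed` and `_narrowest` are hypothetical, and this policy is just one reading of the advice above.

```python
UNSIGNED = ((0xCC, 1), (0xCD, 2), (0xCE, 4), (0xCF, 8))
SIGNED = ((0xD0, 1), (0xD1, 2), (0xD2, 4), (0xD3, 8))


def _narrowest(value, formats, signed):
    """Return the narrowest encoding in one family, or None if none fits."""
    for tag, width in formats:
        try:
            return bytes([tag]) + value.to_bytes(width, "big", signed=signed)
        except OverflowError:
            continue
    return None


def encode_int(value: int, prefer_signed: bool) -> bytes:
    if -32 <= value <= 0x7F:
        return bytes([value & 0xFF])     # fixint always wins at 1 byte
    signed_form = _narrowest(value, SIGNED, True)
    unsigned_form = _narrowest(value, UNSIGNED, False) if value >= 0 else None
    candidates = [c for c in (signed_form, unsigned_form) if c is not None]
    if not candidates:
        raise OverflowError(f"{value} does not fit in 64 bits")
    shortest = min(len(c) for c in candidates)
    tied = [c for c in candidates if len(c) == shortest]
    if len(tied) == 2:
        # Same length either way: let the value's semantics decide.
        return signed_form if prefer_signed else unsigned_form
    return tied[0]


page_number = encode_int(256, prefer_signed=False)   # 0xcd 0x01 0x00
coordinate = encode_int(256, prefer_signed=True)     # 0xd1 0x01 0x00
```

Note that a value such as 200 is still encoded as 0xcc 0xc8 even when `prefer_signed` is true, because the signed alternative would need three bytes.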

A compliant decoder should be able to parse both encodings equally.