add type ByteString for backwards compat with IPLD schema

Question

petar opened this issue 2 years ago · comments

By definition, the Edelweiss type String holds only valid Unicode strings, and it encodes to and decodes from IPLD strings that are valid UTF-8.

However, there may be pre-existing IPLD schemas that place non-UTF-8 byte sequences in IPLD string objects.
By design requirement, Edelweiss must provide a way to work with pre-existing schemas.

One way of doing this without violating Edelweiss's type semantics is to introduce a new Edelweiss type, say ByteString, which:

- is a list of bytes on the user-facing end
- encodes/decodes as an IPLD string of arbitrary bytes on the wire
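
The proposal can be illustrated with a minimal Go sketch. It is not Edelweiss code: the names `ByteString`, `EncodeIPLDString`, `DecodeByteString`, `DecodeString` and `ErrInvalidUTF8` are hypothetical, and the IPLD string on the wire is modelled as a plain Go `string`, which can hold arbitrary bytes. The strict decoder stands in for the existing String type and rejects anything that is not valid UTF-8, while the ByteString decoder accepts every payload unchanged.

```go
package main

import (
	"errors"
	"fmt"
	"unicode/utf8"
)

// ByteString is a hypothetical user-facing type: a plain list of bytes.
type ByteString []byte

// ErrInvalidUTF8 is returned by the strict String decoder (hypothetical name).
var ErrInvalidUTF8 = errors.New("string is not valid UTF-8")

// EncodeIPLDString returns the bytes unchanged as a Go string.
// Go strings may contain arbitrary bytes, so nothing is lost or rewritten.
func (b ByteString) EncodeIPLDString() string { return string(b) }

// DecodeByteString accepts any string payload, valid UTF-8 or not.
func DecodeByteString(s string) ByteString { return ByteString(s) }

// DecodeString models the strict String semantics: valid UTF-8 only.
func DecodeString(s string) (string, error) {
	if !utf8.ValidString(s) {
		return "", ErrInvalidUTF8
	}
	return s, nil
}

func main() {
	raw := "\xff\xfeabc" // two bytes that are not valid UTF-8, then "abc"

	_, err := DecodeString(raw)
	fmt.Println(err) // string is not valid UTF-8

	bs := DecodeByteString(raw)
	fmt.Println(len(bs), bs.EncodeIPLDString() == raw) // 5 true
}
```

Keeping the two decoders separate means String keeps its Unicode guarantee, and only schemas that opt into ByteString can see raw bytes.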

Volker Mische · Answer 1 · Fri Mar 11 2022 23:48:59 GMT+0800 (China Standard Time)

Is this a general IPLD Schema problem (then it should really be fixed) or a Go Schema implementation detail?

Petar Maymounkov · Answer 2 · Mon Mar 21 2022 22:37:52 GMT+0800 (China Standard Time)

> Is this a general IPLD Schema problem (then it should really be fixed) or a Go Schema implementation detail?

@vmx I've updated the description. Maybe it was a bit confusing previously.

By the way, there is a new set of slides that documents Edelweiss in its current state (Milestone 1): https://github.com/ipld/edelweiss/tree/main/doc/slides
This may be helpful too.

Volker Mische · Answer 3 · Tue Mar 22 2022 00:59:59 GMT+0800 (China Standard Time)

> encodes/decodes as an IPLD string of arbitrary bytes on the wire

I guess one major serialization will be CBOR, and then this won't work. In CBOR, text strings need to be valid UTF-8; otherwise the output is invalid, non-spec-compliant CBOR.

Petar Maymounkov · Answer 4 · Tue Mar 22 2022 01:13:42 GMT+0800 (China Standard Time)

If this is the case, it suggests a design bug in the IPLD data model. If the data model allows arbitrary bytes in a string (which I believe it does), then this breaks the contract that IPLD values can be serialized to any backend (e.g. both DAG-JSON and DAG-CBOR).

Volker Mische · Answer 5 · Tue Mar 22 2022 01:29:06 GMT+0800 (China Standard Time)

The IPLD data model is independent of the serialization, so there could potentially be serializations that support arbitrary bytes in strings. We just don't currently have any such serialization format.