ChainAgnostic / varsig

The cryptographic signature multiformat

Capturing data encoding in the multiformat

Gozala opened this issue · comments

Originally this came up here ucan-wg/ucan-cacao#2 (comment), but it's probably best to continue discussion here. Here is a short summary:

The current version of varsig is specified as follows:

<varint sig_alg_code><varint sig_size><bytes sig_output>

However, since CACAO signs a CBOR payload rather than a JWT payload, we need some way to communicate the payload's encoding in the signature.

One option was to simply expand the list of sig_alg_codes to accommodate more payload formats. However, that would mean allocating a signature code for every combination of signature algorithm and encoding. It also means that if I create a new codec, I not only have to get a new multiformat code for the IPLD encoding, I also need a set of codes for the signature algorithms, which is not great.

For this reason I think we should change the format to the following instead:

<varint sig_alg_code><varint payload_encoding><varint sig_size><bytes sig_output>
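As a rough illustration of the proposed layout, here is a minimal sketch of an encoder. It assumes the varints are the usual multiformats unsigned LEB128 varints; the helper names (encodeVarint, encodeVarsig) are made up for this example, and the codes used (0xd0ed for EdDSA, 0x71 for dag-cbor) are only one plausible pairing.

```typescript
// Unsigned LEB128 varint as used by multiformats. Values are limited to
// 31 bits in this sketch so that 32-bit bitwise operators stay safe.
function encodeVarint(value: number): number[] {
  if (Math.floor(value) !== value || value < 0 || value > 0x7fffffff) {
    throw new RangeError("value out of range for this sketch");
  }
  const out: number[] = [];
  let rest = value;
  while (rest >= 0x80) {
    out.push((rest & 0x7f) | 0x80); // low 7 bits, continuation bit set
    rest >>>= 7;
  }
  out.push(rest); // final byte, continuation bit clear
  return out;
}

// Hypothetical helper producing:
// <varint sig_alg_code><varint payload_encoding><varint sig_size><bytes sig_output>
function encodeVarsig(
  sigAlgCode: number,
  payloadEncoding: number,
  sigOutput: Uint8Array
): Uint8Array {
  const header = encodeVarint(sigAlgCode)
    .concat(encodeVarint(payloadEncoding))
    .concat(encodeVarint(sigOutput.length));
  const out = new Uint8Array(header.length + sigOutput.length);
  out.set(header, 0);
  out.set(sigOutput, header.length);
  return out;
}

// Example: EdDSA (0xd0ed) over a dag-cbor (0x71) payload, with a dummy
// all-zero 64-byte value standing in for a real signature.
const exampleVarsig = encodeVarsig(0xd0ed, 0x71, new Uint8Array(64));
```

With this layout a new codec only needs its own multicodec entry; the signature algorithm code stays the same regardless of how the payload was encoded.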

Pulling @expede and @oed into this

<varint sig_alg_code><varint payload_encoding><varint sig_size><bytes sig_output>

Agreed! Let's do it 💪

Okay, so I recognize that I've been a champion for the above previously, but I'm going to be annoying and give the devil's advocate view:

  1. Should it include the hash function?
  2. Is tracking the encoding the responsibility of the payload?

Hash Function

RS256 is RSA + SHA256. ECDSA is usually SHA256, but doesn't have to be. We could separate these out into separate fields...

<varint sig_alg_code><varint sig_hash><varint payload_encoding><varint sig_size><bytes sig_output>
                     ^^^^^^^^^^^^^^^^^

...which is yet one more field / a byte or two extra.
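To make the extra field concrete, here is a rough sketch of how a reader might parse the five-field layout. The names (readVarint, decodeVarsigWithHash, VarsigWithHash) are invented for illustration, and the example codes (0xd0ed for the signature algorithm, 0x13 for sha2-512, 0x71 for dag-cbor) are just plausible values, not something settled in this thread.

```typescript
// Reads one unsigned LEB128 varint starting at `offset`.
// Returns the decoded value and the offset of the next unread byte.
function readVarint(bytes: Uint8Array, offset: number): [number, number] {
  let value = 0;
  let shift = 0;
  let pos = offset;
  while (true) {
    if (pos >= bytes.length) throw new RangeError("truncated varint");
    if (shift > 28) throw new RangeError("varint too large for this sketch");
    const byte = bytes[pos++];
    value += (byte & 0x7f) * Math.pow(2, shift);
    shift += 7;
    if ((byte & 0x80) === 0) return [value, pos];
  }
}

// Hypothetical in-memory form of the five-field layout.
interface VarsigWithHash {
  sigAlgCode: number;
  sigHash: number;
  payloadEncoding: number;
  sigOutput: Uint8Array;
}

function decodeVarsigWithHash(bytes: Uint8Array): VarsigWithHash {
  const [sigAlgCode, afterAlg] = readVarint(bytes, 0);
  const [sigHash, afterHash] = readVarint(bytes, afterAlg);
  const [payloadEncoding, afterEncoding] = readVarint(bytes, afterHash);
  const [sigSize, afterSize] = readVarint(bytes, afterEncoding);
  if (afterSize + sigSize > bytes.length) {
    throw new RangeError("signature truncated");
  }
  return {
    sigAlgCode,
    sigHash,
    payloadEncoding,
    sigOutput: bytes.subarray(afterSize, afterSize + sigSize),
  };
}

// Illustrative input: alg 0xd0ed, hash 0x13, encoding 0x71, 2-byte dummy signature.
const decodedExample = decodeVarsigWithHash(
  new Uint8Array([0xed, 0xa1, 0x03, 0x13, 0x71, 0x02, 0xaa, 0xbb])
);
```

For small codes like these, the hash field costs a single extra byte.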

Encoding

I started writing text here, but convinced myself that including the multicodec of the payload does make sense here. If you're signing e.g. a non-canonicalized JWT, you just signal it as 0x00 raw bytes in the signature.

Should it include the hash function?

Do all signature algorithms have hashing functions? If some don’t, then the question arises of what to do with those. Perhaps an “identity” code would do the trick there.

I’m warming up to this idea, in fact we could simply reuse multihash and have format like

<varint sig_alg><varint payload_encoding><multihash>

If you're signing e.g. a non-canonicalized JWT, you just signal it as 0x00 raw bytes in the signature.

I would argue that we need a JWT multicodec code for that, because raw usually implies something else.

P.S.: 0x00 is identity multihash code, 0x55 is raw binary code

I’m warming up to this idea, in fact we could simply reuse multihash and have format like

This is a bit backwards I think. The hashing function is what is used over the canonicalized payload. The signature itself is not a hash, so I don't think we can use multihash here.

I started writing text here, but convinced myself that including the multicodec of the payload does make sense here. If you're signing e.g. a non-canonicalized JWT, you just signal it as 0x00 raw bytes in the signature.

I assume we need a canonicalization alg that describes how you take the payload and encode it as a JWT? If you just have the bytes of a raw JWT string, that also needs to be signaled somehow? I guess it depends on what the data structure looks like where you get the JWT string and the signature?

btw, I'd prefer if we call it payload_canonicalization rather than payload_encoding.

I assume we need a canonicalization alg that describes how you take the payload and encode it as a JWT? If you just have the bytes of a raw JWT string, that also needs to be signaled somehow? I guess it depends on what the data structure looks like where you get the JWT string and the signature?

I mean this goes from the data model (of a certain schema) to bytes. Which is why I call it encoding: it is the same code as in the CID of the data.

btw, I'd prefer if we call it payload_canonicalization rather than payload_encoding.

but I want to use e.g. dag-cbor or dag-json depending on how you’ve encoded the model to bytes before signing. Perhaps you’re saying canonicalization is yet another param?

@mikeal suggested that instead of <varint payload_encoding> we use <multiformat payload_encoding>. In common cases it could be just a single varint, but it also provides a way to include other canonicalization details in specific instances.

I mean this goes from the data model (of a certain schema) to bytes. Which is why I call it encoding: it is the same code as in the CID of the data.

No this is not at all what I mean. Why would you need to include which IPLD encoding you are using? I assume you get this from the CID when you load and interpret the IPLD block?

We need a varint that represents how to go from ipld object -> serialized data to sign

For example:

  • ipld object -> JWT protected header + payload
  • ipld object -> SIWE message

Basically we need to know how to go from ipld data to the bytestring used to verify the signature.

but I want to use e.g. dag-cbor or dag-json depending on how you’ve encoded the model to bytes before signing. Perhaps you’re saying canonicalization is yet another param?

I don't see why this wouldn't just be part of the canonicalization alg?

@oed we mean the same thing, we just use different terms. An IPLD codec literally takes data and turns it into bytes

@Gozala but we don't sign over IPLD encoded data. We sign over JWT data or SIWE messages.

You can think of both as IPLD encoders and this came up in other context, where it literally is either dag-cbor or dag-json.

p.s.: I don’t care what we call it

@Gozala I don't really follow. In the case of SIWE we have a bunch of data in various fields of the IPLD object. These are the steps I'm thinking about:

  1. CID -> bytes from blockstore or network
  2. bytes -> IPLD object using the IPLD codec
  3. IPLD object -> SIWE message and signature (this step is what I call canonicalization)
  4. Verify that signature is correct over SIWE message bytes

The other way around:

  1. Generate SIWE message and sign it (signature)
  2. SIWE message and signature -> IPLD object (canonicalization)
  3. IPLD object -> bytes (IPLD codec)
  4. hash(bytes) -> CID

These are the definitions of the IPLD encoder / decoder:

export interface BlockEncoder<Code extends number, T> {
  name: string
  code: Code
  encode: (data: T) => ByteView<T>
}


/**
 * IPLD decoder part of the codec.
 */
export interface BlockDecoder<Code extends number, T> {
  code: Code
  decode: (bytes: ByteView<T>) => T
}

So your steps are

  1. Fetch bytes for CID
  2. Codec.decode(bytes)
  3. SIWECodec.encode(bytes)
  4. PubKey.verify(SIWECodec.encode(bytes))

You are serializing some data into bytes in some format, which is exactly what an IPLD encoder does.

Ok I see what you are saying now @Gozala. Thanks for clarifying!

Interestingly your example above is super clear for a JWT where the signature is part of the encoded message. For SIWE this is not the case. We will have the signed string separately from the signature bytes. There is no official way to encode these two together.

we could do something like this though:

  1. Fetch bytes for CID
  2. decoded = Codec.decode(bytes)
  3. siweStr = SIWECodec.encode(decoded)
  4. PubKey.verify(siweStr, decoded.signature)
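The four steps above can be sketched as a single function. Everything here is hypothetical: SignedEnvelope, Decoder, Encoder, Verify and verifyEnvelope are names invented for this example, the field list is only a stand-in for whatever the SIWE representation on IPLD ends up being, and the actual codecs and public-key check are supplied by the caller.

```typescript
// Hypothetical decoded block: the signed fields plus a detached signature.
interface SignedEnvelope {
  fields: { domain: string; address: string; statement: string };
  signature: Uint8Array;
}

// Minimal codec halves, in the spirit of BlockDecoder / BlockEncoder.
interface Decoder<T> {
  decode(bytes: Uint8Array): T;
}
interface Encoder<T> {
  encode(data: T): Uint8Array;
}

// Stand-in for PubKey.verify: true when `signature` is valid over `message`.
type Verify = (message: Uint8Array, signature: Uint8Array) => boolean;

function verifyEnvelope(
  blockBytes: Uint8Array, // step 1: bytes already fetched for the CID
  blockCodec: Decoder<SignedEnvelope>, // step 2: e.g. a dag-cbor decoder
  payloadCodec: Encoder<SignedEnvelope["fields"]>, // step 3: e.g. a SIWE encoder
  verify: Verify // step 4: public-key verification
): boolean {
  const decoded = blockCodec.decode(blockBytes);
  // Re-create the exact byte string that was originally signed.
  const message = payloadCodec.encode(decoded.fields);
  // The signature travels next to the fields, not inside the signed message.
  return verify(message, decoded.signature);
}
```

In this shape the payload_encoding field of the varsig would be what tells a verifier which payloadCodec to pick.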

@oed oh yea sorry I forgot to add actual signature into verify, because they're separate in JWT cases as well

@Gozala In JWTs they are not separate?
A JWT should be a string like this:
<base64url-protected-header>.<base64url-payload>.<base64url-signature>

I mean it is, but you still pass first two segments as a payload and third as signature.

Pretty sure it differs per implementation. Most that I've seen you just pass the entire JWT string.

True for both of these:

Trying to figure out how to represent a DagJOSE (JWS) as a varsig.

We have,

<varint sig_alg_code><varint payload_encoding><varint sig_size><bytes sig_output>

Naive approach would be:

  • sig_alg_code: 0xd0ed
  • payload_encoding: 0x85
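For concreteness, this is roughly what the first two fields of that naive varsig would look like on the wire, assuming standard unsigned LEB128 varints. The helper names (toVarintBytes, toHex) are made up for this sketch.

```typescript
// Unsigned LEB128 varint encoding (values limited to 31 bits in this sketch).
function toVarintBytes(value: number): number[] {
  if (Math.floor(value) !== value || value < 0 || value > 0x7fffffff) {
    throw new RangeError("value out of range for this sketch");
  }
  const out: number[] = [];
  let rest = value;
  while (rest >= 0x80) {
    out.push((rest & 0x7f) | 0x80); // continuation bit set
    rest >>>= 7;
  }
  out.push(rest);
  return out;
}

// Formats bytes as space-separated two-digit hex.
function toHex(bytes: number[]): string {
  return bytes.map((b) => (b < 16 ? "0" : "") + b.toString(16)).join(" ");
}

// sig_alg_code 0xd0ed followed by payload_encoding 0x85 (dag-jose).
const naivePrefix = toVarintBytes(0xd0ed).concat(toVarintBytes(0x85));
console.log(toHex(naivePrefix)); // prints "ed a1 03 85 01"
```

Note that 0x85 needs two bytes as a varint, since any value of 0x80 or above spills into a second byte.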

However, this doesn't really cut it since dag-jose only says how to go from a JWS-string to dag-jose bytes, not some arbitrary structure to bytes.

So it seems like we will need to register a new payload_encoding for every possible payload we have?

For example we would need to define:

  • a SIWE codec that is only usable for the way we represent SIWE + signature on ipld
  • a UCAN codec that is only usable for the way we represent UCANs in ipld

This means that we also need a specific codec for invocations as well?

Maybe I'm missing something here?