google / trillian

A transparent, highly scalable and cryptographically verifiable data store.

Consider adding GetLeavesByHash back?

dlorenc opened this issue

Ref: #2243

We were using this over in Rekor/sigstore. We can work around it by getting an inclusion proof for each entry and then getting by that index, but it's a bit awkward.
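A rough Go sketch of that workaround, for illustration only. The `logClient` interface, its method names, and the in-memory `memLog` are hypothetical stand-ins, not the Trillian client API; the leaf hash assumes the RFC 6962 construction (SHA-256 over a 0x00 prefix followed by the leaf bytes).

```go
package main

import (
	"crypto/sha256"
	"errors"
	"fmt"
)

var errNotFound = errors.New("leaf not found")

// logClient is a hypothetical, trimmed-down log API without lookup by hash.
type logClient interface {
	// IndexFromProofByHash stands in for an inclusion-proof request; a real
	// response would also carry the audit path, which is omitted here.
	IndexFromProofByHash(leafHash [32]byte) (int64, error)
	LeafByIndex(index int64) ([]byte, error)
}

// memLog is an in-memory fake used only to make the sketch runnable.
type memLog struct {
	leaves [][]byte
}

// leafHash assumes the RFC 6962 leaf hash: SHA-256(0x00 || leaf).
func leafHash(data []byte) [32]byte {
	return sha256.Sum256(append([]byte{0x00}, data...))
}

func (l *memLog) IndexFromProofByHash(h [32]byte) (int64, error) {
	for i, leaf := range l.leaves {
		if leafHash(leaf) == h {
			return int64(i), nil
		}
	}
	return 0, errNotFound
}

func (l *memLog) LeafByIndex(i int64) ([]byte, error) {
	if i < 0 || i >= int64(len(l.leaves)) {
		return nil, errNotFound
	}
	return l.leaves[i], nil
}

// getLeafByHash emulates lookup by hash with two round trips:
// first resolve the hash to an index, then fetch the leaf at that index.
func getLeafByHash(c logClient, h [32]byte) ([]byte, error) {
	idx, err := c.IndexFromProofByHash(h)
	if err != nil {
		return nil, err
	}
	return c.LeafByIndex(idx)
}

func main() {
	log := &memLog{leaves: [][]byte{[]byte("entry-a"), []byte("entry-b")}}
	leaf, err := getLeafByHash(log, leafHash([]byte("entry-b")))
	fmt.Println(string(leaf), err)
}
```

The awkwardness is visible in `getLeafByHash`: every lookup costs two calls where one used to do.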

Hey @dlorenc, @lukehinds,

The GetLeavesByHash call is kinda unusual (and so largely unused) because of this, but has a cost in terms of code complexity, so we've been looking at trying to pare the API surface down.

It'd be really useful to understand a bit more of the use case for the GetLogEntryByUUID call - it looks like UUID in this case is really the Merkle hash of the entry (which I think also contains a signature?), so it seems that you'd only be able to calculate this ID if you already had the leaf preimage (i.e. the manifest+sig)?

Right - for our case the leaf hash is computed over the triplet of:

  • Signature Bytes
  • Public Key Bytes
  • Artifact Hash Bytes

Those three are typically distributed together as part of a software release (well, technically the public key is usually somewhere else for security reasons), but end users must somehow gain access to all three of these to verify a release. We'd like to be able to look up an entry in the log by this triplet as well, without requiring the user to know other information (like the log index).
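To make the "look up by triplet" idea concrete, here is a minimal Go sketch of how a client could derive a leaf hash from the three values it already holds. The `entry` type and its length-prefixed `marshal` encoding are assumptions made for illustration (the thread does not say how the triplet is actually serialized); the hash itself assumes the RFC 6962 leaf construction.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/binary"
	"encoding/hex"
	"fmt"
)

// entry is a hypothetical container for the triplet described above.
type entry struct {
	Signature    []byte
	PublicKey    []byte
	ArtifactHash []byte
}

// marshal uses an assumed encoding: each field is written as a 4-byte
// big-endian length followed by the field bytes, so that field boundaries
// are unambiguous.
func (e entry) marshal() []byte {
	var buf bytes.Buffer
	for _, f := range [][]byte{e.Signature, e.PublicKey, e.ArtifactHash} {
		var n [4]byte
		binary.BigEndian.PutUint32(n[:], uint32(len(f)))
		buf.Write(n[:])
		buf.Write(f)
	}
	return buf.Bytes()
}

// leafHash assumes the RFC 6962 leaf hash: SHA-256(0x00 || leaf).
func leafHash(leaf []byte) [32]byte {
	h := sha256.New()
	h.Write([]byte{0x00})
	h.Write(leaf)
	var out [32]byte
	copy(out[:], h.Sum(nil))
	return out
}

func main() {
	artifactHash := sha256.Sum256([]byte("release.tar.gz contents"))
	e := entry{
		Signature:    []byte("placeholder signature bytes"),
		PublicKey:    []byte("placeholder public key bytes"),
		ArtifactHash: artifactHash[:],
	}
	id := leafHash(e.marshal())
	fmt.Println(hex.EncodeToString(id[:]))
}
```

Anyone holding the same three values can recompute the same identifier, which is why no log index is needed for the lookup.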

I think our workaround is to request an inclusion proof instead of the entry in this case; would that be correct? I think that's probably largely equivalent and might meet our use case.

Cool, thanks!
I'd like to try to understand a bit more about the client side of things, I'll make some stuff up and you can correct me if I go way off piste :)

A user gets a binary package they want to install, inside the package they get:

  • the binary(ies)
  • sig(artifact) bytes
  • pubk bytes (probably they already have this)
  • H(artifact) bytes

Artifact == manifest/metadata relating to binary, and has a field which commits to H(binary)?

So as the user I want to verify that the package I just downloaded is self-consistent so I need the full artifact so I can:

  • verify the binary hash is committed to by the artifact
  • verify the artifact hash is correct for the artifact
  • verify the sig is correct for the artifact hash, and so transitively commits to the binary I just downloaded

Once I've done that, I suspect that I probably also want to verify inclusion of this artifact in the log (because I only want to install things that I know are discoverable by monitors), is that right?

Roughly correct, a couple nits that might or might not help clarify things:

  • the binary(ies)
  • sig(artifact) bytes
  • pubk bytes (probably they already have this)
  • H(artifact) bytes

I generally use artifact/binary interchangeably. There's no real standard or distinction here, I should get better about that. Some people distribute tar/zip archives containing everything, some distribute just a raw binary with a signature next to it. More accurate would probably be:

  • the Artifact
  • sig(Artifact) bytes
  • pubk bytes (probably they already have this)

Where "capital A" Artifact is just some generic blob. It could be a text file, binary, archive, text file containing hashes of one or more archives, anything.

Then to push that through to your second list:

So as the user I want to verify that the package I just downloaded is self-consistent so I need the three components so I can:

  • verify the sig is correct for the artifact and the public key they expect.

They can calculate the hash locally. They may obtain the hash from the website or package manager too, to verify the integrity of the download, but I think we can ignore that here.
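As a sketch of that single check, the Go snippet below verifies a signature over the Artifact against an expected public key and computes the hash locally. Ed25519 is assumed purely for illustration (the thread does not name a signature scheme), and `demoKeyPair`, `signArtifact`, and `verifyRelease` are hypothetical helper names.

```go
package main

import (
	"crypto/ed25519"
	"crypto/sha256"
	"fmt"
)

// demoKeyPair derives a fixed key pair from a seed so the example is
// reproducible. Real keys must come from a secure random source.
func demoKeyPair() (ed25519.PublicKey, ed25519.PrivateKey) {
	seed := sha256.Sum256([]byte("demo seed, not a real key"))
	priv := ed25519.NewKeyFromSeed(seed[:])
	return priv.Public().(ed25519.PublicKey), priv
}

// signArtifact plays the publisher's role in this sketch.
func signArtifact(priv ed25519.PrivateKey, artifact []byte) []byte {
	return ed25519.Sign(priv, artifact)
}

// verifyRelease checks sig over the Artifact with the public key the user
// expects, and returns the locally computed SHA-256 of the Artifact.
func verifyRelease(pub ed25519.PublicKey, artifact, sig []byte) ([32]byte, bool) {
	digest := sha256.Sum256(artifact)
	if len(pub) != ed25519.PublicKeySize {
		// ed25519.Verify panics on a wrongly sized key, so reject it here.
		return digest, false
	}
	return digest, ed25519.Verify(pub, artifact, sig)
}

func main() {
	pub, priv := demoKeyPair()
	artifact := []byte("contents of the downloaded Artifact")
	sig := signArtifact(priv, artifact)

	digest, ok := verifyRelease(pub, artifact, sig)
	fmt.Printf("sha256=%x valid=%v\n", digest, ok)

	_, ok = verifyRelease(pub, append(artifact, '!'), sig)
	fmt.Println("tampered valid:", ok)
}
```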

Then after making sure the signature is correct and valid, they may want to verify inclusion in the log, for the reason you outlined. We're looking at ways to "staple" this proof into artifacts too, following the OCSP-stapling/pre-certificate models, to avoid the need for this lookup. In some cases (where artifacts have well-known mechanisms of attaching extra content) we may be able to do this, but we can't rely on it for the general case and have to assume not all formats will allow for this attachment.

Great, thanks again!

Some more questions :)

How does the client get the UUID for the log entry?
When would it call GetLeavesByHash, and what would it do with the record it gets in return?
Would it ever want to do that fetch of the record and not get an inclusion proof?

(Is that record one of these BTW?)

Yeah, I think you've now convinced me that it would only ever make sense to get the entire thing, including the inclusion proof. For any cases where you only have the UUID/hash, you actually want to verify the complete inclusion. For any cases where we hand you an identifier to find an entry later, we can just use an Index.
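For reference, verifying the complete inclusion from a leaf hash could look roughly like the Go sketch below. It follows the inclusion-proof verification procedure from the Certificate Transparency specs (RFC 6962 hashing, with the audit-path walk as written up in RFC 9162) and is not taken from the Trillian or Rekor code; all function names are illustrative.

```go
package main

import (
	"crypto/sha256"
	"errors"
	"fmt"
)

// leafHash is the RFC 6962 leaf hash: SHA-256(0x00 || leaf).
func leafHash(leaf []byte) [32]byte {
	return sha256.Sum256(append([]byte{0x00}, leaf...))
}

// nodeHash is the RFC 6962 interior node hash: SHA-256(0x01 || left || right).
func nodeHash(left, right [32]byte) [32]byte {
	h := sha256.New()
	h.Write([]byte{0x01})
	h.Write(left[:])
	h.Write(right[:])
	var out [32]byte
	copy(out[:], h.Sum(nil))
	return out
}

// rootFromProof recomputes the tree root from a leaf hash and its audit path.
func rootFromProof(index, size uint64, leaf [32]byte, path [][32]byte) ([32]byte, error) {
	if index >= size {
		return [32]byte{}, errors.New("leaf index outside tree")
	}
	fn, sn := index, size-1
	r := leaf
	for _, p := range path {
		if sn == 0 {
			return [32]byte{}, errors.New("proof too long")
		}
		if fn&1 == 1 || fn == sn {
			// The sibling is on the left.
			r = nodeHash(p, r)
			// Skip levels where this node has no right sibling.
			for fn&1 == 0 && fn != 0 {
				fn >>= 1
				sn >>= 1
			}
		} else {
			// The sibling is on the right.
			r = nodeHash(r, p)
		}
		fn >>= 1
		sn >>= 1
	}
	if sn != 0 {
		return [32]byte{}, errors.New("proof too short")
	}
	return r, nil
}

// verifyInclusion reports whether the proof links the leaf to the given root.
func verifyInclusion(index, size uint64, leaf [32]byte, path [][32]byte, root [32]byte) bool {
	got, err := rootFromProof(index, size, leaf, path)
	return err == nil && got == root
}

func main() {
	// A three-leaf tree built by hand: root = H(H(h0, h1), h2).
	h0, h1, h2 := leafHash([]byte("a")), leafHash([]byte("b")), leafHash([]byte("c"))
	root := nodeHash(nodeHash(h0, h1), h2)

	fmt.Println(verifyInclusion(2, 3, h2, [][32]byte{nodeHash(h0, h1)}, root))
	fmt.Println(verifyInclusion(1, 3, h0, [][32]byte{h1, h2}, root))
}
```

In practice the root would come from a signed log root that the client has already checked, which is outside the scope of this sketch.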

Cool 👍

More than happy to chat through any of this stuff if/as/when it's helpful!