Clearly document the mediaType of in-toto attestation layers

Question

marcelamelara opened this issue 10 months ago

A recent issue on the SLSA side mentions a use case for exposing a mediaType for the predicate layer at the Statement and/or DSSE layer: slsa-framework/slsa#933. The OP of that issue, @sudo-bmitch, also asked about ways to specify the mediaType of a DSSE itself, to identify cases in which in-toto attestations are not wrapped in a DSSE.

A suggestion given by @MarkLodato at today's SLSA spec meeting for exposing predicate information at the DSSE layer uses parameters: "payloadType": "application/vnd.in-toto+json;predicateType=<TYPE URI>". Verifiers who don't care about the predicateType at the DSSE layer can simply skip the predicateType parameter in the field.
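
As a rough illustration of the parameter idea, here is a minimal Python sketch of how a verifier might split such a payloadType into its base media type and its parameters. The function name `split_payload_type` is hypothetical and not part of any in-toto or DSSE library, and the sketch assumes the parameter value contains no `;` character.

```python
def split_payload_type(payload_type: str) -> tuple[str, dict[str, str]]:
    """Split a media type string into (base type, parameters).

    Parameter names are lowercased because media type parameter
    names are case-insensitive.
    """
    base, *raw_params = payload_type.split(";")
    params = {}
    for raw in raw_params:
        name, sep, value = raw.partition("=")
        if sep:  # ignore malformed parameters that have no "="
            params[name.strip().lower()] = value.strip().strip('"')
    return base.strip().lower(), params


base, params = split_payload_type(
    "application/vnd.in-toto+json;predicateType=https://slsa.dev/provenance/v1"
)
# A verifier that does not care about the hint only looks at `base`.
```

A verifier that ignores the hint would compare only `base` against application/vnd.in-toto+json and never read `params`.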

The question of how to specify the mediaType of a DSSE object is also relevant for evidence in SCAI predicates (see example), though I don't know if there are other use cases for needing to specify the DSSE mediaType.

Either way, if this is a use case we want to support, we should at the very least document an official/standard format for producers who want to indicate the predicate type at the DSSE layer. It might even be worthwhile to register the application/vnd.in-toto mediaType to officially standardize the Statement layer mediaType.

Marcela Melara · Answer 1 · Tue Aug 01 2023 05:09:27 GMT+0800 (China Standard Time)

@sudo-bmitch, please feel free to clarify any additional points or requirements I may have missed.

Mark Lodato · Answer 2 · Tue Aug 01 2023 05:13:03 GMT+0800 (China Standard Time)

Note: I think we'd add .dsse before the +json specifically for DSSE envelopes, and maybe a different extension for envelope-less?

Marcela Melara · Answer 3 · Tue Aug 01 2023 05:17:33 GMT+0800 (China Standard Time)

I think that can work. Right now, I believe that the application/vnd.in-toto type is supposed to denote a Statement payload in the DSSE, but DSSE may contain other payload types as well.

I wonder if application/vnd.dsse.in-toto+json would be the clearer way to express that it's a DSSE with an in-toto Statement payload. That way, other payload types in the DSSE can be described as vnd.dsse.<payload>?

EDIT: I don't have particularly strong feelings either way right now. I think arguments can be made for both .in-toto.dsse and .dsse.in-toto.

Aditya Sirish · Answer 4 · Tue Aug 01 2023 22:47:14 GMT+0800 (China Standard Time)

Are there some examples / expectations on how a media type that includes predicate information would be parsed? I understand the motivating use case is OCI but what's the benefit of having predicate information as opposed to application/vnd.dsse+json alone?

Tom Hennen · Answer 5 · Tue Aug 01 2023 23:40:25 GMT+0800 (China Standard Time)

I think a related question might be, what media type to use if the data is an in-toto bundle and contains many predicates? What about if that bundle includes things that aren't DSSE?

Maybe we don't need to resolve that now?

Marcela Melara · Answer 6 · Wed Aug 02 2023 00:27:44 GMT+0800 (China Standard Time)

I'm a little concerned we're starting to conflate the mediaType of the "transport" (i.e., DSSE) with the mediaType of the DSSE/transport payload. Assuming it's a DSSE envelope with an in-toto Statement payload, the payloadType should still just be vnd.in-toto+json maybe with a predicate hint, if that's what the producer wants. As @adityasaky pointed out in a separate conversation, there is a risk of verifiers taking the predicate information at the DSSE level at face value and not actually validating the in-toto attestation.
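
To make the risk concrete, here is a small Python sketch of a check that treats the outer predicate information only as a hint and compares it with the predicateType found inside the decoded Statement. The helper names are hypothetical, and signature verification of the envelope is deliberately left out; a real verifier would need to do that before trusting the payload at all.

```python
import base64
import json

IN_TOTO_PAYLOAD_TYPE = "application/vnd.in-toto+json"


def predicate_type_from_envelope(envelope: dict) -> str:
    """Read predicateType from the decoded Statement, not from any outer hint."""
    if envelope.get("payloadType") != IN_TOTO_PAYLOAD_TYPE:
        raise ValueError("payload is not an in-toto Statement")
    # NOTE: signatures are NOT verified in this sketch.
    statement = json.loads(base64.b64decode(envelope["payload"]))
    return statement["predicateType"]


def hint_agrees(envelope: dict, hinted_predicate_type: str) -> bool:
    """Return True only if the outer hint matches the Statement's own field."""
    return predicate_type_from_envelope(envelope) == hinted_predicate_type


# Illustrative, unsigned envelope with an empty predicate.
statement = {
    "_type": "https://in-toto.io/Statement/v1",
    "subject": [],
    "predicateType": "https://slsa.dev/provenance/v1",
    "predicate": {},
}
envelope = {
    "payloadType": IN_TOTO_PAYLOAD_TYPE,
    "payload": base64.b64encode(json.dumps(statement).encode("utf-8")).decode("ascii"),
    "signatures": [],
}
```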

The type of the DSSE itself should probably be something like vnd.dsse+json if we wanted the DSSE to be self-describing with a _type field, not unlike the in-toto Statement currently is.

To @TomHennen's question about bundles: I think that each DSSE could indicate its predicate type as suggested above, if need be. If there are non-DSSE lines, I wonder if in-toto can require a _type field in each line. wdyt?

Brandon Mitchell · Answer 7 · Wed Aug 02 2023 09:01:07 GMT+0800 (China Standard Time)

The use case I'm looking at is associating metadata with container images. We have metadata like a CycloneDX SBOM, an SPDX SBOM, and a sigstore signature. Other potential data I've been thinking of includes the license, source code or the build context for license compliance, and dependencies needed to perform a reproducible build.

I want to add SLSA provenance to that list of metadata associated with the container images, and to do that, the best hammer we have to differentiate the SLSA provenance from all the other metadata, and allow tooling to quickly locate the metadata it's looking for, is the mediaType. Sigstore doesn't want to download a bunch of SBOMs to find a signature to verify. And a tool to verify SLSA provenance doesn't want to try validating a license file.

For the value of the mediaType, we're limited to the naming requirements in RFC 6838 section 4.2, which I believe is going to exclude the predicateType syntax suggested above.
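
For reference, the type/subtype grammar in RFC 6838 section 4.2 can be sketched as a regular expression. The Python below is an illustrative check written from the RFC's restricted-name rule, not code taken from any registry or OCI tool, and `is_bare_media_type` is a hypothetical name. It only covers the type/subtype part; it says nothing about whether parameters are acceptable elsewhere.

```python
import re

# restricted-name: one ALPHA/DIGIT, then up to 126 characters from
# ALPHA / DIGIT / "!" / "#" / "$" / "&" / "-" / "^" / "_" / "." / "+"
_NAME = r"[A-Za-z0-9][A-Za-z0-9!#$&\-^_.+]{0,126}"
_TYPE_SUBTYPE = re.compile(_NAME + "/" + _NAME)


def is_bare_media_type(value: str) -> bool:
    """True if value is exactly type/subtype with no parameters."""
    return _TYPE_SUBTYPE.fullmatch(value) is not None


plain = is_bare_media_type("application/vnd.in-toto+json")
with_param = is_bare_media_type(
    "application/vnd.in-toto+json;predicateType=https://slsa.dev/provenance/v1"
)
```

Under this check the plain value passes and the parameterized value fails, because `;`, `=`, and `:` are outside the restricted-name character set.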

Personally, it feels wrong to specify this as in-toto or dsse mediaType that happens to contain a SLSA predicate. All the other tools and specs I've been working with specify their name followed by the format they happen to use in the mediaType. E.g. it's application/spdx+json and not application/json;predicateType=spdx.

John Kjell · Answer 8 · Wed Aug 02 2023 09:41:11 GMT+0800 (China Standard Time)

The next section (4.3) lays out the parameter requirements. I'm not sure why the above proposal wouldn't be acceptable (acceptable being different than right/best). Plain text seems to be the most common example of a media type with parameters: text/plain;charset=us-ascii.

Brandon Mitchell · Answer 9 · Wed Aug 02 2023 10:42:38 GMT+0800 (China Standard Time)

OCI is looking for just the mediaType and not the additional parameters that may follow. The artifactType field calls out section 4.2; that validation is in the JSON schema, and I expect it's also being added to clients and servers.

The server side filtering is also an exact string match. If we tried supporting extra parameters, that would also break filtering since servers would need to do a complex comparison (artifacts could be uploaded with all sorts of extra parameters that need to be ignored in many cases). Just getting registry operators to consider supporting a simple string comparison on a single field has been a complex negotiation.

John Kjell · Answer 10 · Wed Aug 02 2023 21:34:34 GMT+0800 (China Standard Time)

So, should there be a recommendation for each attestation predicate to use a unique media type during transport? Perhaps with a recommended format along the lines of application/in-toto.<predicateType>+json?

Can we just ignore/assume a DSSE envelope as it appears required by the spec?

For the purposes of associating attestations to an OCI image/artifact, bundles do seem problematic. Maybe explicitly proposing the use of an image manifest to bundle attestations could be useful. Not sure where the appropriate place for such a recommendation is (here vs OCI).

Brandon Mitchell · Answer 11 · Wed Aug 02 2023 21:53:33 GMT+0800 (China Standard Time)

Would the DSSE requirement exclude SLSA level 1 attestations, like those provided by Docker's buildkit?

Mark Lodato · Answer 12 · Wed Aug 02 2023 23:20:15 GMT+0800 (China Standard Time)

Sorry, I don't understand the problem with a mediaType containing a parameter. It complies fully with RFC 6838 and filtering would still be an exact string match. I don't see how it's any different than application/in-toto.<predicateType>+json. Do you have evidence that an existing registry or tool disallows parameters, e.g. text/plain;charset=us-ascii?

I'm not opposed to defining a mediaType specifically for SLSA Provenance, but want to better understand the constraints.

Brandon Mitchell · Answer 13 · Thu Aug 03 2023 00:51:04 GMT+0800 (China Standard Time)

Here's the JSON spec in OCI for the media type that excludes parameters: https://github.com/opencontainers/image-spec/blob/main/schema/defs-descriptor.json#L7

Here's an example for how distribution/distribution wouldn't handle a Content-Type header that includes additional parameters because the media type comparisons are exact matches without stripping off parameters: https://github.com/distribution/distribution/blob/7b502560cad43970472964166dcb095b1f883ae4/registry/storage/ocimanifesthandler.go#L90

With exact string matches, application/json; charset=utf-8 and application/json would not match, even though all text-based content in OCI is required to be UTF-8. Since these mediaTypes are potentially coming from end users, there's a strong possibility that the option to define parameters would result in users adding all sorts of parameters, either to their query or to the uploaded artifacts.
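
The difference between the two comparison strategies can be shown in a few lines of Python. This is only an illustration of the behavior described above, not the registry's actual Go code; both function names are made up for the example.

```python
def exact_match(stored: str, wanted: str) -> bool:
    """Plain string equality, as a simple filter would do."""
    return stored == wanted


def match_ignoring_params(stored: str, wanted: str) -> bool:
    """Compare only type/subtype, dropping anything after the first ';'."""
    def strip(value: str) -> str:
        return value.split(";", 1)[0].strip().lower()

    return strip(stored) == strip(wanted)


stored = "application/json; charset=utf-8"
wanted = "application/json"
exact = exact_match(stored, wanted)
relaxed = match_ignoring_params(stored, wanted)
```

Here `exact` is False and `relaxed` is True, which is the extra normalization work a server would have to take on if parameters were allowed.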

So for our use case, having a SLSA specific mediaType without depending on parameters that's defined by the SLSA project would be useful. It doesn't need to be formally registered with IANA (though that helps). Getting this defined by SLSA would avoid each end user creating their own values, breaking the ability to have interoperability in automation.

Marcela Melara · Answer 14 · Thu Aug 03 2023 08:38:51 GMT+0800 (China Standard Time)

> Would the DSSE requirement exclude SLSA level 1 attestations, like those provided by Docker's buildkit?

It probably would since SLSA Build L1 doesn't require the provenance to be signed. To clarify, buildkit only generates a pure SLSA Provenance document, i.e., it doesn't generate any authenticated data?

In general, I'm a little skeptical about defining a mediaType for the lowest SLSA level when the provenance for higher SLSA Build levels may/should be wrapped in an in-toto Statement+DSSE transport (per the recommended attestation suite). So, we'd need to give tools a way to distinguish between SLSA+in-toto+DSSE and pure SLSA Provenance documents either way. Without knowing what the implementation challenges are, my personal take is that this seems like an opportunity to move buildkit to SLSA Build L2 :)

So, to John's point:

> should there be a recommendation for each attestation predicate to use a unique media type during transport?

I agree that for use cases that need predicate info at the DSSE layer, in-toto could define a mediaType like application/vnd.in-toto.<predicateType>+json. But I'm rather wary of having each predicate type define a completely custom mediaType for use with in-toto because that extra mapping between predicateType and mediaType leaves more room for potential errors/discrepancies in implementations.

Now, RFC 6838 recommends keeping type and subtype names to 64 characters (the hard limit is 127), so we couldn't just stick the entire URI into the <predicateType> in the mediaType. If we take this route, we'd probably need to document a standard/expected way for in-toto implementers to truncate the predicateType URI to fit in the mediaType.

To summarize, my takeaways from this whole discussion are the following:

in-toto should support use cases that need predicate info in the DSSE payloadType (which is a mediaType field) in a backwards compatible way. We should clearly document what the format is to do so for in-toto implementers, including examples.
We should probably document explicitly that predicates used without in-toto/DSSE do not have the same integrity guarantees since they are not authenticated.
Whether or not the SLSA maintainers decide to define an independent SLSA mediaType, the SLSA documentation may need to be updated according to any spec changes we make on the in-toto side.

Brandon Mitchell · Answer 15 · Thu Aug 03 2023 10:14:15 GMT+0800 (China Standard Time)

> Without knowing what the implementation challenges are, my personal take is that this seems like an opportunity to move buildkit to SLSA Build L2 :)

I'm not comfortable responding to implementations that start their SLSA journey with a "go away and come back when you do better" response, which is how a "we don't support L1, please upgrade to L2" would be interpreted. I'm also not convinced that buildkit or any other container image build tooling needs to solve this directly in their tooling.

To compare file sizes, buildkit is a client/server app with a 25MB client and a 50MB server. Cosign is a ~100MB client. I don't think adding a signing tool to buildkit is a trivial ask.

I think the ecosystem is better when we have separate tools for separate tasks, and signing an attestation could easily be broken out to a separate task. Does SLSA have a tool for signing attestations? Preferably something that solves the key management issues in a way that end users will find easy to adopt on the desktops, servers, and CI pipelines. If not, without knowing what the implementation challenges are, this seems like an opportunity for SLSA to create one. :)

Marcela Melara · Answer 16 · Thu Aug 03 2023 22:56:13 GMT+0800 (China Standard Time)

> I'm not comfortable responding to implementations that start their SLSA journey with a "go away and come back when you do better" response, which is how a "we don't support L1, please upgrade to L2" would be interpreted. I'm also not convinced that buildkit or any other container image build tooling needs to solve this directly in their tooling.

Thanks for your feedback @sudo-bmitch. I do agree that tools/ecosystems beginning their SLSA journey should not be discouraged from incrementally improving their security levels, and the SLSA community could provide clearer guidance on scope for different ecosystems, and the tools to use to implement various requirements.

> I think the ecosystem is better when we have separate tools for separate tasks, and signing an attestation could easily be broken out to a separate task. Does SLSA have a tool for signing attestations?

in-toto IS one such tool for generating signed attestations, without relying on Sigstore, which SLSA describes in its recommended attestation suite. I encourage you to explore the in-toto implementations, which provide attestation generation, signing, and policy checking features.

> Preferably something that solves the key management issues in a way that end users will find easy to adopt on the desktops, servers, and CI pipelines. If not, without knowing what the implementation challenges are, this seems like an opportunity for SLSA to create one. :)

These are all great points to discuss with the rest of the SLSA community.

Marcela Melara · Answer 17 · Thu Aug 03 2023 23:01:18 GMT+0800 (China Standard Time)

@in-toto/attestation-maintainers At the end of the day, the question where I think in-toto can help address this issue, and on which I would like to get consensus, is: Given that there are use cases for providing additional predicate information without unpacking the DSSE, would any of the above proposals for DSSE payloadType (mediaType) be sufficient?

Aditya Sirish · Answer 18 · Fri Aug 04 2023 02:36:41 GMT+0800 (China Standard Time)

> Personally, it feels wrong to specify this as in-toto or dsse mediaType that happens to contain a SLSA predicate. All the other tools and specs I've been working with specify their name followed by the format they happen to use in the mediaType. E.g. it's application/spdx+json and not application/json;predicateType=spdx.

Correct me if I'm wrong but in a future where today's SLSA-specific verification is expanded to encompass verifying other in-toto attestations as well, wouldn't we want the generic in-toto only type? i.e., we'd want the client to get all the in-toto metadata for verification, one of which is the SLSA provenance. In the meantime if SLSA Provenance is the only metadata emitted as an in-toto attestation, we also have the clean identification of that specific file.

In fact, if we have multiple metadata files with the requirements in the upcoming SLSA source track and so on, we'd probably store them as a single attestation bundle anyway, so we'd perhaps again lose the benefit of having a SLSA specific media type?

Brandon Mitchell · Answer 19 · Fri Aug 04 2023 05:20:43 GMT+0800 (China Standard Time)

> > I think the ecosystem is better when we have separate tools for separate tasks, and signing an attestation could easily be broken out to a separate task. Does SLSA have a tool for signing attestations?

> in-toto IS one such tool for generating signed attestations, without relying on Sigstore, which SLSA describes in its recommended attestation suite. I encourage you to explore the in-toto implementations, which provide attestation generation, signing, and policy checking features.

It's been a while since I was working on the in-toto-golang repo, but I'm not seeing how to sign the SLSA attestation with it:

$ in-toto sign -f slsa-edge-amd64.json -k cosign.key -o slsa-edge-amd64.signed
failed to load layout at slsa-edge-amd64.json: In-toto metadata requires 'signed' and 'signatures' parts

I haven't worked with that part of the code for this kind of task before, mostly I was signing a layout and using in-toto to run and attest the command, not to sign an attestation generated by someone else.

> > Preferably something that solves the key management issues in a way that end users will find easy to adopt on the desktops, servers, and CI pipelines. If not, without knowing what the implementation challenges are, this seems like an opportunity for SLSA to create one. :)

> These are all great points to discuss with the rest of the SLSA community.

Key management is lacking from in-toto last time I looked. I believe it delegates that problem to the user and doesn't support any keyless or KMS/HSM workflows that would be needed for both the developer and CI pipeline use cases.

Aditya Sirish · Answer 20 · Fri Aug 04 2023 05:34:51 GMT+0800 (China Standard Time)

> Key management is lacking from in-toto last time I looked. I believe it delegates that problem to the user and doesn't support any keyless or KMS/HSM workflows that would be needed for both the developer and CI pipeline use cases.

An in-toto layout can do key management for the attestations it expects and the general recommendation is to use TUF to distribute the layout's keys. As for support for keyless and KMS etc, they're coming! in-toto-python's signing back end now has support for a number of setups including sigstore (https://github.com/secure-systems-lab/securesystemslib/tree/main/securesystemslib/signer) and there's a draft PR open (in-toto/in-toto#612) to pull that support into in-toto. Separately, we have some work ongoing in in-toto-python to fully support in-toto attestations (the implementation currently adheres to v1.0 of the spec).

in-toto-golang is a little different: its CLI expects v1.0 behavior but the implementation includes support for attestations. Indeed, these are often pulled in by other attestation generators including cosign, moby/buildkit etc. The model's there and other code in the library can be used to sign it; it's just not exposed via the CLI and in things like InTotoRun. We're working to update the library (and therefore the CLI) to more generally support attestations. FWIW, one of the things in my TODO list when I have some bandwidth is some clean up / refactoring of in-toto-golang + adding all of this support.

The primary issue has been developer bandwidth. I do urge folks in the SLSA community interested in this to contribute to fleshing these features out instead of creating new SLSA-only tooling. We'd welcome PRs!

Marcela Melara · Answer 21 · Mon Aug 07 2023 23:43:59 GMT+0800 (China Standard Time)

Per the discussion at the in-toto community meeting last week, the decision is to continue indicating the attestation type at the DSSE layer, with guidelines for indicating the predicate type as well. What those guidelines are is still TBD. Implementations or use cases that don't use signing/in-toto around predicates fall outside of the purview of in-toto, so they can certainly use or define their own predicate-specific mediaType.

Tom Hennen · Answer 22 · Fri Aug 11 2023 23:54:01 GMT+0800 (China Standard Time)

Proposal (from the maintainers meeting with @marcelamelara and @pxp928):

Define bundle type more formally? Include how to handle different envelope types in the bundle (e.g. users can use any signature method they like, but it must fit on a single line, and readers must continue to ignore unrecognized lines)
Media types will be vnd.in-toto.<name of the file minus extension> (within https://github.com/in-toto/attestation/blob/main/spec/predicates) e.g. vnd.in-toto.spdx (allows predicate type to continue to handle versioning)
Define how to indicate the signing envelope in the media type (e.g. vnd.in-toto.foo+dsse, vnd.in-toto.foo+sigstore)
Consider if we should try to register these media types (separate issue)
Media types when attestations are stored as blobs should just be considered hints and not relied upon as they are not authenticated (outside the DSSE/other signing envelope)
in-toto bundle media types will not indicate what predicates are stored within. Users will have to download + parse.
Assumption: no need to indicate the encoding (JSON, proto, ASN.1) of the payload in the blob media type. The purpose of the hint is to denote what type of data is stored in the blob, not how it's encoded, which is covered elsewhere (e.g. DSSE's payloadType field).
in-toto's rules for DSSE payloadType being set to application/vnd.in-toto+json remain unchanged.
Payload type is unchanged so that it can continue to support old-style in-toto links and limit churn

So

blob storage <---- vnd.in-toto.spdx+dsse    // matches spec filename + indicates signature method
  dsse.payloadType <---- application/vnd.in-toto+json    // unchanged
    statement._type <---- https://in-toto.io/Statement/v1    // unchanged
    statement.predicateType <---- https://spdx.dev/Document/v2.3    // unchanged
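
The layering in this first diagram can be mocked up in Python as a sketch. The `blob` wrapper dict, the `describe_layers` helper, and the placeholder subject are assumptions made for illustration; the envelope is unsigned and the predicate is left empty. The Statement field that carries the SPDX URI is its predicateType.

```python
import base64
import json

# Innermost layer: the in-toto Statement (subject and predicate are placeholders).
statement = {
    "_type": "https://in-toto.io/Statement/v1",
    "subject": [{"name": "example-artifact", "digest": {"sha256": "0" * 64}}],
    "predicateType": "https://spdx.dev/Document/v2.3",
    "predicate": {},
}

# Middle layer: the DSSE envelope. Signatures are omitted in this sketch.
envelope = {
    "payloadType": "application/vnd.in-toto+json",
    "payload": base64.b64encode(json.dumps(statement).encode("utf-8")).decode("ascii"),
    "signatures": [],
}

# Outer layer: a hypothetical blob record carrying the unauthenticated hint.
blob = {"mediaType": "vnd.in-toto.spdx+dsse", "data": json.dumps(envelope)}


def describe_layers(blob: dict) -> dict:
    """Unwrap the blob and report the type indicator found at each layer."""
    env = json.loads(blob["data"])
    stmt = json.loads(base64.b64decode(env["payload"]))
    return {
        "blob mediaType (hint)": blob["mediaType"],
        "dsse.payloadType": env["payloadType"],
        "statement._type": stmt["_type"],
        "statement.predicateType": stmt["predicateType"],
    }


layers = describe_layers(blob)
```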


blob storage <----- vnd.in-toto.bundle    // User has to download and parse each line separately
  dsse.payloadType <---- application/vnd.in-toto+json    // We only have an opinion here.
    ...
  sigstore...
    ...
  cose...
    ...
  dsse...
    ...
  ...

Advantage of this approach: no changes for existing parsers, only users that care about hints need to do anything.
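
For the bundle case, a reader along these lines might look like the following Python sketch. It assumes a bundle is newline-delimited with one envelope per line, as the proposal describes, and it recognizes only DSSE envelopes whose payloadType is the in-toto one; everything else is skipped. The function name and the sample lines are hypothetical.

```python
import json

IN_TOTO_PAYLOAD_TYPE = "application/vnd.in-toto+json"


def in_toto_dsse_lines(bundle_text: str) -> list[dict]:
    """Return the DSSE envelopes with an in-toto payload, ignoring other lines."""
    envelopes = []
    for line in bundle_text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            continue  # unrecognized line: ignore it and keep reading
        if (
            isinstance(obj, dict)
            and obj.get("payloadType") == IN_TOTO_PAYLOAD_TYPE
            and "payload" in obj
        ):
            envelopes.append(obj)
    return envelopes


sample = "\n".join(
    [
        json.dumps({"payloadType": IN_TOTO_PAYLOAD_TYPE, "payload": "e30=", "signatures": []}),
        json.dumps({"someOtherEnvelope": True}),  # stand-in for a non-DSSE line
        "not json at all",
    ]
)
found = in_toto_dsse_lines(sample)
```

The reader still has to decode and verify each returned envelope to learn which predicates the bundle holds, which matches the point that the bundle media type does not reveal its contents.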

Tom Hennen · Answer 23 · Wed Aug 16 2023 05:11:17 GMT+0800 (China Standard Time)

@sudo-bmitch WDYT?

Others?

Brandon Mitchell · Answer 24 · Wed Aug 16 2023 05:16:59 GMT+0800 (China Standard Time)

Seems reasonable, as long as there's a way for me to identify SLSA provenance with a media type that's different from other in-toto attestations, the SBOMs, signatures, and any other metadata that may get associated with a container image.