w3c / vc-data-model

W3C Verifiable Credentials v2.0 Specification

Home Page: https://w3c.github.io/vc-data-model/

Specify what kind of processing is safe on a returned document

msporny opened this issue

@jyasskin wrote in #1380 (comment):

Another possible requirement (that might make sense to discuss in a separate issue) is that the securing specification should/must say what kind of processing is safe on the returned document. In particular, if the securing mechanism secures the JSON representation, like vc-jose-cose and ecdsa-jcs-2019, then it's not safe to subsequently process the document with a generic JSON-LD processor because some of the contexts might have changed since the signature was made. On the other hand, if the securing mechanism secures the RDF, like ecdsa-rdfc-2019, then it's not safe to subsequently process the document as JSON, because it's possible to move properties into unexpected other objects without breaking the signature, and the JSON processor might miss something the issuer needed them to find.

H/T https://medium.com/@markus.sabadello/json-ld-vcs-are-not-just-json-4488d279be43 and see w3c/vc-jose-cose#188.

if the securing mechanism secures the JSON representation, like vc-jose-cose and ecdsa-jcs-2019, then it's not safe to subsequently process the document with a generic JSON-LD processor because some of the contexts might have changed since the signature was made.

This is debatable... and has been heavily debated within the group :). For that statement to be true, the contexts in question MUST NOT be static and MUST NOT have any sort of integrity protection expressed for them in the corresponding VCs... and doing neither of those things is frowned upon in the specification.
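For example (an illustrative sketch only -- the extension context URL and digest value below are placeholders), a VC can pin its contexts using the relatedResource/digestSRI mechanism, so a consumer can detect a context that has drifted since signing:

```typescript
// Illustrative sketch: the extension context URL and digest value are
// placeholders. relatedResource/digestSRI expresses integrity information
// for external resources such as contexts, letting consumers detect drift.
const credential = {
  "@context": [
    "https://www.w3.org/ns/credentials/v2",
    "https://example.org/contexts/coupon/v1" // hypothetical extension context
  ],
  type: ["VerifiableCredential", "ExampleCouponCredential"],
  issuer: "did:example:issuer",
  credentialSubject: { id: "urn:example:coupon-offer-1" },
  relatedResource: [
    {
      id: "https://example.org/contexts/coupon/v1",
      digestSRI: "sha384-...placeholder..." // hash of the context document at signing time
    }
  ]
};
```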

There are gray areas here, such as schema.org, which could probably never have fully locked down semantics (because it's a "living vocabulary"), but one could infer the correct version of the semantics via the signature date on the proofs (by cross-referencing that date against schema.org's static versioned releases). So, with a bit of extra work, one could determine the intended semantics with a fairly high degree of accuracy.

Then there is the default vocabulary in the VCDM (which I think is a mistake, but am not objecting to), which says that the semantics are issuer-defined and that the mechanism for determining those semantics is entirely out of band from the specification.

Then there are the people who don't really care all that much about the detailed semantics and are fine with semantic drift and inaccuracies (because they're using JSON Schemas or other mechanisms to compensate for the drift/inaccuracies).

In other words, there are a fair number of caveats with the JOSE/COSE/JCS approach, but given those caveats, some people still prefer the mechanism for various reasons (which are, again, debatably good or bad).

I believe what you're saying is: "Ok, fine, then document all of the above." ... which we can attempt to do (it's a good idea)... just noting that some of these disagreements have eaten weeks to months of WG time, so we'll have to do what we can given the time left in our charter (which is not a lot of time).

On the other hand, if the securing mechanism secures the RDF, like ecdsa-rdfc-2019, then it's not safe to subsequently process the document as JSON, because it's possible to move properties into unexpected other objects without breaking the signature, and the JSON processor might miss something the issuer needed them to find.

This one I don't quite understand... what do you mean?

I think you mean: "You can express information in various ways that are semantically equivalent in RDF and, therefore, JSON-LD, and a JSON processor might be 'too hardcoded' to deal with those differences." If so, yes, that's technically true, but I can't think of a case where it would matter in practice. We can warn people that it's a concern, sure.

All this to say, I don't think the text is going to be as cut and dried as you might hope. The typical processing pipeline for these things is done in JSON, primarily, via JSON Schema (or hardcoded algorithms that check specific types of credentials). A subset of the processing pipeline is done in JSON-LD (securing mechanism verification, but only for the -rdfc- stuff).

All that said, I can propose some text in VCDM and we can go from there. If that text is accepted, we can write some of the details above in the DI spec and the JOSE-COSE spec.

PR #1392 has been raised to address this issue. Once PR #1392 has been merged, this issue will be closed.

The issue was discussed in a meeting on 2023-12-19

  • no resolutions were taken
View the transcript

2.5. Add requirement for securing mechanisms to have post-verification documentation. (pr vc-data-model#1392)

See github pull request vc-data-model#1392.

Brent Zundel: "Add requirement for securing mechanisms to have post-verification documentation".

See github issue vc-data-model#1388.

Manu Sporny: This PR is an attempt to address a concern from Jeffrey Yasskin -- he wanted the securing mechanism specs to be very clear about what is and is not acceptable from a post-processing standpoint.
… This results in the interface we're defining -- the expectation is that there is an interface to ensure that only secured data is returned from the securing mechanism.
… In addition, Jeffrey wanted us to say that when you return that data back, the spec still says more about it. Such as, if you use VC-JOSE-COSE, the spec should say that no JSON-LD processing was performed.
… A simpler example is ... "when you verify the data and you get back the secured data, you probably shouldn't sit on that data for a year and then use the data in a production setting without reverification".
… Something that says "don't use stale data that was only checked a year ago".
… So Jeffrey wants something like that -- this PR creates a requirement for the securing mechanism specs to be clear about those types of things.

Brent Zundel: Thank you, Manu.
… Any comments on the PR?

Michael Jones: Can you explain a little more what you mean by post-processing and why Jeffrey is worried about it?

Manu Sporny: Post-processing, as Jeffrey has defined it, is any processing after verification.
… You have a secured VC in, you run the verification algorithm, you get back the protected data and then everything after that is "post-processing".

Michael Jones: So using the claims would be post-processing.
… That seems like something we'd want to have happen.
… I don't know what kind of negative post-processing he's concerned about.

Manu Sporny: He links to two blog posts that talk about it -- and he had general uneasiness around using the claims in ways that they had not been intended to be used.

Michael Jones: I'll look at it.

Manu Sporny: Thanks.

The issue was discussed in a meeting on 2023-12-20

  • no resolutions were taken
View the transcript

4.6. Specify what kind of processing is safe on a returned document (issue vc-data-model#1388)

See github issue vc-data-model#1388.

Brent Zundel: Issue 1388, "Specify what kind of processing is safe on a returned document" -- PR 1392 has been raised. The PR has requested changes.

Manu Sporny: Mike is requesting changes since the text is not actionable. Need to hear from Jeffrey.

Michael Jones: Yeah, I kinda agree with Manu's assessment. I'm fine if we develop concrete text.

Manu Sporny: One thing to note is that this text is for specification authors. Guidance to specification authors on what is an acceptable cryptosuite.

Sorry for the confusion around my comments. #1392 is plausible but doesn't cover what I was worried about here. I'll try to elaborate on the problem with an RDF signature followed by JSON processing:

Take the coupon demo that https://vcplayground.org/ can generate. It contains a "primaryPurchaseRequirement" field that restricts how the coupon can be used, compared to a coupon without that field. https://gist.github.com/jyasskin/c75c2bd1dc083451f9f9a7596c68ab04 demonstrates how that field can be hidden from pure-JSON processors without changing the canonicalized form of the document. I haven't run this all the way through a verification tool, but I think that's enough to keep the ecdsa-rdfc-2019 cryptosuite passing.

So, I think that if you have a VC that uses ecdsa-rdfc-2019, and your tool for handling the proven VC navigates it using JSON, and your credential's schema uses "restriction" fields like this coupon example and like many driver licenses, then your holders are going to be able to strip off those restrictions.

Have I missed anything?

I haven't figured out a similar attack on the JSON-signature -> RDF processing direction. That direction seems to require more cooperation/malice by the issuer, but I'm also not a security researcher.

The contexts must match what is expected when performing "credential type-specific processing" (and that expectation would be violated in the above case):

https://www.w3.org/TR/vc-data-model-2.0/#credential-type-specific-processing

Do we need to say more than this? This is in the VCDM directly and applies to any securing mechanism.
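As a minimal sketch (assumed document shape and an illustrative expected context list; not spec text), that check can be an exact match against the context list the credential type was defined with:

```typescript
// Hypothetical expected context list for a given credential type.
const EXPECTED_CONTEXTS = [
  "https://www.w3.org/ns/credentials/v2",
  "https://example.org/contexts/coupon/v1"
];

// Reject the document for type-specific JSON processing unless its @context
// is exactly the expected list, in the expected order.
function contextsMatch(doc: { "@context": unknown }): boolean {
  const ctx = doc["@context"];
  return Array.isArray(ctx) &&
    ctx.length === EXPECTED_CONTEXTS.length &&
    ctx.every((value, i) => value === EXPECTED_CONTEXTS[i]);
}
```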

Good point, but I found another JSON-LD feature, @included, that avoids the need to add a context. Validating against a JSON Schema could block that, but you could do the same thing if the schema allows an array for any parent of the object with "restriction" fields and doesn't ban id fields.
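Roughly, the shape of what I mean (illustrative values; an explicit subject id is assumed, since the trick does not apply to blank-node subjects; and I haven't run this through a full ecdsa-rdfc-2019 verification):

```typescript
import jsonld from "jsonld";

// Both documents rely on the v2 context's default @vocab to give
// "primaryPurchaseRequirement" an IRI. Values are illustrative.
const original = {
  "@context": ["https://www.w3.org/ns/credentials/v2"],
  type: ["VerifiableCredential"],
  issuer: "did:example:issuer",
  credentialSubject: {
    id: "urn:example:coupon-offer-1",
    primaryPurchaseRequirement: "Purchase of one large pizza"
  }
};

const relocated = {
  "@context": ["https://www.w3.org/ns/credentials/v2"],
  type: ["VerifiableCredential"],
  issuer: "did:example:issuer",
  credentialSubject: { id: "urn:example:coupon-offer-1" },
  // The restriction now lives where a JSON path like
  // credentialSubject.primaryPurchaseRequirement will not find it,
  // but it is still attached to the same node in the RDF graph.
  "@included": [{
    id: "urn:example:coupon-offer-1",
    primaryPurchaseRequirement: "Purchase of one large pizza"
  }]
};

// An actual signature check would follow the cryptosuite's transform steps;
// identical canonical N-Quads is the property the concern relies on.
// (jsonld.js fetches the v2 context unless a caching documentLoader is supplied.)
async function compare(): Promise<void> {
  const opts = { algorithm: "URDNA2015", format: "application/n-quads" };
  const a = await jsonld.canonize(original, opts);
  const b = await jsonld.canonize(relocated, opts);
  console.log(a === b); // expected: true
}

compare().catch(console.error);
```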

@jyasskin,

Validating against a JSON Schema could block that, but you could do the same thing if the schema allows an array for any parent of the object...

I'd say that this again falls under ensuring expectations can be met when performing "credential type-specific processing"; the schema must enforce the conditions necessary for appropriate consumption, not the securing mechanism.

As a side note, it may also be the case that the example you're describing is related to the idea of a "critical field" -- and we had extensive discussions about that concept in this issue and on calls in the past:

#158

We ultimately landed on the idea that each claim must be acceptable on its own, allowing any claims to be ignored without "modifying" the meaning (or "assumed truth") of other claims.

Yes, credential designers can be careful to make sure they validate the JSON schema of data strictly enough to plug any malleability that JSON-LD offers, and they can be careful to avoid critical or "restriction" fields that modify the meaning of other claims. Asking people to be careful still seems like a recipe for a never-ending stream of vulnerabilities. Is there any aspect of the standard's design that could catch mistakes before they turn into vulnerabilities?

@jyasskin,

Yes, credential designers can be careful to make sure they validate the JSON schema of data strictly enough to plug any malleability that JSON-LD offers, and they can be careful to avoid critical or "restriction" fields that modify the meaning of other claims. Asking people to be careful still seems like a recipe for a never-ending stream of vulnerabilities. Is there any aspect of the standard's design that could catch mistakes before they turn into vulnerabilities?

Telling people to use strict schemas is an aspect of the design -- and I think it solves this particular problem to the extent that it is solvable. As for the question generally, the specification has many MUSTs throughout which exist, in part, to help catch mistakes.

Can we catch all mistakes? I don't think so -- not without harming open world expression in the three party model. It is important for verifiers to be able to ignore any claim (no claims that invalidate others). And this aspect of the design should absolutely inform VC designers. Also, importantly, in the three party model, issuers cannot presume how (for what purpose) their VCs will be used by verifiers.

Credential type-specific verifiers are expected to be less flexible (by definition), applying strict schemas. More flexible verifiers are also possible -- and they can choose to use the JSON-LD API to perform transformations on the data to cause it to be expressed how they want to consume it.
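For example (a sketch using the jsonld.js library; the frame and the type-specific context URL are illustrative, not anything defined by the spec):

```typescript
import jsonld from "jsonld";

// Re-express a verified document into the single JSON shape the consuming
// code expects. Framing operates over the merged node map, so properties
// attached to the same node id end up in one place regardless of where they
// appeared in the incoming JSON tree.
async function normalizeForConsumption(verifiedDoc: Record<string, unknown>) {
  const frame = {
    "@context": [
      "https://www.w3.org/ns/credentials/v2",
      "https://example.org/contexts/coupon/v1" // hypothetical type-specific context
    ],
    type: "VerifiableCredential",
    credentialSubject: {} // embed the subject node (and its claims) here
  };
  return jsonld.frame(verifiedDoc, frame);
}
```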

In any case, I think the particular problem you're highlighting is the result of enabling open world expression generally -- and the only technical enforcement mechanism we have (at this layer) is with strict schemas. Assuming a simple, flat, key-value pair data model (or also in an "anything goes JSON model"), an issuer could just as easily start adding a new key-value pair to data they've been issuing that is intended to modify the meaning of the other key-value pairs. Only if strict verifiers are already rejecting unrecognized key-value pairs with a schema will they reject this new data when presented by holders.

Behaviors of this kind and decentralized "upgrading" of the ecosystem over time is expected in the three-party model. But enabling flexibility does rely on participating parties taking on more responsibility than they otherwise would if we were to constrain behavior further (again, at this layer). VC designs that avoid certain approaches will be less disruptive to verifiers -- and the market will react either way. Another way of putting this is that the VCDM spec is not at the right layer to solve a subset of problems; it needs to be flexible enough to allow other layers to exist and innovate.

So, VC designers should not create VCs with claims that mean: "if present, the rest of the claims are invalid". It's just not going to work. If you make a claim in a VC, you're making a claim about something that stands on its own. Failing to follow this rule gets even more dicey with selective disclosure. We could perhaps "say more" about this in prose, but it's not technically enforceable, as you say. The only enforcing mechanism we have is through schemas -- the same tool that is used to solve this problem elsewhere. So, IMO, it should be considered sufficient here for simple, strict, verifiers.

If a VC designer really wants to do something like the "model a coupon as an authorization" example you suggested, they should enumerate what powers are granted (e.g., via an allow list) so the verifier can match against actions attempted -- rather than defaulting to "anything is allowed unless a block list is present". This avoids mistakes, but at another layer: the VC designer handles it for all issuers that want to use that VC type. This is an aspect of the VCDM design as well: good VC designs are expected to be shared and reused across different issuers. Analysis that presumes that this extra layer is absent is problematic. VC designers and issuers do not have to be the same party and less experienced issuers can defer to VC designers to help them prevent mistakes.
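A small illustration of the allow-list framing (field names are hypothetical):

```typescript
// Block-list style (risky): if a holder can hide or strip the restriction,
// its absence silently reads as "anything goes".
const blockListStyle = {
  couponFor: "any-menu-item",
  primaryPurchaseRequirement: "Purchase of one large pizza"
};

// Allow-list style: the claim enumerates exactly what is granted, so the
// verifier matches the attempted action against what is explicitly said.
const allowListStyle = {
  allowedPurchases: [
    { item: "large-pizza", discount: "5.00", currency: "USD" }
  ]
};
```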

Finally, I will say, I think it's generally a bad idea to model authorizations as claims. There's considerable literature that's been written on that subject.

Given that ... I wonder if the right improvement to fix this issue is to elaborate on https://w3c.github.io/vc-data-model/#credential-type-specific-processing, to be more prescriptive about what a credential designer needs to do in order to make it safe to process their credentials as JSON. For the known vulnerabilities, it seems like they have 3 alternatives:

  1. Require enveloping proofs;
  2. Require an exact list of contexts and validation against a specified schema (which could be a JSON Schema) that rejects arrays or id and @id properties, and unknown properties [I'm not confident this is sufficient to block all the ways someone could move properties around in JSON-LD, but the JSON-LD experts here will know]; or
  3. Never define a claim that means other claims in the credential are less valid. This could probably use an example of how to model driver license restrictions like "Daylight driving only" or "Corrective lenses".

I'm still worried about Schneier's Law, or perhaps a corollary to it: if I could find these risks without really knowing JSON-LD or the credential space, what other problems could an expert find? But I also don't want to insist on particular changes without being able to point to a concrete attack that they'd fix.

I'm still worried about Schneier's Law, or perhaps a corollary to it...

I think the issue being discussed here can be boiled down to three statements:

  1. All claims (in a VC) must be understood by verifiers.
  2. There is more than one way to express the same claims in JSON-LD.
  3. Claims may be ignored by verifiers when they are not understood.

If the first and second statements are true, then together they require that every verifier understand every possible expression of claims -- which I believe is driving the concern here. However, if the third statement is true, the first cannot be true. And the third statement is true.

The second statement is also true, but can be modulated. So, to the extent that we're concerned about vulnerabilities arising from the combination of the second and third statements, it's a place where we can work on solutions.

TL;DR: The spec should say that issuers (or VC designers) that want their data to be consumable by verifiers that will only look at a single expression of their VCs need to publish semantically-immutable contexts and a schema that can indicate whether a VC fits that expression. I think we do this today already, but perhaps we need to say it more strongly (a MUST) despite it not being "testable". This is essentially your "option 2", I believe. I think your "option 3" could also be done with additional language in the spec, and an example would be helpful.

However, in the case of a driver's license, I would expect "option 2" to be more commonly used because of attempts to do "skeuomorphic" modeling, at least in the near term. But, an example could show how "driving restrictions" (and other things of this sort) can be modeled as an atomic JSON-blob claim that must always be present, such that "none" is a valid value. In other words, its absence would trigger a simple verifier to reject the VC altogether, instead of the verifier presuming there were no restrictions. So it may be better advice (than "option 3") to say: "always explicitly express a 'no restrictions' claim instead of implying it by the absence of a 'restrictions' claim." It is better for verifiers to be able to act on what is said as opposed to what they may not have heard.
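For example (illustrative field names only):

```typescript
// The restrictions claim is mandatory for the credential type, so a simple
// verifier that cannot find it rejects the credential outright rather than
// assuming there are no restrictions.
const credentialSubject = {
  id: "did:example:alice",
  drivingRestrictions: ["corrective-lenses"] // or, explicitly, ["none"]
};
```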


I think the underlying problem here is that VCs "look like" existing technologies that do not use the three party model, even though they have very important differences. So, there's a tendency to try and treat them that way, to think they are "just another way to do the same thing as before". But VCs are statements of fact by the issuer, who does not know who the verifiers will be or how they will use the information. The atomic unit is "a claim", which stands on its own.

VCs are not holder-opaque bundles of information put into an envelope with the known verifier's address on it -- to be passed through an intermediary, allowing some action authorized at/by the issuer. Holders may read the information in a VC perhaps without even knowing how to check the authenticity via its securing mechanism, just like physical world credentials.

VCs are not authorizations or access tokens -- in the way that some preceding technologies are -- though they may be flexible enough for someone to try to use them that way. I think this is a bad idea, in part, because of the flexibility of VCs. Authorizations should have a fairly limited vocabulary and only be able to be expressed one way, whereas "claims" should maximize flexibility: "saying anything about anything". Of course, this allows bad ideas too, but it's not the core VCDM's job to solve this. In my view, VCs can be exchanged for authorizations according to some business rules, but they should not be authorizations themselves.

Most importantly, VCs should not have all of their flexibility removed or stigmatized because some might try to model an authorization as a VC. Statement two is true and must remain true. I would consider it a failure if the outcome of this work is that we just created another (more complicated) way to express identity-based authorizations because we felt like we had to optimize toward something people are already familiar with in the digital realm. Authorizations are fundamentally a two-party affair (with affordances for delegation and consent); VCs involve three independent parties, each with their own goals.

I do think it's a good idea to offer mechanisms to restrict expression for safe, simpler consumption, but we should be wary of optimizing for what amounts to non-use cases, even if those are possible and could result in vulnerabilities. I understand that users of the technology do not have the benefit of having read and thought about all of these things -- and so it's a foregone conclusion that they will use VCs where another technology would have been better suited. I just want to be careful that we don't push VCs off the path they were designed for in the process.

So, looking at ways to prevent vulnerabilities that may arise from the fact that statement two and three are both true:

  1. Require enveloping proofs...

Restricting the choice of securing mechanism does not close the loop here. It's true that securing mechanisms that focus on securing the exact expression from the issuer would prevent a holder from choosing a different expression of the same claims to send to the verifier. (Note: The line here isn't even necessarily on "enveloping proofs" vs. "embedded proofs" -- new securing mechanisms of either sort could allow or disallow a reconfiguration of the claims).

However, this only partially solves the problem.

An issuer can still "make a mistake" by issuing two (or more) VCs, each with a different expression of the same claims, where at least one of them expresses "invalidating claims" in a place that a verifier does not understand. This results in the same outcome: a verifier accepting a VC that they "wouldn't have". An additional constraint would still be required of issuers: "Don't ever express the same claims in more than one way". This seems onerous -- and is not technically enforceable. I think it's better to say "don't express claims that invalidate others" ("option 3") or perhaps "always explicitly express a 'no restrictions' claim instead of implying it by the absence of a 'restrictions' claim."

  2. Require an exact list of contexts and validation against a specified schema...

Requiring an exact set of contexts and schemas, I believe, should solve this problem. This includes disallowing additional properties and restricting values to simple objects (not arrays, or making it clear to verifiers that the same subject can appear multiple times). This schema eliminates all other possible expressions of the data, thereby communicating to verifiers what the possible expressions are -- each of which must be checked by them.
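A rough sketch of what such a strict schema can look like (hypothetical credential type and field names; a real schema would cover the entire credential):

```typescript
import Ajv from "ajv";

// Pins the exact context list, forbids unknown properties (which blocks
// "@included" and similar relocations), and keeps credentialSubject a single
// flat object. The stricter variant discussed above would also disallow
// "id"/"@id" on the subject.
const couponCredentialSchema = {
  type: "object",
  additionalProperties: false,
  required: ["@context", "type", "issuer", "credentialSubject"],
  properties: {
    "@context": {
      const: [
        "https://www.w3.org/ns/credentials/v2",
        "https://example.org/contexts/coupon/v1"
      ]
    },
    type: { type: "array", items: { type: "string" } },
    issuer: { type: "string" },
    credentialSubject: {
      type: "object",
      additionalProperties: false,
      required: ["primaryPurchaseRequirement"],
      properties: {
        id: { type: "string" },
        primaryPurchaseRequirement: { type: "string" }
      }
    },
    proof: { type: "object" }
  }
};

const ajv = new Ajv({ allErrors: true });
const validateCoupon = ajv.compile(couponCredentialSchema);
// validateCoupon(doc) === false for any document with extra or relocated fields.
```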

  3. Never define a claim that means other claims in the credential are less valid...

Yes -- or "always explicitly express 'no restriction' claims instead of implying no restrictions through the absence of a claim". This ensures that verifiers that care about such a claim will reject a VC where they cannot find it.

That all looks reasonable; thanks @dlongley. 3 minor points:

  1. On "Claims may be ignored by verifiers when they are not understood.", if a claim in a credential is specified at the same time and in the same document as the credential's overall type, issuers have a good argument that all verifiers who understand any of the credential will also understand that claim. The subtlety here, which goes beyond the expected difficulties with versioning and extensibility, is that claims can also be ignored when they're not found.
  2. We have a recent precedent for non-testably "MUST"ing specifications in https://w3c.github.io/vc-data-model/#securing-mechanism-specifications. I think it's reasonable to write similar requirements for specifications for credential types.
  3. I don't fully understand the distinction between "authorizations" and "claims". This issue isn't the right place to educate me, but if that's an important thing for credential designers to understand, the spec should probably point them in the right direction.

The issue was discussed in a meeting on 2024-01-03

  • no resolutions were taken
View the transcript

2.4. Specify what kind of processing is safe on a returned document (issue vc-data-model#1388)

See github issue vc-data-model#1388.

Brent Zundel: Finally, "Specify what kind of processing is safe on a returned document" #1388 -- not exactly sure of the state of things here.

Manu Sporny: Does not seem like this will make it. Jeffrey asked us to make a change. selfissued is concerned about the language being too vague. If we can't find better language, let's close the PR and issue.

Brent Zundel: Yes, I have marked this PR as pending close today.
… have gone through all before-CR PRs and issues and got status updates. Good meeting & conversation. Please make progress on your tasks. Feel free to reach out to the editors & chairs if need be.
… during next week's calls, expect to take the core data model to CR.


@jyasskin wrote:

I think it's reasonable to write similar requirements for specifications for credential types.

Yes, reasonable given that the WG has the time to create that section, debate it, and come to consensus on the contents. I have a vague idea of what could be written in that section based on the exchange that you and @dlongley had above. I've raised issue #1410 to track this concern.

It is worth noting that the WG is now about 4 months behind schedule in transitioning VCDM v2.0 to the Candidate Recommendation phase. Our charter expires in about 5 months.

While I can appreciate the desire to articulate the conversation above into specification text, we are hitting the practical realities of what this WG can accomplish in the time it has left (there are always more issues we could have spent more time on and written better specification text for). We still have three more specifications (after this one) to get to CR in short order.

We are attempting to make the call to transition VCDM v2.0 to CR this coming week.

I expect the group will have an "is the juice worth the squeeze" discussion regarding #1410.

@jyasskin it would help us plan this week's transition call if you could be clear about whether 1) not having this text would cause your organization to formally object to a REC, 2) you would be OK with us going to CR and then making an attempt at the text during CR (and then possibly going through another CR if we do come to consensus on some normative statements), or 3) you would prefer another path forward.

PR #1392 has been merged. Issue #1410 has been raised to address ongoing concerns from @jyasskin. Closing this issue.

  1. We don't plan to formally object based on the absence of a standardized mitigation for this sort of vulnerability.
  2. It's fine with me to take the VC specs to CR multiple times.

@RByers mentioned that https://github.com/WICG/identity-credential is likely to show a less-scary consent dialog for certain "safer" credential types, and this sort of thing might go into that decision, but I don't think we'd need the guidance to be in the specification to make that work.

I would prefer if you'd keep this issue open until it's solved. While #1410 discusses one possible solution, this issue's discussion does a better job of laying out the overall problem, and there are several possible solutions to the problem. This issue does not need to keep the before-CR label.

I would prefer if you'd keep this issue open until it's solved.

Ok, re-opening issue and re-labeling as "We will continue to work on this concern during CR and might go through another CR if we get to something that works for everyone."

  3. Never define a claim that means other claims in the credential are less valid...

This is problematic because the VCDM already does this. The credentialStatus claim and termsOfUse claim already do this, or at least have the potential to do so.

The issue was discussed in a meeting on 2024-02-28

  • no resolutions were taken
View the transcript

3.4. Specify what kind of processing is safe on a returned document (issue vc-data-model#1388)

See github issue vc-data-model#1388.

Brent Zundel: Specify what kind of processing is safe on a returned document.
… Do you need feedback?

Manu Sporny: I do.
… On Jan 3rd, Jeffrey gave us some options to make him happy.

Manu Sporny: Jeffrey noted these things as options to mitigate his concerns: #1388 (comment).

Manu Sporny: I think we should do the first thing that he listed. Of his three options as alternatives, the first is to basically say: "If a verifier doesn't understand a claim, they can ignore it".
… The second one is, "Create a new section in the spec that provides instructions for people creating vocabularies and credential types where they say what processing is safe/not safe.".
… I think that's a significant amount of work to figure out what to say there.
… The third one is, really a question about an authorization and a claim ... we speak to that in the implementation guide already, I think.
… I think we should do the first thing and say that a verifier can ignore ... I don't think we can make a normative statement because it's business rules, but "it's expected that a verifier will ignore claims that it does not understand".
… It's also expected that a credential type would say what's mandatory and what's optional.
… I'm looking for anyone objecting to those kinds of statements at this point.

Joe Andrieu: My first question ... of those two statements -- are they in IRC or in the issue?

Manu Sporny: They are just in my head unfortunately.

Joe Andrieu: I had a hard time following -- I don't understand the framing of "safety" and it concerns me. Because processing JSON isn't unsafe.
… I think it's a weird semantic.

Manu Sporny: +1 to avoid that word. Jeffrey's concern was that it may be possible to construct something in JSON-LD that is then read by a credential-type-specific process that makes it misconstrue some of the statements.
… It presumes there is zero JSON Schema checking and so on -- but all the production-scale deployments I know of are checking their inputs for a certain structure.
… He's saying that if you don't tell people about that, then it's not safe. But that falls into the category of "If you don't check your program inputs, that's bad", just like SQL injection attacks, etc.
… +1 to not talk about safe/unsafe but talk about expectations when inputs are provided.

Joe Andrieu: That all sounds great I'll look for your language.
… My frustration is that I don't think I've ever seen this type of thing from any other Web standard. I don't recall being told what I can do with an HTTP response. Any data I get could be an attack; I think this is just weird framing.

PR #1451 has been raised to address this issue. This issue will be closed once PR #1451 has been merged.