Should a cache be used and if so how long should it be set by default?
kdenhartog opened this issue · comments
This seems like a security considerations topic, but it has become apparent to us that a caching strategy is necessary when an issuer uses did:web for credentials that must be verifiable offline. This is a more general pattern that needs to be considered elsewhere too (for example, what's mentioned in the did-resolution draft spec), but do we want to mention it within this spec as a security consideration? Something else to consider is that relying on the cache can help with some of the privacy issues that come from phoning home to get the most recent DID document.
I am of the opinion that we should be forthcoming about the limits to any "privacy" we offer. Any time a request is made, the details of that request may be recorded. Logged data will generally include the time of the request and the requesting IP address. This is inherent in HTTP, and in TCP/IP generally. Domain name resolution requests in advance of the HTTP request typically leak similar information.
Caching can relocate the privacy issue to a resolver (or, for that matter, a CDN or even a proxy provider in the context of DID:WEB). There is, I think, an open question as to whether relocating the privacy concerns to a centralized resolver improves or degrades the privacy scenario.
We can reference these issues in the context of DID:WEB, but resolution of them must be undertaken more globally to have any practical effect. Even if we were able to make HTTP requests for DID:WEB documents private and secure, those requests are likely to be a very small portion of the communication in any given transaction that would need a similar level of privacy and security. In a hypothetical context of information displayed on a web page, every image, every external script loaded, and the markup of the page itself each introduce a similar set of privacy concerns.
Could RFC 7234 HTTP/1.1 Caching be useful here?
did:web hosts could set appropriate HTTP cache headers, and resolvers could follow these.
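To make that concrete, here is a minimal sketch of how a resolver might read a freshness lifetime from a host's `Cache-Control` header. The function names are hypothetical, it only handles `no-store`, `no-cache`, and `max-age`, and it skips the `Expires` header and the heuristic freshness that RFC 7234 also allows, so treat it as an illustration rather than a complete cache.

```python
from typing import Mapping, Optional


def parse_cache_control(value: str) -> dict:
    """Parse a Cache-Control value into {directive: argument-or-None}.

    Simplification: splits on commas, so quoted arguments that themselves
    contain commas are not handled.
    """
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def freshness_seconds(headers: Mapping[str, str]) -> Optional[int]:
    """Return how long a fetched DID document may be reused without revalidation.

    Returns None when the response must not be stored (no-store) and 0 when
    it must be revalidated before reuse. Assumes the header key is spelled
    exactly "Cache-Control"; a real client would match it case-insensitively.
    """
    cc = parse_cache_control(headers.get("Cache-Control", ""))
    if "no-store" in cc:
        return None
    if "no-cache" in cc:
        return 0
    max_age = cc.get("max-age")
    if max_age is not None and max_age.isdigit():
        return int(max_age)
    # No explicit lifetime: this sketch refetches instead of guessing.
    return 0
```

A resolver would store the document alongside the time it was fetched and refetch once that many seconds have passed.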
@clehner -- +1, I think one of the nice things about using HTTP for this method is that we get things like caching for free (meaning, just use the HTTP mechanism from RFC 7234).
I'm not sure if we need to specifically call that out in the spec, but we could.
I don't think we need to call it out either, but it would not hurt to point out that the typical HTTP mechanisms are applicable. I think language to that effect can be quite general for an audience of people implementing the spec.
What about calling out in the security considerations section that, if a cache is used (agreed on reusing RFC 7234 as well; that's the direction we went anyway), the longer the cache lifetime, the more likely a relying party is to miss an update to the DID document? That could cause problems with the validity of documents (such as VCs) issued with keys from this DID document: a verification result could be incorrect while the cached copy is still considered valid.
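One way a relying party could bound that risk is to cap whatever lifetime the host grants with its own ceiling. The sketch below assumes a hypothetical `CachedDidDocument` record and a verifier-chosen `max_ttl`; neither comes from the spec, and the right ceiling depends on how much staleness a given deployment can tolerate.

```python
from dataclasses import dataclass


@dataclass
class CachedDidDocument:
    document: dict
    fetched_at: float  # seconds since the epoch when the document was fetched
    server_ttl: int    # freshness lifetime granted by the host, in seconds


def is_fresh(entry: CachedDidDocument, now: float, max_ttl: int) -> bool:
    """True if the cached document may still be used without refetching.

    The effective lifetime is the smaller of the host's lifetime and the
    verifier's own ceiling, so a long max-age cannot hide a key rotation
    for longer than max_ttl seconds.
    """
    effective_ttl = min(entry.server_ttl, max_ttl)
    return (now - entry.fetched_at) < effective_ttl
```

With a one-day `server_ttl` and a one-hour `max_ttl`, the entry is treated as stale after an hour even though the host would allow it to be reused for a day.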
The security and tracking issues associated with DID:WEB are essentially the security and tracking issues associated with any modern website. DNS lookups may be more or less secure depending on which service resolves the domain name and whether that resolution runs over HTTPS. Reverse proxy services (aka CDNs) like Cloudflare, Amazon CloudFront, and Fastly cache content close to users for an increasing portion of active websites, and those caches could become stale. HTTP server logs generally record paths requested, IP addresses, and user agent strings with timestamps, and this very probably occurs at every point after the initial SSL termination. Similarly, DNS logs may include IP address, name resolved, and a timestamp. Browsers themselves store request history. All of this logged data can be correlated with advanced identity tracking.
We could look at the security section as an opportunity to lay out the basics to give people a better overall understanding of how these systems work, where the vulnerabilities are typically found, and generally how to mitigate them.
I think, though, that it is important to stress that all of this is inherent in the way the internet operates today, and is generally applicable to the use of browsers and HTTPS.