Updating the asymmetric signature recommendations

Question

Updating the asymmetric signature recommendations

tasn opened this issue 6 months ago · comments

I'd like to suggest a few improvements to the asymmetric signature scheme born out of @hf's comments and the ensuing discussion at #25 (comment) as well as feedback from @awolk.

The problem

Symmetric signatures are fast and simple to implement (just a simple HMAC256 of the content), however they pose challenges when it comes to key distribution and keeping it a secret. With symmetric signatures everyone with the key can forge webhook messages, which includes the webhook consumers themselves or any verifier working on their behalf as well. This also means that there needs to be built-in support for secret rotation which is another operational hassle.

The solution to the above problem is utilizing asymmetric signatures. While they are much more resource intensive and implementations are less ubiquitous than their symmetric counterpart (HMAC256), key distribution becomes a non-issue, and it allows for additional benefits such as storing the signing key in an HSM for added security.

The spec already supports ed25519 based asymmetric signing, but the way it's currently done can be improved to further simplify the operational burden, security characteristics, and ease of use. The main limitation with the current recommendation is that it's unclear that for it to be secure it requires a different key per endpoint, even for the asymmetric case. This means that people can easily misuse the recommendations (and have security issues), and anyhow they can't have the benefits of having just one global key pair.

Why does the current implementation doesn't allow for one global key pair?

Because different customers of the same service will be able to trick the service into sending messages to another customer endpoint, and that customer endpoint will trust it because it's coming from the service.

Let's consider the following example: we have a service called ACME, an innocent customer called Alice, and an evil attacker called Eve. Alice uses ACME, with the endpoint url https://alice.com/webhooks/acme and verifies signatures using the ACME key pair. Eve, wanting to attack Alice, signs up to ACME and also creates an endpoint with the url https://alice.com/webhooks/acme. Messages sent from Eve's account will now be sent to Alice's endpoint, and since they are signed with the same key the signature will be valid. Now Eve is able to send messages to Alice's endpoint, and these message pass validation. All it has to do is be able to trick ACME to send messages that she controls which is a common feature in many implementations. Even if it wasn't a common feature, many implementations can be manipulated to send "close enough" information that can interfere with Alice's systems. Either way, this issue significantly increases the attack surface and opens implementations to vulnerabilities.

Suggested changes

Change to P-256

I'd like bring up potentially changing the recommendation to P-256 instead of ed25519. While ed25519 is considered better, P-256 is approved by FIPS-140 and ed25519 isn't which is a requirement for many implementations. We could potentially support both, but I think that's unnecessarily friction and fragmentation.

Additionally, if we are changing to P-256, we may need to consider encoding the signature and keys in DER format to make sure it's widely supported by implementations.

Support (and recommend) having one global asymmetric key pair

In order to support having on global asymmetric key pair, we need to add another (per endpoint) identifier when doing asymmetric signatures. This identifier does not need to be secret or random, it just needs to be unique per consumer (it can be reused across the endpoints of the same consumer if they have more than one). One great candidate for this identifier is using the internal database identifier for the consumer, or if that's sensitive use some derivation of it or just generate an ID for this purpose. It's just important that this identifier won't change during the lifecycle of the endpoint so that the signatures always verify.

The signature scheme we want to follow is the same as what's recommended in the current version of the spec with a few small modifications.

The content to be signed is still: {msg_id}.{timestamp}.{payload}. This in now HMAC256 with the consumer id (discussed above) instead of with the secret (done in the symmetric case), and then signed using the global asymmetric key pair.

So the code would look something like this (psuedo code):

to_sign = "{msg_id}.{timestamp}.{payload}"
hashed_msg = hmac256(key=consumer_id, msg=to_sign)
signature = p256_sign(global_keypair, hashed_msg)

The hashed_msg extra step makes sure that consumers verify this message was indeed intended for them as part of the signature verification process (no way to accidentally forget to verify the ID).

For this to work, the consumer ID needs to be known to the consumer ahead of time. We recommend just updating the public key (whpk_...) to have this consumer ID encoded in it so that it's very easy to use and consumers don't need to copy two identifiers.
The suggested idea is similar to how JWTs are structured: base64 the public key and the consumer id separately and then concatenate them by using a full stop (.) as the delimiter, i.e: whpk_{base64_of_key}.{base64_of_consumer_id}.

The structure of the signatures doesn't need to change, as it's just a normal signature.

Tom Hacohen · Answer 1 · Sun Dec 10 2023 09:05:52 GMT+0800 (China Standard Time)

I'd love to get feedback on this.

Also, CC @hf and @zekth, I know you had some thoughts around this.

Vìncent Le Goff · Answer 2 · Mon Dec 11 2023 16:37:51 GMT+0800 (China Standard Time)

Because different customers of the same service will be able to trick the service into sending messages to another customer endpoint, and that customer endpoint will trust it because it's coming from the service.

One thing that would be also good to point out in the spec might be that we strongly recommend tenant isolation; which means tenant based key/pair. At the same time some webhooks integrations require a DNS verification process via inserting input in a TXT field in your DNS records to prove the ownership of the domain.

I'd like bring up potentially changing the recommendation to P-256 instead of ed25519. While ed25519 is considered better, P-256 is approved by FIPS-140 and ed25519 isn't which is a requirement for many implementations. We could potentially support both, but I think that's unnecessarily friction and fragmentation.

Why do we need to be restrained to one algorithm? Can't we support RSA or ECDSA which are FIPS compliant?

One thing that comes to my mind is we can recommend also the use of JWK ( https://www.rfc-editor.org/rfc/rfc7517 ) that can be exposed throught a well-known endpoint. eg:
https://my-company.com/well-known/{tenant-id}/keys Then the alg defines whats the algorithm of the key ( https://www.rfc-editor.org/rfc/rfc7517#page-8 ).

In the current design that means there's no way to do dynamic loading of the public key for a consumer; so they would have to redeploy / restart their process (depending on how they strore the key) to update it.

Should we add another header to point to a public accessible URL for key? That would mean an http roundtrip to get the key in the webhook process; less than optimal to me.

I know this adds several actors in the picture but i can't think about anything else on the top of my head.

Stojan Dimitrovski · Answer 3 · Sat Dec 16 2023 00:49:09 GMT+0800 (China Standard Time)

@zekth Yes I agree with the JWK approach as I raised it previously too -- IMO we're bound to end up either using it or reinventing it.

In general my feelings are that HMAC-SHA256 and Ed25519 (which under the name EdDSA is NIST compliant but not yet FIPS-140 compliant) should be the first recommendations of the spec.

We need an escape hatch using JWK, again IMO, to support any other key/algorithm for those implementers that want the advanced case.

Stojan Dimitrovski · Answer 4 · Sat Dec 16 2023 00:51:02 GMT+0800 (China Standard Time)

The content to be signed is still: {msg_id}.{timestamp}.{payload}. This in now HMAC256 with the consumer id (discussed above) instead of with the secret (done in the symmetric case), and then signed using the global asymmetric key pair.

We still need to include out-of-band data here too, like headers, to keep it truly secure.

Joel Lee · Answer 5 · Thu Feb 29 2024 13:28:57 GMT+0800 (China Standard Time)

Hey team,

Wanted to bump this thread and see where we've landed with this as it sounds like it's needed for #124

Let me know!

Tom Hacohen · Answer 6 · Thu Feb 29 2024 21:48:47 GMT+0800 (China Standard Time)

Sorry, I've been swamped, I intend to loop back to this in the next week or so.

Vìncent Le Goff · Answer 7 · Fri Mar 01 2024 00:14:53 GMT+0800 (China Standard Time)

@hf

We need an escape hatch using JWK, again IMO, to support any other key/algorithm for those implementers that want the advanced case.

In this scenario do you think both producer and consumer would have the keys managed on their own? Which means our verify method or class would implement a publicKey property and for sign the same but with a private key. Do we want to propagate in the header some informations regarding the publickey or should it be defined by the spec from the producer?