ssbc / private-group-spec

DM Shared Key: What if an attacker reposts someone else's message containing a DM pubkey?

keks opened this issue

(In this issue I will use the term box2 secret or box2 key for the key we pass into the box2 encrypt/decrypt functions)

In the current spec, the key used as box2 key is derived from the shared DH secret between sender and recipient, and a static string. That means, if the attacker reposts a dm-key message from a different user Alice, recipients will end up with the same box2 DM secret.
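To make the collision concrete, here is a minimal Node.js sketch. It assumes X25519 for the DH step and uses a placeholder static string; `currentStyleDmKey` and the label are hypothetical stand-ins, not the spec's actual names or constants.

```javascript
const crypto = require('crypto')

// Hypothetical stand-in for the current derivation:
// only the DH shared secret and a static string go in.
function currentStyleDmKey (mySecret, yourPublic) {
  const shared = crypto.diffieHellman({ privateKey: mySecret, publicKey: yourPublic })
  const staticLabel = Buffer.from('static-string-placeholder') // assumption, not the spec's value
  return Buffer.from(crypto.hkdfSync('sha256', shared, Buffer.alloc(0), staticLabel, 32))
}

const alice = crypto.generateKeyPairSync('x25519')
const bob = crypto.generateKeyPairSync('x25519')

// The attacker reposts Alice's dm-key message, so the public key
// published on the attacker's feed is simply Alice's.
const attackerClaimedPublic = alice.publicKey

const keyForAlice = currentStyleDmKey(bob.privateKey, alice.publicKey)
const keyForAttacker = currentStyleDmKey(bob.privateKey, attackerClaimedPublic)

console.log(keyForAlice.equals(keyForAttacker)) // true
```

Nothing in the inputs depends on who published the public key, so Bob cannot end up with distinct secrets for the two feeds.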

When we send an encrypted message, we derive the slot key from the shared secret, our own feed id and our previous message ID. All these values will be the same for Alice and the attacker! Therefore, we have an attack.

There are two questions to ask here:

  • How can we fix it?
  • What is the conceptual issue here?

The first question is easy to answer. In the derivation of the box2 secret, after extracting a good key, we also expand it with info that includes the author's feed id in tfk encoding. Note that HKDF-Extract-then-HKDF-Expand is just HKDF.

function computeDirectMessageKey (author_tfk, my_secret, your_public) {
  var salt = 'ssb-dm-shared-key-extract-salt'
  var input_keying_material = scalarmult(my_secret, your_public)
  var info = 'ssb-dm-shared-key-expand' + author_tfk

  return hkdf(hash = 'SHA256', salt, input_keying_material, info)
}

In this pseudocode, + means concatenation.
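For readers who want to try this, the pseudocode can be sketched with Node's built-in `crypto` module. This is only an illustration under assumptions: X25519 stands in for `scalarmult`, the output length of 32 bytes is my choice, and the `*Tfk` buffers are random placeholder bytes rather than real tfk-encoded feed ids.

```javascript
const crypto = require('crypto')

function computeDirectMessageKey (authorTfk, mySecret, yourPublic) {
  const salt = Buffer.from('ssb-dm-shared-key-extract-salt')
  // X25519 shared secret, standing in for scalarmult(my_secret, your_public)
  const inputKeyingMaterial = crypto.diffieHellman({ privateKey: mySecret, publicKey: yourPublic })
  // "+" in the pseudocode: byte concatenation of the label and the author's id
  const info = Buffer.concat([Buffer.from('ssb-dm-shared-key-expand'), authorTfk])
  return Buffer.from(crypto.hkdfSync('sha256', inputKeyingMaterial, salt, info, 32))
}

const alice = crypto.generateKeyPairSync('x25519')
const bob = crypto.generateKeyPairSync('x25519')

// Placeholder bytes standing in for tfk-encoded feed ids (assumption).
const aliceTfk = crypto.randomBytes(34)
const attackerTfk = crypto.randomBytes(34)

// Bob and Alice both derive the key for the dm-key message Alice authored.
const bobSide = computeDirectMessageKey(aliceTfk, bob.privateKey, alice.publicKey)
const aliceSide = computeDirectMessageKey(aliceTfk, alice.privateKey, bob.publicKey)

// Bob derives a key for the attacker's repost of Alice's public key.
const repostKey = computeDirectMessageKey(attackerTfk, bob.privateKey, alice.publicKey)

console.log(bobSide.equals(aliceSide)) // true
console.log(bobSide.equals(repostKey)) // false
```

The repost now yields a different box2 secret because the author's id is bound into the HKDF info.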

Now, if an attacker copies someone else's dm-key message, Alice (the original author) still has all the information to derive the same key, since the computation is based on the same secret values. However, since we only care about honest Alices, they will only perform the computation with the values we define here and won't accidentally end up with the same key.

One thing of note here is that the key used to respond will be different: if Bob responds to Alice's message, he will use a different author_tfk value in the derivation. This means we have twice as many keys. I don't think this is prohibitive, and if it is, we can employ a more complex derivation where the sender and receiver feed ids are both included, but sorted. This would mean that the derivation parameters are specific to two users and match up.
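A rough sketch of the sorted variant, again with Node's `crypto`. The function name `computePairwiseDmKey` is hypothetical, the feed ids are random placeholder bytes, and plain concatenation is only unambiguous here because both ids are assumed to have the same fixed length.

```javascript
const crypto = require('crypto')

// Hypothetical variant: both feed ids go into info, in sorted order,
// so either party computes the same info regardless of who is sending.
function computePairwiseDmKey (feedTfkA, feedTfkB, mySecret, yourPublic) {
  const [first, second] = [feedTfkA, feedTfkB].sort(Buffer.compare)
  const salt = Buffer.from('ssb-dm-shared-key-extract-salt')
  const inputKeyingMaterial = crypto.diffieHellman({ privateKey: mySecret, publicKey: yourPublic })
  const info = Buffer.concat([Buffer.from('ssb-dm-shared-key-expand'), first, second])
  return Buffer.from(crypto.hkdfSync('sha256', inputKeyingMaterial, salt, info, 32))
}

const alice = crypto.generateKeyPairSync('x25519')
const bob = crypto.generateKeyPairSync('x25519')
const aliceFeed = crypto.randomBytes(34) // placeholder for a tfk-encoded feed id
const bobFeed = crypto.randomBytes(34)   // placeholder for a tfk-encoded feed id

// Each side passes its own id first; sorting makes the order irrelevant.
const fromAlice = computePairwiseDmKey(aliceFeed, bobFeed, alice.privateKey, bob.publicKey)
const fromBob = computePairwiseDmKey(bobFeed, aliceFeed, bob.privateKey, alice.publicKey)

console.log(fromAlice.equals(fromBob)) // true
```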

Another open question is whether we want to include more data in this derivation. For example, if a user chooses a public key, then cycles keys, and then cycles back to the old key - should the box2 key be the same as in the beginning? Currently it is, but we may want to prohibit that (pros? cons?). If so, we should also include the message ID of the dm-key message in which the key was published.

Okay, so what is the wider, conceptual issue here?

We are running into the same issue as the CA-based TLS world: we don't check that a user holds the secret key for the public key they claim. And in fact, we can't. In the CA system, we trust the CAs, and they can check that someone who applies for a certificate knows the secret key. This is called Proof of Possession. These proofs are usually challenge-response protocols, where the CA sends some random data (the challenge) and the applicant's response can be checked against their claimed public key. For a signature key, this is as simple as signing the challenge (see section 4.3 of RFC 4210: https://tools.ietf.org/html/rfc4210#section-4.3).
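As an illustration of the signature-key case, here is a minimal challenge-response sketch using Ed25519 from Node's `crypto` module. The roles and variable names are made up for the example; it is not how any particular CA implements the check.

```javascript
const crypto = require('crypto')

// The applicant claims a public key; the verifier wants proof of the secret key.
const applicant = crypto.generateKeyPairSync('ed25519')
const claimedPublicKey = applicant.publicKey

// 1. The verifier sends fresh random data.
const challenge = crypto.randomBytes(32)

// 2. The applicant signs the challenge with the secret key.
const response = crypto.sign(null, challenge, applicant.privateKey)

// 3. The verifier checks the response against the claimed public key.
const accepted = crypto.verify(null, challenge, claimedPublicKey, response)
console.log(accepted) // true

// Someone who copied the public key but lacks the secret key can only
// sign with a key of their own, which fails verification.
const impostor = crypto.generateKeyPairSync('ed25519')
const forged = crypto.sign(null, challenge, impostor.privateKey)
const impostorAccepted = crypto.verify(null, challenge, claimedPublicKey, forged)
console.log(impostorAccepted) // false
```

The check depends on the challenge being unpredictable to the applicant, which is the property that is hard to obtain without a trusted party.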

Now, in our case we don't have the CA as a trusted third party. We can try to do what they do, but trustless.
However, that requires a good public source of randomness, which we don't really have. We could use the previous hash, but since all previous messages are authored by the attacker, this is not really safe. Other blockchains have this issue as well and are moving away from the previous-block-hash-as-source-of-randomness concept (even though their previous hash has much more non-adversarial influence!). Also, the required techniques have been around for a while but are not nearly as widespread as e.g. libsodium, so implementations in popular languages may not be available.

I will try to find time to do some structured analysis of the dm-key derivation protocol in a model where the adversary can use arbitrary public keys (in contrast to public keys they have the secret key to). This way we wouldn't need to do a proof of possession, at the cost of a more complex analysis. However, our protocol is pretty simple, so I think it is the better way forward.

Sorry for the wall of text.

A. I'm still trying to understand this

So if Alice has public/ secret keys (A_pk/ A_sk) and Bob has these too, then our initial proposal said:

dm_key = scalarMult(A_sk, B_pk) = scalarMult(B_sk, A_pk)

we then take dm_key and use it to fill a "slot":

feed_id, prev_msg_id, "slot_key", dm_key ------DeriveSecret----> derived_dm_key

derived_dm_key, msg_key -----------------------XOR-------------> slot_key

So any recipient who was able to access msg_key can take the other slot_keys and XOR back to possible derived_dm_key. But I can't see how they could reverse DeriveSecret to access dm_key.

So I think I lost the thread. Also I don't know how you could do a replay attack with some part of this given DeriveSecret closes over prev_msg_id...
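The slot construction described above can be sketched as follows. `deriveSecret` here is a stand-in built on HKDF with a made-up info layout, not box2's actual DeriveSecret, and all ids and keys are random placeholder bytes.

```javascript
const crypto = require('crypto')

// Byte-wise XOR of two equal-length buffers.
function xor (a, b) {
  const out = Buffer.alloc(a.length)
  for (let i = 0; i < a.length; i++) out[i] = a[i] ^ b[i]
  return out
}

// Stand-in for DeriveSecret (assumption): HKDF over dm_key with the
// feed id, previous message id and label as info.
function deriveSecret (feedId, prevMsgId, label, key) {
  const info = Buffer.concat([feedId, prevMsgId, Buffer.from(label)])
  return Buffer.from(crypto.hkdfSync('sha256', key, Buffer.alloc(0), info, 32))
}

const dmKey = crypto.randomBytes(32)     // placeholder for scalarMult(A_sk, B_pk)
const msgKey = crypto.randomBytes(32)    // per-message key
const feedId = crypto.randomBytes(34)    // placeholder feed id
const prevMsgId = crypto.randomBytes(34) // placeholder previous message id

const derivedDmKey = deriveSecret(feedId, prevMsgId, 'slot_key', dmKey)
const slotKey = xor(derivedDmKey, msgKey)

// Anyone who knows msg_key can XOR the slot back to derived_dm_key...
const recovered = xor(slotKey, msgKey)
console.log(recovered.equals(derivedDmKey)) // true

// ...but derived_dm_key is tied to this feed id and prev_msg_id:
const nextDerived = deriveSecret(feedId, crypto.randomBytes(34), 'slot_key', dmKey)
console.log(nextDerived.equals(derivedDmKey)) // false
```

Going from derived_dm_key back to dm_key would require inverting the key derivation function, which is the step the comment above says it cannot see a way to do.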


B. using encode

Previously, I think we used encode for the info combination.
Should we do that again here?

i.e.

function computeDirectMessageKey (author_tfk, my_secret, your_public) {
  var salt = 'ssb-dm-shared-key-extract-salt'
  var input_keying_material = scalarmult(my_secret, your_public)
  var info = encode(['ssb-dm-shared-key-expand',  author_tfk])

  return hkdf(hash = 'SHA256', salt, input_keying_material, info)
}

// encode = slp-encode
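To show why a length-prefixed encoding can be preferable to plain concatenation, here is a small sketch. It assumes a 2-byte little-endian length prefix per element; the real slp-encode may differ in its details, so treat `slpEncodeSketch` as a hypothetical illustration.

```javascript
// Hypothetical length-prefixed list encoding: each element is preceded
// by its byte length as a 16-bit little-endian unsigned integer (assumption).
function slpEncodeSketch (items) {
  return Buffer.concat(items.map((item) => {
    const bytes = Buffer.from(item)
    const length = Buffer.alloc(2)
    length.writeUInt16LE(bytes.length)
    return Buffer.concat([length, bytes])
  }))
}

// Plain concatenation cannot tell these two lists apart...
const plainA = Buffer.concat([Buffer.from('ab'), Buffer.from('c')])
const plainB = Buffer.concat([Buffer.from('a'), Buffer.from('bc')])
console.log(plainA.equals(plainB)) // true

// ...while the length-prefixed form keeps the element boundaries.
const encodedA = slpEncodeSketch(['ab', 'c'])
const encodedB = slpEncodeSketch(['a', 'bc'])
console.log(encodedA.equals(encodedB)) // false
```

With a fixed label and a fixed-length author_tfk the ambiguity does not arise in practice, so the benefit here is mostly consistency with the rest of the format.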

Hmm, so let me make sure I understand.

Alice shares her dm key K_dm which Bob uses to communicate with her successfully. But then Carol comes along and claims to have the SAME key. C_dm = K_dm

Is that what you mean? Wouldn't the effect of this be that if Bob tries to message Alice, Alice can still read it (as expected) and Carol can't (as expected, because she doesn't actually have the secret key part)? But if Bob tries to message Carol, Alice will be able to read it!

This is a weird quirk, but Carol isn't attacking Alice, she is attacking herself. I guess she is confusing Bob.

Is that right?

@mixmix

Re: A

You are right in that this does not lead to an attack if we use box2 for this. However, it would be nice if it was obvious that this attack does not work without understanding the specifics of box2.

Re: B

Hm, yes we could use that. It feels weird to reuse an internal encoding format of box2 here, maybe we should pull it out? But this feels like we are going full ipfs/multiformats (which is not what I want to do).

@dominictarr It would be an attack on the communication between Alice and Bob, because they would be confused and, based on that confusion, might perform actions that are against their interests (but in favour of Carol's). A bit like the message reposting attack that is currently possible.

I read this but don't know how to move this forward because I don't fully understand the threat.

This is resolved, but @keks was going to leave a comment about why and then close it.

@keks I'm going to go ahead and close this.

Feel free to leave a comment about our thinking if you want to