`compare` sometimes returns 0 for different digests

emillon opened this issue 6 years ago

Hi,

`compare` relies on `caml_hash`, but that function has collisions that are easy to find (in particular because of its small, 30-bit codomain).

This behaviour is demonstrated by the following program:

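```ocaml
(* Two distinct inputs: their SHA-256 digests differ (eq is false),
   yet the caml_hash-based [compare] reports 0. *)
let s1 = "O-)"

let s2 = "$@$"

let () =
  let open Digestif.SHA256 in
  let d1 = digest_string s1 in
  let d2 = digest_string s2 in
  Printf.printf
    "compare:        %d\nunsafe_compare: %d\neq:             %B\n"
    (compare d1 d2)
    (unsafe_compare d1 d2)
    (eq d1 d2)
```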

Output:

```
compare:        0
unsafe_compare: -1
eq:             false
```

Thanks!

Etienne Millon · Answer 1 · Thu Aug 23 2018 19:47:18 GMT+0800 (China Standard Time)

(see #34; cc @cfcs)

Calascibetta Romain · Answer 2 · Thu Aug 23 2018 22:22:31 GMT+0800 (China Standard Time)

As discussed with @emillon, and since `compare` does not need to be a lexicographical comparison, one solution would be to reuse the implementation of `eq` but, instead of returning `true` or `false`, return the (otherwise meaningless) integer produced by the computation:

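```ocaml
(* Inside the digestif functor; [D.digest_size] is the digest length.
   Every byte is visited, so timing is meant not to depend on where the
   inputs differ. Returns 0 iff [a] and [b] are equal, otherwise a value
   in 1..255 that carries no ordering information. *)
let compare a b =
  let ret = ref 0 in
  for i = 0 to D.digest_size - 1 do
    ret := !ret lor (Char.code (String.get a i) lxor Char.code (String.get b i))
  done;
  !ret
```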

This keeps us protected against timing attacks (see #50 for how I test this), and `compare` returns 0 only when `eq a b` returns true. Any opinion, @cfcs?

Etienne Millon · Answer 3 · Thu Aug 23 2018 22:28:36 GMT+0800 (China Standard Time)

I don't think that this defines a proper order, unfortunately.

We could use something like a cryptographic hash instead of `caml_hash`, though.
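To make the "not a proper order" objection concrete, here is a minimal sketch of the OR-of-XOR proposal rewritten over plain equal-length strings (`xor_compare` is a hypothetical name, not digestif code). A total order requires `compare x y` and `compare y x` to have opposite signs whenever `x <> y`; with this function both are positive, so it cannot be used safely with sorting, `Map.Make` or `Set.Make`.

```ocaml
(* Hypothetical stand-alone version of the OR-of-XOR proposal, operating
   on plain strings of equal length instead of digestif digests. *)
let xor_compare a b =
  assert (String.length a = String.length b);
  let ret = ref 0 in
  for i = 0 to String.length a - 1 do
    ret := !ret lor (Char.code a.[i] lxor Char.code b.[i])
  done;
  !ret

(* Antisymmetry fails: for distinct inputs, both directions are positive. *)
let () =
  let x = "a" and y = "b" in
  Printf.printf "xor_compare x y = %d, xor_compare y x = %d\n"
    (xor_compare x y) (xor_compare y x)
```

With `'a'` (97) and `'b'` (98), both calls return `97 lxor 98 = 3`.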

Calascibetta Romain · Answer 4 · Thu Aug 23 2018 22:50:14 GMT+0800 (China Standard Time)

After talking it over, the best option seems to be to remove the `compare` function completely. Indeed:

- @emillon showed an example where `compare` returns a wrong value.
- We need to guard against timing attacks.
- The result must be agnostic of the input values: the returned value (1, 0 or -1) must not be usable to infer anything about the inputs (as @cfcs said, an order that depends on some internal random state of digestif would be a solution).
- A lexicographic order is still valid in some contexts (such as ocaml-git).

Given all these points (unless I forgot something), the current solution provides an `unsafe_compare`, whose documentation warns clients about the security issues, and a `compare` that returns the difference of the murmur3 hashes of its arguments.

To avoid surprising clients with `compare`, and to let them have a lexicographic order, I think the best solution is simply to remove `compare` and keep `unsafe_compare`.
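The murmur3-based `compare` mentioned above can be approximated with the following sketch. It is a hypothetical reconstruction, not digestif's actual source: it assumes `compare` subtracts the `Hashtbl.hash` values (computed by `caml_hash`, which is MurmurHash3-based) of its two arguments, and it works on plain strings. Because `Hashtbl.hash` only returns values below 2^30, the birthday bound means a brute-force search over short strings typically finds two distinct inputs with equal hashes within tens of thousands of tries, and for such a pair `hash_compare` returns 0.

```ocaml
(* Hypothetical reconstruction of a hash-based compare. Hashtbl.hash returns
   a non-negative int below 2^30, so the subtraction cannot overflow. *)
let hash_compare (a : string) (b : string) : int =
  Hashtbl.hash a - Hashtbl.hash b

(* Brute-force a collision of Hashtbl.hash among "0", "1", "2", ...
   With a 30-bit codomain, the birthday bound makes this quick. *)
let find_collision () : string * string =
  let seen = Hashtbl.create 100_000 in
  let rec loop i =
    let s = string_of_int i in
    let h = Hashtbl.hash s in
    match Hashtbl.find_opt seen h with
    | Some earlier -> (earlier, s)
    | None ->
        Hashtbl.add seen h s;
        loop (i + 1)
  in
  loop 0

let () =
  let a, b = find_collision () in
  Printf.printf "%S vs %S: hash_compare = %d, String.equal = %B\n"
    a b (hash_compare a b) (String.equal a b)
```

This is the same failure mode as in the original report: two different digests, a comparison result of 0.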

C For C's Sake · Answer 5 · Fri Aug 24 2018 06:31:46 GMT+0800 (China Standard Time)

- As previously discussed, I think exposing a comparison operator that depends on the values of the hashes is a bad idea, since it leads to very subtle cryptographic attacks in some settings.
- Returning 0 only when the digests are `eq` breaks the ordering assumption, and thus will probably cause problems in algorithms that use `compare` (sorting, maps, trees, etc.). IIRC @samoht had some objections to this?
- IMHO, returning something that depends on the hash of the values rather than on the values themselves does not solve the problem, even if it makes it slightly harder for an attacker to meddle with.
- A compile-time warning would be a step in the right direction.

Etienne Millon · Answer 6 · Fri Aug 24 2018 16:42:36 GMT+0800 (China Standard Time)

I think the question boils down to: what kinds of comparison operators do we need?

- a constant-time equality function, in particular to check MACs;
- a `String.compare`-like equality function, used in contexts where performance matters and timing does not;
- a total order, possibly lexicographic.

I'm not aware of a use case that requires a constant-time total order. Common cryptographic libraries do not provide one either: OpenSSL's `CRYPTO_memcmp` and Java's `MessageDigest.isEqual` only report whether their inputs are equal.

If we can assume this is not required, we can simply expose the three functions above as `equal`, `unsafe_equal` and `compare` (with a comment in the interface).

(In addition, maybe we can drop `unsafe_equal`, since third-party users can define it themselves via coercion.)
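The three-function interface proposed above could look roughly like the following minimal sketch. `Digest_sketch` and `of_raw` are hypothetical names, not digestif's API; digests are modelled as plain strings, and `type t = private string` is one assumed way to enable the coercion mentioned above, so that callers can write their own fast equality as `String.equal (a :> string) (b :> string)`. OCaml gives no hard guarantee that the byte loop compiles to constant-time code; the sketch only avoids data-dependent early exits.

```ocaml
module Digest_sketch : sig
  type t = private string

  (* Hypothetical constructor, for illustration only. *)
  val of_raw : string -> t

  (* Intended to be constant time: no early exit on the first mismatch. *)
  val equal : t -> t -> bool

  (* Fast, but its running time depends on the inputs. *)
  val unsafe_equal : t -> t -> bool

  (* Lexicographic total order; not constant time. *)
  val compare : t -> t -> int
end = struct
  type t = string

  let of_raw s = s

  let equal a b =
    (* Digest lengths are public, so checking them first leaks nothing secret. *)
    String.length a = String.length b
    && begin
         let acc = ref 0 in
         for i = 0 to String.length a - 1 do
           acc := !acc lor (Char.code a.[i] lxor Char.code b.[i])
         done;
         !acc = 0
       end

  let unsafe_equal = String.equal
  let compare = String.compare
end

(* A caller-defined fast equality, using the coercion allowed by [private]. *)
let my_unsafe_equal (a : Digest_sketch.t) (b : Digest_sketch.t) =
  String.equal (a :> string) (b :> string)
```

Because `t` is `private`, callers can read a digest as a string but cannot forge one, which is what makes dropping `unsafe_equal` from the library an option.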

Hannes Mehnert · Answer 7 · Fri Aug 24 2018 16:48:09 GMT+0800 (China Standard Time)

Hi, to chip in (after a week away from a computer):

- A constant-time equal function is indeed required for various use cases (and should be the default).
- Could someone remind me of a concrete use case that needs a high-performance equality function?
- Could someone also remind me why a `compare` is needed? (IMHO, if there is a constant-time equality, there should also be a constant-time compare.)
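A constant-time compare of the kind suggested above is possible. The following is a minimal sketch, not digestif code: `ct_compare` is a hypothetical name, it assumes both digests have the same length (as digests of one algorithm do), and branch-free OCaml source does not strictly guarantee constant-time machine code. It scans every byte from last to first and, without branching on the data, keeps the difference at the lowest index where the inputs differ, so its sign matches `String.compare` on equal-length strings.

```ocaml
(* Hypothetical constant-time lexicographic comparison of two equal-length
   strings. Returns a negative, zero or positive int, like [String.compare]. *)
let ct_compare (a : string) (b : string) : int =
  let n = String.length a in
  if String.length b <> n then invalid_arg "ct_compare: length mismatch";
  let res = ref 0 in
  (* Walk backwards so that the last update comes from the first differing byte. *)
  for i = n - 1 downto 0 do
    let d = Char.code a.[i] - Char.code b.[i] in
    (* [d lor (-d)] has its sign bit set iff [d <> 0]; the arithmetic shift
       turns that into a mask of all ones (-1) or all zeros (0). *)
    let mask = (d lor (-d)) asr (Sys.int_size - 1) in
    res := (d land mask) lor (!res land lnot mask)
  done;
  !res
```

`Map.Make` and `Set.Make` only look at the sign of the result, so returning the raw byte difference is enough. Note that even a constant-time compare still reveals the relative order of the two digests through its result.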

Etienne Millon · Answer 8 · Fri Aug 24 2018 17:01:26 GMT+0800 (China Standard Time)

It's a good point: `compare` and `unsafe_equal` can be defined by callers if need be. The only important function, and the one most likely to be misimplemented, is constant-time equality.

Hezekiah M. Carty · Answer 9 · Fri Aug 24 2018 22:48:11 GMT+0800 (China Standard Time)

I certainly have cases where a cryptographic hash is used as a pseudo-unique identifier. Being able to create a set or map keyed on certain hashes is very useful.

That said, it's fine if compare is spelled unsafe_compare to clearly indicate that there are concerns around such a lexicographic ordering function.
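As an illustration of that use case, here is a minimal sketch of a map keyed on digests. To avoid external dependencies it uses the standard library's `Digest` module (MD5) as a stand-in for a digestif hash; `Digest.compare` is an ordinary lexicographic, non-constant-time comparison, i.e. the kind of function that would be spelled `unsafe_compare` under the naming discussed here. The file names are made up.

```ocaml
(* Stand-in: stdlib Digest (MD5) instead of a digestif hash. Digest.compare
   is a plain lexicographic comparison, not a constant-time one. *)
module Digest_map = Map.Make (struct
  type t = Digest.t
  let compare = Digest.compare
end)

(* Index some (made-up) file names by the digest of their name. *)
let index =
  List.fold_left
    (fun m name -> Digest_map.add (Digest.string name) name m)
    Digest_map.empty
    [ "README.md"; "src/main.ml"; "dune-project" ]

let () =
  match Digest_map.find_opt (Digest.string "src/main.ml") index with
  | Some name -> print_endline ("found " ^ name)
  | None -> print_endline "not found"
```

Any lexicographic `compare` works here; only timing-sensitive equality checks need the constant-time function.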

Etienne Millon · Answer 10 · Mon Aug 27 2018 16:59:27 GMT+0800 (China Standard Time)

OK, so to sum up I propose to remove compare and update the documentation on unsafe_compare. I'll prepare a PR.