`compare` sometimes returns 0 for different digests
emillon opened this issue · comments
Hi,
`compare` relies on `caml_hash`, but this function has collisions that are easy to find (in particular because of its small codomain).
This behaviour is demonstrated by the following program:
let s1 = "O-)"
let s2 = "$@$"

let () =
  let open Digestif.SHA256 in
  let d1 = digest_string s1 in
  let d2 = digest_string s2 in
  Printf.printf
    "compare: %d\nunsafe_compare: %d\neq: %B\n"
    (compare d1 d2)
    (unsafe_compare d1 d2)
    (eq d1 d2)
Output:
compare: 0
unsafe_compare: -1
eq: false
Thanks!
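The "small codomain" point can be illustrated without Digestif. The following is a minimal sketch, assuming that `Hashtbl.hash` from the standard library is backed by `caml_hash` and yields roughly 30 bits of output; under that assumption a birthday search over a few tens of thousands of strings should be enough to find a collision. The function name `find_collision` and the choice of decimal strings as inputs are only illustrative.

```ocaml
(* Birthday search: hash distinct strings until two of them share a
   Hashtbl.hash value. Assumes Hashtbl.hash has a small codomain. *)
let find_collision () =
  (* Maps each hash value seen so far to the string that produced it. *)
  let seen : (int, string) Hashtbl.t = Hashtbl.create 65536 in
  let rec go i =
    let s = string_of_int i in
    let h = Hashtbl.hash s in
    match Hashtbl.find_opt seen h with
    | Some earlier -> (earlier, s) (* two distinct strings, same hash *)
    | None ->
      Hashtbl.add seen h s;
      go (i + 1)
  in
  go 0

let () =
  let a, b = find_collision () in
  Printf.printf "%S and %S share the hash %d\n" a b (Hashtbl.hash a)
```

Any comparison built on such a hash would report these two inputs as equal, which is the same failure mode as in the report above.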
As we discussed with @emillon, and because `compare` does not need to do a lexicographical comparison, one solution would be to reuse the implementation of `eq` and, instead of returning `true` or `false`, return the otherwise meaningless integer that the computation produces:
let compare a b =
  let ret = ref 0 in
  for i = 0 to D.digest_size - 1 do
    ret := !ret lor (Char.code (String.get a i) lxor Char.code (String.get b i))
  done;
  !ret
We remain protected against timing attacks (see #50 for how I test this), and it returns `0` only when `eq a b` returns `true`. Any opinion, @cfcs?
I don't think that this defines a proper order, unfortunately.
We could use something like a cryptographic hash instead of `caml_hash`, though.
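To make the objection concrete, here is a small sketch of why the `lor`/`lxor` proposal above is not an order. It models digests as plain strings of equal length (an assumption made for this example), and `xor_compare` is a hypothetical name for the proposed function. The result is never negative and is the same whichever way round the arguments are passed, so it cannot tell "less than" from "greater than".

```ocaml
(* The proposed comparison, with digests modelled as equal-length strings. *)
let xor_compare a b =
  let ret = ref 0 in
  for i = 0 to String.length a - 1 do
    ret := !ret lor (Char.code a.[i] lxor Char.code b.[i])
  done;
  !ret

let a = "\x00\x01"
let b = "\x00\x02"

let () =
  (* Both calls return the same positive integer: an order would
     require the two results to have opposite signs. *)
  Printf.printf "compare a b = %d, compare b a = %d\n"
    (xor_compare a b) (xor_compare b a)
```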
After a talk, the best option seems to be to completely remove the `compare` function. Indeed:
- @emillon shows an example where `compare` returns a wrong value.
- We need to take care of timing attacks.
- We need to be agnostic of the values of the inputs: the returned value (`1`, `0` or `-1`) must not be usable to infer input values (as @cfcs said, an order which depends on some internal random state of `digestif` is a solution).
- Lexicographic order is still valid in some contexts (like `ocaml-git`).
For all of these points (if I did not forget something), the current solution is to provide an `unsafe_compare` (clients will be informed about its security issues through the documentation) and a `compare` which returns a subtraction of `murmur3` hashes.
To avoid surprising clients with `compare`, and to still let them have a lexicographic order, I think the best solution is simply to remove `compare` and keep `unsafe_compare`.
- As previously discussed, I think exposing a comparison operator that depends on the values of the hashes is a bad idea since it leads to very subtle cryptographic attacks in some settings.
- Returning `0` only when they are `eq` breaks the ordering assumption and thus will probably lead to problems with algorithms that use `compare` (for sorting/maps/trees etc). IIRC @samoht had some objection against this?
- Returning something that depends on the hash of the values instead of the values does IMHO not solve the problem, even if it makes it slightly harder for the attacker to meddle with.
- A compile-time warning would be a step in the right direction.
I think that the question boils down to, what kind of comparison operators do we need?
- a constant time equality function, to check MACs in particular.
- a String.compare-like equality function, used in contexts where performance is important and timing is not.
- a total order, possibly lexicographic.
I'm not aware of a use case where a constant time total order is required. Common cryptographic libraries do not supply one, either: OpenSSL's `CRYPTO_memcmp` and Java's `MessageDigest.isEqual` only report whether their inputs are equal.
If we can work under the hypothesis that this is not required, we can just expose the three functions above as `equal`, `unsafe_equal`, and `compare` (with a comment in the interface).
(In addition, maybe we can drop `unsafe_equal`, since third-party users can define it with a coercion.)
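As a rough sketch of that three-function split, assuming digests are represented as plain strings (the module name `Cmp` is hypothetical and this is not Digestif's actual code): `equal` folds every byte difference into an accumulator so that the loop does not exit early, while `unsafe_equal` and `compare` simply reuse the standard string functions. Note that OCaml gives no formal constant-time guarantee, so `equal` here only shows the shape of the technique.

```ocaml
module Cmp = struct
  (* Digests are modelled as strings: an assumption for this sketch. *)
  type t = string

  (* Equality that inspects every byte regardless of where the first
     difference is. The length check is not secret-dependent for
     fixed-size digests. *)
  let equal (a : t) (b : t) =
    String.length a = String.length b
    && begin
      let acc = ref 0 in
      String.iteri
        (fun i c -> acc := !acc lor (Char.code c lxor Char.code b.[i]))
        a;
      !acc = 0
    end

  (* Fast equality: may return as soon as a difference is found. *)
  let unsafe_equal : t -> t -> bool = String.equal

  (* Lexicographic total order, with no timing guarantees. *)
  let compare : t -> t -> int = String.compare
end

let () =
  Printf.printf "equal: %B, compare: %d\n"
    (Cmp.equal "abc" "abd")
    (Cmp.compare "abc" "abd")
```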
Hi, to chip in (after a week of not being in front of a computer):
- a constant time `equal` function is indeed required for various use cases (and should be the default)
- could someone remind me in which concrete use case a high-performance `equality` function is needed?
- could someone also remind me why a `compare` is needed? (IMHO if there's a constant time `equality`, there should also be a constant time `compare`)
It's a good point: `compare` and `unsafe_equal` can be defined by callers if need be. The only important function, and the most likely to be misimplemented, is constant time equality.
I certainly have cases where a cryptographic hash is used as a pseudo-unique identifier. Being able to create a set or map keyed on certain hashes is very useful.
That said, it's fine if `compare` is spelled `unsafe_compare` to clearly indicate that there are concerns around such a lexicographic ordering function.
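For illustration, the identifier use case might look like the sketch below. It is self-contained and therefore uses a string stand-in for the digest type, with `String.compare` playing the role of `unsafe_compare`; the module names `Hash_id` and `Hash_map` and the sample keys are made up for the example.

```ocaml
(* Stand-in for a digest module: keys are raw digest bytes. *)
module Hash_id = struct
  type t = string

  (* Lexicographic order, standing in for unsafe_compare. *)
  let compare = String.compare
end

module Hash_map = Map.Make (Hash_id)

(* A map from (fake) digests to the objects they identify. *)
let objects =
  Hash_map.empty
  |> Hash_map.add "\xde\xad" "object A"
  |> Hash_map.add "\xbe\xef" "object B"

let () =
  match Hash_map.find_opt "\xbe\xef" objects with
  | Some name -> print_endline name
  | None -> print_endline "not found"
```

Timing is not a concern in this setting as long as the hashes being compared are public identifiers rather than secrets such as MACs.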
OK, so to sum up, I propose to remove `compare` and update the documentation on `unsafe_compare`. I'll prepare a PR.