Language-independent hashing mechanism for floats and integers (Proposal)

weigandf opened this issue 5 years ago.

Depending on which programming language is used to implement ObjectHash, there are considerable differences in behavior and in the resulting hash. One big issue I see is the distinction between a float and an integer in the case of integer-valued floats.

An example (taken from the test cases) is:

(1) ["foo", {"bar":["baz", null, 1, 1.5, 0.0001, 1000, 2, -23.1234, 2]}]
-and-
(2) ["foo", {"bar":["baz", null, 1.0, 1.5, 0.0001, 1000.0, 2.0, -23.1234, 2.0]}]

In Python the results are:
(1) 726e7ae9e3fadf8a2228bf33e505a63df8db1638fa4f21429673d387dbd1c52a
-and-
(2) 783a423b094307bcb28d005bc2f026ff44204442ef3513585e7e73b66e3c2213
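To see why the two inputs diverge, here is a minimal Python sketch of type-tagged hashing. It assumes each number is hashed together with a tag naming its type; the tags b"i"/b"f" and the string encodings are hypothetical simplifications, not ObjectHash's actual wire format.

```python
import hashlib


# Hypothetical, simplified type-tagged hashing: NOT ObjectHash's exact encoding.
def tagged_hash(value):
    """Hash a number together with a type tag, so int and float never collide."""
    if isinstance(value, bool):
        raise TypeError("booleans are out of scope for this sketch")
    if isinstance(value, int):
        payload = b"i" + str(value).encode()   # e.g. b"i1"
    elif isinstance(value, float):
        payload = b"f" + repr(value).encode()  # e.g. b"f1.0"
    else:
        raise TypeError(f"unsupported type: {type(value).__name__}")
    return hashlib.sha256(payload).hexdigest()


print(tagged_hash(1) == tagged_hash(1.0))  # False: same number, different tag
```

Any scheme that tags integers and floats differently will give (1) and (2) different hashes unless the input is normalized first.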

The Go implementation introduced a CommonJSON object using the Go marshalling function to address this issue:

json.Marshal(o)
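A rough Python analogue of this normalization might look like the sketch below. It assumes the Go helper re-decodes the marshalled JSON generically, in which case Go's encoding/json turns every JSON number into a float64. The name common_json is hypothetical; json.loads(..., parse_int=float) reproduces the "all numbers are floats" behavior.

```python
import json


def common_json(obj):
    """Round-trip obj through JSON, decoding every number as a float.

    Assumption: this mimics Go's encoding/json, which decodes JSON numbers
    into float64 when the target is interface{}.
    """
    return json.loads(json.dumps(obj), parse_int=float)


ex1 = ["foo", {"bar": ["baz", None, 1, 1.5, 0.0001, 1000, 2, -23.1234, 2]}]
ex2 = ["foo", {"bar": ["baz", None, 1.0, 1.5, 0.0001, 1000.0, 2.0, -23.1234, 2.0]}]

# After normalization both inputs serialize identically, so they hash identically.
print(json.dumps(common_json(ex1)) == json.dumps(common_json(ex2)))  # True
```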

I would like to suggest a different, language-independent solution that follows the JSON Schema proposal in:

http://json-schema.org/draft-04/json-schema-core.html#rfc.section.5.5:

It is acknowledged by this specification that some programming languages, and their associated parsers, use different internal representations for floating point numbers and integers, while others do not.

As a consequence, for interoperability reasons, JSON values used in the context of JSON Schema, whether that JSON be a JSON Schema or an instance, SHOULD ensure that mathematical integers be represented as integers as defined by this specification.

In my opinion, this can be achieved simply by adding a case distinction:

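case Type.Float:
{
  // Integer-valued floats (e.g. 1.0, 1000.0) are hashed as integers
  if ((float)val % 1.0 == 0.0)
  {
    HashInt((int)val);
  }
  else
  {
    // Floats with a fractional part keep the float encoding
    HashFloat((float)val);
  }
  break;
}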
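For comparison, here is a Python sketch of the proposed case distinction (the helper names are hypothetical). It swaps the % 1.0 test for float.is_integer(), which checks the integerness of an IEEE 754 double exactly and returns False for inf and nan:

```python
def normalize_number(val):
    """Hash integer-valued floats as integers, per the JSON Schema guidance.

    float.is_integer() is exact for doubles and returns False for inf and
    nan, so those stay floats.
    """
    if isinstance(val, float) and val.is_integer():
        return int(val)
    return val


def normalize(obj):
    """Recursively apply normalize_number to list items and dict values."""
    if isinstance(obj, list):
        return [normalize(x) for x in obj]
    if isinstance(obj, dict):
        return {k: normalize(v) for k, v in obj.items()}
    return normalize_number(obj)


ex2 = ["foo", {"bar": ["baz", None, 1.0, 1.5, 0.0001, 1000.0, 2.0, -23.1234, 2.0]}]
print(normalize(ex2))
# ['foo', {'bar': ['baz', None, 1, 1.5, 0.0001, 1000, 2, -23.1234, 2]}]
```

After normalization, example (2) becomes structurally identical to example (1), so any type-tagged hash of the two would agree.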

It is open to discussion whether zero should be excluded from that case distinction by adding:
(float)val % 1.0 == 0.0 && (float)val != 0.0
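One thing the zero exception would affect (our observation; the proposal does not state a reason) is IEEE 754 signed zero: int(-0.0) is 0, so without the exception 0.0 and -0.0 hash identically. Here is a hypothetical sketch with an exclude_zero switch:

```python
import math


def normalize_number(val, exclude_zero=False):
    """Map integer-valued floats to int; optionally leave zeros as floats.

    Hypothetical helper. With exclude_zero=False, both 0.0 and -0.0 become
    the int 0, so the sign of zero no longer affects the hash.
    """
    if isinstance(val, float) and val.is_integer():
        if exclude_zero and val == 0.0:
            return val  # keeps 0.0 and -0.0 as (distinct) floats
        return int(val)
    return val


print(normalize_number(-0.0))  # 0
print(math.copysign(1.0, normalize_number(-0.0, exclude_zero=True)))  # -1.0
```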

In my opinion, it would be really great for the ObjectHash project to have a common understanding of this issue and for all implementations to follow the recommendation.

Queer Supervillainess · Answer 1 · Sun Feb 24 2019 18:41:14 GMT+0800 (China Standard Time)

@weigandf That's already addressed in the README

Queer Supervillainess · Answer 2 · Sun Feb 24 2019 18:47:44 GMT+0800 (China Standard Time)

Regarding your actual proposal, there are 3 major issues:

it's backwards-incompatible: the hash of some objects will change, and existing implementations must be changed;
it's not implementable in constant time, even when the schema and layout of the object are known;
x mod 1 == 0 is a poor test of integerness, as it is subject to floating-point rounding effects that may be platform-dependent, e.g. some platforms round subnormal numbers down to 0, and others may use a different bit width for their float type (f32 vs. f64, ...).
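The bit-width point can be made concrete with a small Python sketch. Python floats are f64. The hypothetical to_f32 helper, built on the standard struct module, rounds a value to f32. The same input then passes or fails x mod 1 == 0 depending on the width:

```python
import struct


def to_f32(x):
    """Round a Python float (an f64) to the nearest IEEE 754 single (f32)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def mod_test(x):
    """The proposed integerness test: x mod 1 == 0."""
    return x % 1.0 == 0.0


x = 3.0 - 2.0 ** -51  # the largest f64 below 3.0
print(mod_test(x))          # False: as an f64, x is not an integer
print(mod_test(to_f32(x)))  # True: rounded to f32, x becomes exactly 3.0
```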

Florian Weigand · Answer 3 · Wed Mar 20 2019 20:07:47 GMT+0800 (China Standard Time)

Hello @KellerFuchs, first of all, thanks for your answer.

Please consider my comments:

Compatibility is a big concern, and that is exactly why I opened this issue. Unfortunately, if you look at the other implementations of ObjectHash (Java, Go, Python, ...), you will see that they do not handle integer-valued floats consistently (see the Python example in the issue description).
I agree that (float)val % 1.0 == 0.0 is a poor test for integerness, even in the tolerance form I actually had in mind, (float)val % 1.0 < ɛ. The choice of ɛ still depends on language-specific implementations and on the float type (as you said), so I agree that this is not a good solution either, because it is too difficult to get right.
I agree with the README and your comment that it would be better to introduce a function that converts the object to a common JSON form before hashing it (as done in the Go reference implementation). It would just be great to have a clear definition of this function, since there currently is none (or I have not been able to find it). Relying on Go's json.Marshal(o) feels like a black box to me, and its behavior is quite hard to reproduce in other languages.
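On the ɛ question, a small Python sketch shows that a tolerance test only moves the problem: the verdict depends on the chosen ɛ, and the one-sided form (float)val % 1.0 < ɛ misses values just below an integer, whose remainder is close to 1. near_integer is a hypothetical two-sided variant, not something proposed in this thread:

```python
def near_integer(x, eps):
    """Two-sided tolerance test: distance from x to the nearest integer < eps.

    Assumption: a symmetric variant of the one-sided (x % 1.0) < eps form.
    """
    r = x % 1.0  # in [0, 1) for Python floats, also for negative x
    return min(r, 1.0 - r) < eps


x = 3.0 - 2.0 ** -51  # the largest f64 below 3.0
print((x % 1.0) < 1e-9)        # False: the one-sided form misses x
print(near_integer(x, 1e-9))   # True
print(near_integer(x, 1e-18))  # False: the verdict depends on eps
```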

Currently, I use this code:

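case Type.Integer:
{
    if (Settings.COMMON_JSONIFY)
    {
        // Common-JSON mode: JSON has a single number type,
        // so integers are hashed like floats
        HashFloat((float)value);
    }
    else
    {
        // Default mode: keep the distinct integer encoding
        HashInt((int)value);
    }
    break;
}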
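To make the intent of the COMMON_JSONIFY branch concrete, here is a Python sketch with hypothetical hash_int/hash_float encodings (not ObjectHash's real ones). It hashes floats as 64-bit doubles, on the assumption that the goal is to match Go, whose encoding/json decodes JSON numbers into float64:

```python
import hashlib
import struct

COMMON_JSONIFY = True  # hypothetical stand-in for Settings.COMMON_JSONIFY


def hash_float(value):
    """Hypothetical float hash: tag 'f' plus the big-endian IEEE 754 double."""
    return hashlib.sha256(b"f" + struct.pack(">d", value)).hexdigest()


def hash_int(value):
    """Hypothetical int hash: tag 'i' plus the decimal digits."""
    return hashlib.sha256(b"i" + str(value).encode()).hexdigest()


def hash_number(value, common_jsonify=COMMON_JSONIFY):
    """Mirror of the Type.Integer branch: in common-JSON mode ints hash as floats."""
    if isinstance(value, int) and not common_jsonify:
        return hash_int(value)
    return hash_float(float(value))


print(hash_number(2) == hash_number(2.0))                               # True
print(hash_number(2, common_jsonify=False) == hash_number(2.0, False))  # False
```

Note that float(value) loses precision for integers above 2**53 and raises OverflowError for integers too large for a double, so a precise definition of the common-JSON function would also need to specify how such values are handled.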

I am not sure whether this is enough to fully replicate the behavior of Go's json.Marshal(o) for integer-valued floats.