benlaurie / objecthash

A way to cryptographically hash objects (in the JSON-ish sense) that works cross-language. And, therefore, cross-encoding.

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Option for array order independent hashing

about-code opened this issue · comments

I am currently looking for object hashing strategies to quickly tell by comparison of the hashsum of two objects o1 and o2 whether they have identical contents. One important requirement is that the order of items of an array shouldn't matter since a particular order can't be guaranteed or isn't important. So when hashing o1and o2 where o1 and o2 are defined as

var o1 = ["foo", {"bar": ["baz", null, 1.0, 1.5, 0.0001, 1000.0, 2.0, -23.1234, 2.0]}]
var o2 = ["foo", {"bar": [null, "baz", 1.0, 1.5, 0.0001, 1000.0, 2.0, -23.1234, 2.0]}]

then the hash for o1 and o2 should be equal even though the item oder of bar differs. The hashsum should remain to be distinct from o3 with

var o3 = [{"bar": ["baz", null, 1.0, 1.5, 0.0001, 1000.0, 2.0, -23.1234, 2.0]}]

which misses the item "foo".

Use cases I have in mind are change detection and caching strategies. E.g. clients with a REST-API could decide upon comparing a hashsum against an E-Tag HTTP-Header whether to update a client-side cache.

Array order independence could be achieved by sorting item hashes in the array prior to calculating the hashsum over the items (before updating the digester, respectively) - similar to how it is already implemented for hashObject(). By sorting the hashes of array items rather than sorting raw values we can sort lexicographically and gain reproducable hash sums independent of whether the raw item values are sortable.

I've implemented a demo and test cases for objecthash-js by @emschwartz (diff). It adds a new option to tweak the algorithm.