multiformats / multihash

Self describing hashes - for future proofing

Home Page: https://multiformats.io/multihash/

is it too late for a suffix version?

dominictarr opened this issue

some things sort by prefix (e.g. leveldb), and you would probably rather keep the hashes uniformly distributed across the keyspace (at least then you can reason about it).

For example, multihash would play much nicer with leveldb if the function code and length were a suffix, not a prefix.
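A rough sketch of the concern, not taken from any multihash implementation: it assumes the one-byte function code and one-byte length layout (with sha1 = 0x11 and sha2-256 = 0x12 from the multihash table), and the `suffixed` layout is purely hypothetical, since no suffix format was ever specified. Sorting the keys the way a prefix-ordered store would shows how the leading bytes behave in each layout.

```python
import hashlib

# Assumed one-byte codes from the multihash table.
SHA1, SHA2_256 = 0x11, 0x12


def prefixed(code: int, digest: bytes) -> bytes:
    """<fn code><length><digest>: the prefix layout multihash uses."""
    return bytes([code, len(digest)]) + digest


def suffixed(code: int, digest: bytes) -> bytes:
    """Hypothetical suffix layout: <digest><length><fn code>."""
    return digest + bytes([len(digest), code])


# Hash 200 sample values, alternating between the two functions.
digests = []
for i in range(200):
    data = f"block-{i}".encode()
    if i % 2 == 0:
        digests.append((SHA1, hashlib.sha1(data).digest()))
    else:
        digests.append((SHA2_256, hashlib.sha256(data).digest()))

# A store such as leveldb keeps keys in lexicographic byte order.
prefix_keys = sorted(prefixed(c, d) for c, d in digests)
suffix_keys = sorted(suffixed(c, d) for c, d in digests)

# With a prefix, every key starts with one of two bytes, so the keys
# fall into two contiguous clusters. With a suffix, the first byte
# comes from the digest and is spread over many values.
first_bytes_prefixed = {k[0] for k in prefix_keys}
first_bytes_suffixed = {k[0] for k in suffix_keys}
print(sorted(first_bytes_prefixed), len(first_bytes_suffixed))
```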

Yeah, non-uniform distribution on mixed-hash-function hashes will affect various applications. You can reverse the hash (as you pointed out elsewhere, and as we've discussed on #ipfs), and that gives your multihashes a distribution similar to that of the underlying hash function (most of which are uniform).
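A minimal sketch of that reversal workaround, assuming the one-byte function code and one-byte length layout and the sha2-256 code 0x12. The helper names `storage_key` and `from_storage_key` are made up for illustration. Reversing the bytes puts digest bytes at the front of the stored key and moves the length and function code to the end.

```python
import hashlib

SHA2_256 = 0x12  # assumed one-byte code from the multihash table


def multihash(code: int, digest: bytes) -> bytes:
    """Build <fn code><length><digest>."""
    return bytes([code, len(digest)]) + digest


def storage_key(mh: bytes) -> bytes:
    """Hypothetical helper: reverse the multihash so digest bytes lead."""
    return mh[::-1]


def from_storage_key(key: bytes) -> bytes:
    """Undo storage_key to recover the ordinary multihash."""
    return key[::-1]


mh = multihash(SHA2_256, hashlib.sha256(b"hello").digest())
key = storage_key(mh)

# The key now ends with <length><fn code>, and reversing restores it.
assert key[-2:] == bytes([32, SHA2_256])
assert from_storage_key(key) == mh
```

The application has to remember which byte order a given value is in, which is the confusion the first argument below is about.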

I considered the suffix idea for a while, but decided against it (so far) based on these arguments:

  • having two formats for multihash gets confusing. users can opt to reverse the hashes, and that's fine, but users getting an opaque "multihash" then have to wonder what order it's in.
  • suffixes nullify one of the benefits of keeping the length around: skipping over the whole hash by reading the second int (the length), without having to parse the digest. (perhaps the length should come first, pascal string style?)
  • prefixes don't actually negatively impact things like leveldb. mixing hash functions might yield very different locations for things, and cluster things into sets of key prefixes, but leveldb continues to function equally well. sort order doesn't matter anyway, because these are hashes, not keys whose string value or sort order is meaningful. (in leveldb specifically, this is the same as having a prefix on the key, which many apps do to namespace values)
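The skip-by-length point in the second argument can be sketched like this, again assuming a one-byte function code followed by a one-byte length. `split_multihashes` is an illustrative name, not part of any multihash library. Because the length comes before the digest, a reader can walk a buffer of concatenated multihashes without knowing anything about the hash functions involved.

```python
import hashlib


def split_multihashes(buf: bytes) -> list[tuple[int, bytes]]:
    """Split concatenated <fn code><length><digest> records.

    Only the two header bytes are inspected. The digest is skipped
    over using the length, so unknown function codes are fine.
    """
    out = []
    pos = 0
    while pos < len(buf):
        if pos + 2 > len(buf):
            raise ValueError("truncated header")
        code, length = buf[pos], buf[pos + 1]
        end = pos + 2 + length
        if end > len(buf):
            raise ValueError("truncated digest")
        out.append((code, buf[pos + 2:end]))
        pos = end  # jump straight to the next record
    return out


# Two records back to back: sha1 (0x11, 20 bytes), sha2-256 (0x12, 32 bytes).
stream = (
    bytes([0x11, 20]) + hashlib.sha1(b"a").digest()
    + bytes([0x12, 32]) + hashlib.sha256(b"b").digest()
)
for code, digest in split_multihashes(stream):
    print(hex(code), len(digest))
```

With a suffix, a forward reader would not know where the first digest ends until it had already found the length that follows it.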

just curious, what's the use-case for storing the length of the hash?

truncated hashes. some people, for example, use sha2-512 truncated to 256 bits instead of sha2-256 because it's faster on some architectures (typically 64-bit CPUs, since sha2-512 operates on 64-bit words).
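As an illustration of why the length is stored, here is a small sketch assuming the one-byte code and length layout and the sha2-512 code 0x13. `truncated_multihash` is a hypothetical helper. The function code says which algorithm produced the digest, and the length says how many of its bytes were kept.

```python
import hashlib

SHA2_512 = 0x13  # assumed one-byte code from the multihash table


def truncated_multihash(code: int, digest: bytes, nbytes: int) -> bytes:
    """Keep the first nbytes of digest and record that length."""
    if not 0 < nbytes <= len(digest):
        raise ValueError("nbytes must be between 1 and the digest size")
    return bytes([code, nbytes]) + digest[:nbytes]


full = hashlib.sha512(b"hello").digest()      # 64 bytes
mh = truncated_multihash(SHA2_512, full, 32)  # keep 256 bits

# Header says: sha2-512, 32 bytes kept. Without the length byte, a
# reader could not tell this apart from a full 64-byte sha2-512 digest.
assert mh[0] == SHA2_512 and mh[1] == 32
assert mh[2:] == full[:32]
```

Note that this is plain truncation. It is not the same value as the standardized SHA-512/256 variant, which uses different initial constants.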