Add Hashable typeclass
fehrenbach opened this issue · comments
I would like to add a Hashable
typeclass to the prelude and deriving support for it to the compiler. It is intended as an approximation of equality for use in hash-based algorithms and data structures, such as hash maps, not cryptographically secure digests or password hashes.
class Eq a <= Hashable a where
hash :: a -> Int
There are two reasons for adding it to the prelude instead of having a standalone library:
- Dependencies: To be useful in general data structures, as many datatypes as possible should be hashable. Ideally, whenever a datatype is
Eq
, it should also haveOrd
andHashable
instances, if possible. Most packages depend on the prelude already.[citation needed] This should make it easier to convince authors to addHashable
instances, because they do not need to add another dependency, all that is required is one more deriving line below the lines forEq
andOrd
. - Performance: Usually, the main goal of hashing is to avoid expensive comparisons. One could write a generic hash function in terms of
purescript-generics-rep
. This would be fine for many applications, but for hash-based data structures etc., the hash function should really be as fast as possible.
I am more than happy to make an initial pull request (based on https://github.com/fehrenbach/purescript-unordered-collections/blob/master/src/Data/Hashable.purs) and work on deriving support.
What do people think?
I'd definitely like to have this 👍
Great.
I think the only major design decision is whether to newtype Hash = Hash Int
or not.
I'd like the newtype
personally, as the name at least gives a bit of a semantic clue to the value's intention.
Since this is primarily done for performance reasons I'd like to make sure the newtype doesn't incur any annoying penalties, obviously newtype means the underlying representation is an Int, but if we find ourselves mapping the newtype constructor a lot we'll end up with identity
calls that the compiler currently doesn't optimize.
Also, equality comparisons on Int are special-cased by the compiler to emit ===. This is not currently done for (newtype) derived instances for newtypes over Int.
newtype Hash = Hash Int
derive newtype {- or not -} instance eqHash :: Eq Hash
approxEq :: Hash -> Hash -> Boolean
approxEq a b = a == b -- ... return Data_Eq.eq(eqHash)(a)(b);
-- can be avoided by being careful
approxEq' :: Hash -> Hash -> Boolean
approxEq' (Hash a) (Hash b) = a == b -- ... return a === b;
I guess just not providing the Eq instance would force the second version, but would be a bit annoying.
On the other hand, one more dictionary lookup and two applications is probably not the world, and writing performance critical code is annoying anyway.
I don't have a strong opinion either way.
We could even index the Hash newtype by the type it's a hash of:
newtype Hash a = Hash Int
class Eq a <= Hashable a where
hash :: a -> Hash a
I think this almost makes sense because there is rarely a reason to do anything with hash values from different source types. On the other hand, I think I'd rather avoid providing an Eq Hash
instance, at least at the moment, so it hardly makes a difference.
☝️ I like that (type-indexing the Hash
).
I don't really know enough to give an opinion on whether the lack of inlining for the instances is a noticeable expense or not though - you're probably best placed to determine that @fehrenbach, since you've done all the work with hashed values in -unordered-collections
😄
While we're here, do we think it would be a good idea to derive Hashable
in the compiler like we do for Eq
and Ord
? That way it's even less likely that people have disagreeing Eq and Hashable instances.
@garyb I'll go for the type-indexed newtype and the usual classes. It's probably not a big performance problem and if it turns out to be, unwrapping is an easy workaround or we could just inline Eq for Hash too.
@kritzcreek Yes, I think so. This is a big part of the motivation for having Hashable in the prelude. The original issue text was meant to say that, but there's only a hint left. I must have accidentally removed it while editing... I already have a branch that derives Hashable, but without the newtype. It will take some work to change it to wrap and unwrap as necessary.
@hdgarrood points out that adding something like hashWithSalt
to the class might be better (and providing hash
as a convenience function).
fehrenbach/purescript-unordered-collections#8 (comment)