beacon-biosignals / StableHashTraits.jl

Compute hashes over any Julia object simply and reproducibly

stable_hash(::Regex)

rasmushenningsson opened this issue

Regex is a mutable struct, which causes stable_hash to return a different value each time it is called, even for the same pattern.

julia> bytes2hex(stable_hash(r"hello"; version=3))
"d9bdaff2d67b5f0d6890fafddde31b70678837711d5352fbc848a436b47d8e22"

julia> bytes2hex(stable_hash(r"hello"; version=3))
"665ac86a45a852990551faa2462d395029a029336935e12be6bfaf1ecabcc76d"

I think it would make sense to make it hash in a stable manner by default. Do you agree?
(The reason why it is a mutable struct is so it can interact with the GC to free the pointer to the compiled regex.)

Here's the corresponding hash function in Base, indicating that we should take pattern, compile_options and match_options into account.

function hash(r::Regex, h::UInt)
    h += hashre_seed
    h = hash(r.pattern, h)
    h = hash(r.compile_options, h)
    h = hash(r.match_options, h)
end
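
As a quick sanity check of the point above, here is a minimal sketch in plain Julia (no StableHashTraits needed). It assumes only Base behavior: two separately constructed regexes with the same pattern are distinct mutable objects, yet Base compares and hashes them by pattern, compile_options and match_options. The variable names a, b and c are just for illustration.

```julia
# Two regexes built at runtime from the same pattern, plus one with a different flag.
a = Regex("hello")
b = Regex("hello")
c = Regex("hello", "i")  # case-insensitive, so its compile_options differ

println(a === b)              # false: distinct mutable objects
println(a == b)               # true: same pattern and options
println(hash(a) == hash(b))   # true: Base.hash ignores object identity
println(a == c)               # false: the options differ
```

A stable hash for Regex would presumably want the same property: equal results for a and b, and a different result for c.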

"I think it would make sense to make it hash in a stable manner by default. Do you agree?"

Yes, that sounds useful to me. I am wrapping up some work in #55 to improve the stability of hashes, with a new API design for customizing how types get hashed. I think it makes sense to add this improvement after that merges.

In the meantime, I believe the following should be an effective workaround (though I haven't tested it).

using StableHashTraits

# Hash a Regex by the same fields that Base.hash uses.
StableHashTraits.hash_method(::Regex) = FnHash(x -> (x.pattern, x.compile_options, x.match_options))

Or, to apply it only within a specific context, you could do

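# Context wrapper that delegates everything else to its parent context.
struct HashRegex{T}
    parent::T
end
StableHashTraits.parent_context(x::HashRegex) = x.parent
# Within this context only, hash a Regex by pattern and options.
StableHashTraits.hash_method(::Regex, ::HashRegex) = FnHash(x -> (x.pattern, x.compile_options, x.match_options))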
stable_hash(r"a*b*", HashRegex(HashVersion{3}()))