Proposal: hash function optimization
jef-sure opened this issue
I measured 0 (zero) collisions with this function.
Measurement parse(std::span<const char>::iterator &iter) {
    Measurement result;
    const char *begin = iter.base();
    uint32_t h = 0;
    // Hash the name up to the ';' separator (assumes one is present).
    while (*iter != ';') {
        h = (h << 6) + (h << 16) - h + *iter; // SDBM_hash
        ++iter;
    }
    // Fold the 32-bit hash down to 16 bits.
    result.hash = (uint16_t)((h >> 16) ^ h);
    result.name = {begin, iter.base()};
    ++iter; // skip ';'
    result.value = parse_int_table(iter);
    return result;
}
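For anyone who wants to reproduce the collision count, here is a minimal sketch of how it could be measured. The helper names `sdbm16` and `count_collisions` are hypothetical and not part of the code above; the hash body is copied from `parse`, and a "collision" is assumed to mean two different names that fold to the same 16-bit value.

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>
#include <unordered_map>
#include <vector>

// Hypothetical helper: the SDBM loop from parse(), folded to 16 bits.
// Iterates as plain `char` so the arithmetic matches `h + *iter` above.
inline uint16_t sdbm16(std::string_view key) {
    uint32_t h = 0;
    for (char c : key) {
        h = (h << 6) + (h << 16) - h + c;
    }
    return static_cast<uint16_t>((h >> 16) ^ h);
}

// Hypothetical helper: counts keys whose 16-bit hash is already
// occupied by a *different* key. Repeated keys are not collisions.
inline std::size_t count_collisions(const std::vector<std::string_view> &keys) {
    std::unordered_map<uint16_t, std::string_view> seen;
    std::size_t collisions = 0;
    for (std::string_view key : keys) {
        auto [it, inserted] = seen.try_emplace(sdbm16(key), key);
        if (!inserted && it->second != key) {
            ++collisions;
        }
    }
    return collisions;
}
```

Feeding it the list of unique names from the input and checking that `count_collisions` returns 0 would correspond to the result reported above. With only 65536 possible hash values, zero collisions is a property of the particular key set, not a guarantee.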
I didn't put it in the article, but the simple hash also had zero collisions on the example data. I suspect that your version would be more stable and suitable for the 10k unique key input as well.
However, the bigger problem I ran into is that the loop refused to vectorize, so a SIMD version would be the proper approach.
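One likely obstacle to vectorization is that every iteration depends on the previous `h`. Since `(h << 6) + (h << 16) - h` equals `h * 65599` modulo 2^32, several bytes can be combined per step using precomputed powers of that constant, which shortens the dependency chain. The sketch below is not a SIMD implementation and makes no performance claim. It assumes the name length is already known (for example from a separate scan for `;`), which the original loop does not have. The names `sdbm_scalar` and `sdbm_unrolled` are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>

// (h << 6) + (h << 16) - h == h * (64 + 65536 - 1) == h * 65599 (mod 2^32)
constexpr uint32_t K1 = 65599u;
constexpr uint32_t K2 = K1 * K1; // unsigned arithmetic wraps mod 2^32
constexpr uint32_t K3 = K2 * K1;
constexpr uint32_t K4 = K3 * K1;

// Reference: the byte-at-a-time loop from the issue, over a known length.
inline uint32_t sdbm_scalar(std::string_view s) {
    uint32_t h = 0;
    for (char c : s) {
        h = (h << 6) + (h << 16) - h + c;
    }
    return h;
}

// Same result, four bytes per step. The four byte products are
// independent of each other; only `h * K4` depends on the previous step.
inline uint32_t sdbm_unrolled(std::string_view s) {
    uint32_t h = 0;
    std::size_t i = 0;
    for (; i + 4 <= s.size(); i += 4) {
        h = h * K4
          + static_cast<uint32_t>(s[i])     * K3
          + static_cast<uint32_t>(s[i + 1]) * K2
          + static_cast<uint32_t>(s[i + 2]) * K1
          + static_cast<uint32_t>(s[i + 3]);
    }
    for (; i < s.size(); ++i) { // remaining 0 to 3 bytes
        h = h * K1 + static_cast<uint32_t>(s[i]);
    }
    return h;
}
```

The 16-bit fold `(h >> 16) ^ h` would be applied to the result exactly as before, so the stored hashes would not change.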
> I suspect that your version would be more stable and suitable for the 10k unique key input as well.
Yes, I measured 0 collisions on 10k unique keys from generated data.