Proposal: hash function optimization

Question

jef-sure opened this issue 3 months ago

Anton Petrusevich commented 3 months ago

I measured 0 (zero) collisions with this function.

Measurement parse(std::span<const char>::iterator &iter) {
    Measurement result;

    const char *begin = iter.base();
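    uint32_t h = 0;
    // Hash the name up to the ';' separator (the input must contain one).
    while (*iter != ';') {
        h = (h << 6) + (h << 16) - h + *iter; // SDBM hash step: h = h * 65599 + c
        ++iter;
    }
    // Fold the 32-bit hash to 16 bits by XOR-ing the high half into the low half.
    result.hash = (uint16_t)((h >> 16) ^ h);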
    result.name = {begin, iter.base()};
    ++iter;

    result.value = parse_int_table(iter);

    return result;
}
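For readers who want to try the hash outside the parser, here is a minimal standalone sketch of the same two steps. The names `sdbm_hash` and `fold16` are made up for illustration and are not from the original code. The sketch also casts each byte to `unsigned char`, which matches the original for ASCII names but may differ for bytes above 127 on platforms where `char` is signed.

```cpp
#include <cstdint>
#include <string_view>

// SDBM-style hash over a whole key. The shift form equals h * 65599 + c
// (64 + 65536 - 1 = 65599), with wrap-around modulo 2^32.
// Hypothetical helper name; the byte is cast to unsigned char (an assumption).
constexpr uint32_t sdbm_hash(std::string_view key) {
    uint32_t h = 0;
    for (char c : key) {
        h = (h << 6) + (h << 16) - h + static_cast<unsigned char>(c);
    }
    return h;
}

// Fold a 32-bit hash to 16 bits by XOR-ing the high half into the low half.
constexpr uint16_t fold16(uint32_t h) {
    return static_cast<uint16_t>((h >> 16) ^ h);
}
```

Calling `fold16(sdbm_hash(name))` should then give the same 16-bit value as `result.hash` above for plain ASCII names.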

RNDr. Simon Toth · Answer 1 · Wed May 08 2024 18:35:12 GMT+0800 (China Standard Time)

I didn't put it in the article, but the simple hash also had zero collisions on the example data. I suspect that your version would be more stable and suitable for the 10k unique key input as well.

However, the bigger problem I ran into is that the loop refused to vectorize, so a SIMD version would be the actual proper approach.
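One likely reason the loop resists auto-vectorization is that it mixes a data-dependent exit (`*iter != ';'`) with a loop-carried dependency through `h`. The sketch below is not a SIMD hash. It only separates the two concerns, so that the separator search goes through `std::memchr` (often SIMD-optimized by the C library) and the hash runs over a range of known length. `NameHash` and `hash_name` are hypothetical names, and the pointer-pair interface is an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string_view>

// Hypothetical result type for this sketch.
struct NameHash {
    std::string_view name;
    uint16_t hash;
};

// Finds the ';' first, then hashes the bytes before it.
// If no ';' is present, the whole range [begin, end) is treated as the name.
NameHash hash_name(const char* begin, const char* end) {
    const std::size_t len = static_cast<std::size_t>(end - begin);
    const char* sep = static_cast<const char*>(std::memchr(begin, ';', len));
    if (sep == nullptr) sep = end;

    // Same SDBM step as in the thread; still a serial recurrence.
    uint32_t h = 0;
    for (const char* p = begin; p != sep; ++p) {
        h = (h << 6) + (h << 16) - h + static_cast<unsigned char>(*p);
    }
    return {std::string_view(begin, static_cast<std::size_t>(sep - begin)),
            static_cast<uint16_t>((h >> 16) ^ h)};
}
```

The hash recurrence itself stays sequential here. A true SIMD version would need to restructure it, for example by processing several bytes per step, which this sketch does not attempt.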

Anton Petrusevich · Answer 2 · Thu May 09 2024 00:23:58 GMT+0800 (China Standard Time)

> I suspect that your version would be more stable and suitable for the 10k unique key input as well.

Yes, I measured 0 collisions on 10k unique keys from generated data.
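The thread does not say how the collisions were counted. One plausible way, sketched here under that assumption, is to record the first key seen for each 16-bit hash and count every later, different key that maps to the same value. `sdbm16` and `count_collisions` are illustrative names, not part of the original code.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>
#include <unordered_map>
#include <vector>

// The hash from the thread: SDBM step per byte, then a 16-bit XOR fold.
uint16_t sdbm16(std::string_view key) {
    uint32_t h = 0;
    for (char c : key) {
        h = (h << 6) + (h << 16) - h + static_cast<unsigned char>(c);
    }
    return static_cast<uint16_t>((h >> 16) ^ h);
}

// Counts keys whose hash is already owned by a different key.
// Repeats of the same key are not counted as collisions.
std::size_t count_collisions(const std::vector<std::string>& keys) {
    std::unordered_map<uint16_t, std::string_view> first_owner;
    std::size_t collisions = 0;
    for (const std::string& key : keys) {
        auto [it, inserted] = first_owner.try_emplace(sdbm16(key), key);
        if (!inserted && it->second != key) {
            ++collisions;
        }
    }
    return collisions;
}
```

As a sanity check that the counter can detect a collision at all, the two-character keys `"bp"` and `"c2"` both fold to the same 16-bit value under this function, so `count_collisions({"bp", "c2"})` returns 1.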