HappyCerberus / 1brc

The One Billion Row Challenge using C++

Proposal: hash function optimization

jef-sure opened this issue

I measured 0 (zero) collisions with this hash function:

Measurement parse(std::span<const char>::iterator &iter) {
    Measurement result;

    const char *begin = iter.base();
    uint32_t h = 0;
    while (*iter != ';') {
        h = (h << 6) + (h << 16) - h + *iter; // SDBM_hash
        ++iter;
    }
    result.hash = (uint16_t)((h >> 16) ^ h);
    result.name = {begin, iter.base()};
    ++iter;

    result.value = parse_int_table(iter);

    return result;
}

I didn't put it in the article, but the simple hash also had zero collisions on the example data. I suspect that your version would be more stable and suitable for the 10k unique key input as well.

However, the bigger problem I ran into is that the loop refused to vectorize, so a SIMD version would be the proper approach.
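
One likely reason the loop resists vectorization is that each iteration depends on the previous value of h and the loop exit depends on the data (the search for ';'). The shift form is the same as h = h * 65599 + c, so four bytes can be folded per step with precomputed powers of 65599, which shortens the dependency chain. The sketch below is only an illustration of that idea, not code from the repository: the names sdbm_scalar, sdbm_unrolled, and hash_field are hypothetical, and locating ';' first with std::memchr is an assumption about how the length could be obtained up front. Whether a given compiler then auto-vectorizes the loop is not guaranteed.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string_view>

// Scalar reference: the same recurrence as the loop in the issue,
// h = h * 65599 + c, written with shifts.
inline uint32_t sdbm_scalar(std::string_view s) {
    uint32_t h = 0;
    for (char c : s) {
        h = (h << 6) + (h << 16) - h + static_cast<uint32_t>(c);
    }
    return h;
}

// Four bytes per step (all arithmetic is modulo 2^32):
//   h' = h*K^4 + c0*K^3 + c1*K^2 + c2*K + c3
// The four byte products are independent of each other, so only one
// multiply-add per four bytes sits on the loop-carried chain.
inline uint32_t sdbm_unrolled(std::string_view s) {
    constexpr uint32_t K1 = 65599u;
    constexpr uint32_t K2 = K1 * K1;  // unsigned wrap-around is intended
    constexpr uint32_t K3 = K2 * K1;
    constexpr uint32_t K4 = K3 * K1;
    auto byte = [&](std::size_t i) { return static_cast<uint32_t>(s[i]); };

    uint32_t h = 0;
    std::size_t i = 0;
    for (; i + 4 <= s.size(); i += 4) {
        h = h * K4 + byte(i) * K3 + byte(i + 1) * K2 + byte(i + 2) * K1 + byte(i + 3);
    }
    for (; i < s.size(); ++i) {  // remaining 0..3 bytes
        h = h * K1 + byte(i);
    }
    return h;
}

// Hypothetical helper: find ';' within the first `len` bytes, then hash
// the name in front of it. If no ';' is found, all `len` bytes are hashed.
inline uint32_t hash_field(const char* p, std::size_t len) {
    const void* semi = std::memchr(p, ';', len);
    const std::size_t n =
        semi ? static_cast<std::size_t>(static_cast<const char*>(semi) - p) : len;
    return sdbm_unrolled(std::string_view{p, n});
}
```

Both functions return the same 32-bit value for any input, so the 16-bit fold (h >> 16) ^ h from the parse function above could be applied unchanged to either result.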

> I suspect that your version would be more stable and suitable for the 10k unique key input as well.

Yes, I measured 0 collisions on 10k unique keys from generated data.
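
For anyone who wants to reproduce such a measurement, here is a minimal sketch of one way to count collisions. The thread does not say how a collision was defined, so this assumes a collision means a key whose 16-bit hash is already held by a different key. The names hash16 and count_collisions are hypothetical. As a point of comparison, an idealized uniform 16-bit hash would be expected to produce about n(n-1)/(2 * 65536) colliding pairs, roughly 763 for n = 10,000, so the outcome depends heavily on the key set and on the definition used.

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>
#include <unordered_map>
#include <vector>

// Same hash as in the parse function above: SDBM recurrence over the
// name bytes, then the upper half folded into the lower 16 bits.
inline uint16_t hash16(std::string_view name) {
    uint32_t h = 0;
    for (char c : name) {
        h = (h << 6) + (h << 16) - h + static_cast<uint32_t>(c);
    }
    return static_cast<uint16_t>((h >> 16) ^ h);
}

// Counts keys whose 16-bit hash is already owned by a *different* key.
// Repeated occurrences of the same key are not counted.
inline std::size_t count_collisions(const std::vector<std::string_view>& keys) {
    std::unordered_map<uint16_t, std::string_view> owner;
    std::size_t collisions = 0;
    for (std::string_view key : keys) {
        auto [it, inserted] = owner.try_emplace(hash16(key), key);
        if (!inserted && it->second != key) {
            ++collisions;
        }
    }
    return collisions;
}
```

Feeding it the unique station names from an input file (collected while parsing) gives a count that can be compared between hash variants.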