tavi-cacina / unordered_dense

A fast & densely stored hashmap and hashset based on robin-hood backward shift deletion

🚀 ankerl::unordered_dense::{map, set}

The classes ankerl::unordered_dense::map and ankerl::unordered_dense::set are (almost) drop-in replacements for std::unordered_map and std::unordered_set. Their iterator and reference stability guarantees are weaker, but they are typically much faster. Here is a short summary of their properties:

Advantages

  • Perfect iteration speed: data is stored in a std::vector, so all data is contiguous!
  • Very fast insertion & lookup speed, in the same ballpark as absl::flat_hash_map
  • Low memory usage
  • Full support for std::allocator and polymorphic allocators; ankerl::unordered_dense::pmr typedefs are available
  • Simple: a single header with just over 1000 lines of code, less than half the size of robin-hood-hashing

Disadvantages

  • Deletion speed is relatively slow. It needs two lookups: one for the element to delete, and one for the element that is moved onto the newly empty spot.
  • No const Key: the value_type is std::pair<Key, Value>, not std::pair<const Key, Value> as in std::unordered_map
  • Iterators are not stable on insert/erase

Design

The map/set has two data structures:

  • std::vector<value_type> which holds all data. map/set iterators are just std::vector<value_type>::iterator!
  • An indexing structure (bucket array), which is a flat array with 8-byte buckets.

Inserts

Whenever an element is added, it is appended to the vector with emplace_back. The key is hashed, and an entry (bucket) is added at the corresponding location in the bucket array. The bucket has this structure:

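#include <cstdint>

struct Bucket {
    std::uint32_t dist_and_fingerprint; // distance (upper 3 bytes) and fingerprint (lowest byte)
    std::uint32_t value_idx;            // index of the value in the data vector
};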

Each bucket stores 3 things:

  • The distance of that value from its original hashed location (the 3 most significant bytes of dist_and_fingerprint)
  • A fingerprint: 1 byte of the hash (the least significant byte of dist_and_fingerprint)
  • The index in the vector where the actual data is stored (value_idx)
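A minimal sketch of how the two fields of dist_and_fingerprint could be packed and unpacked. The helper names (dist_inc, make_dist_and_fingerprint, dist_of, fingerprint_of) are hypothetical, not the library's API, and so is the convention that distance 1 means "in the home bucket", which lets an all-zero value mark an empty bucket.

```cpp
#include <cstdint>

// Hypothetical helpers, not part of ankerl::unordered_dense.
// Assumed layout: bits 8..31 hold the distance, bits 0..7 the fingerprint.
constexpr std::uint32_t dist_inc = 1U << 8U;              // adding this increments the distance by one
constexpr std::uint32_t fingerprint_mask = dist_inc - 1U; // 0xFF

// Assumed convention: distance 1 is the home bucket, so 0 can mean "empty".
constexpr std::uint32_t make_dist_and_fingerprint(std::uint32_t dist, std::uint64_t hash) {
    return (dist << 8U) | static_cast<std::uint32_t>(hash & fingerprint_mask);
}

constexpr std::uint32_t dist_of(std::uint32_t packed) { return packed >> 8U; }

constexpr std::uint32_t fingerprint_of(std::uint32_t packed) { return packed & fingerprint_mask; }

// Because the distance occupies the high bits, comparing two packed values
// compares distances first and uses the fingerprint only as a tie-breaker.
static_assert(make_dist_and_fingerprint(2, 0x00) > make_dist_and_fingerprint(1, 0xFF),
              "distance dominates the comparison");
```

With this layout, moving one bucket further from home is a single addition of dist_inc, and one integer comparison checks distance and fingerprint together.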

This structure is specifically designed for robin-hood hashing with backward shift deletion as the collision resolution strategy.
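The insert path can be sketched in a small toy set. This is a minimal sketch under several assumptions: std::uint64_t keys, a fixed power-of-two bucket count with no rehashing, the home bucket taken from the upper 32 bits of the hash, and a multiplicative hash (mix) chosen only for illustration. TinyDenseSet and its members are hypothetical names, not the library's API. The key is appended to the vector, and its bucket is placed with robin-hood displacement.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical toy set (not the library's code): uint64_t keys, fixed
// power-of-two bucket count, no rehashing.
struct Bucket {
    std::uint32_t dist_and_fingerprint; // upper 3 bytes: distance (0 = empty), low byte: fingerprint
    std::uint32_t value_idx;            // index into the value vector
};

class TinyDenseSet {
public:
    // std::vector value-initializes the buckets, so they all start empty (zero).
    explicit TinyDenseSet(std::size_t bucket_count_pow2) : buckets_(bucket_count_pow2) {}

    // Returns false if the key is already present.
    bool insert(std::uint64_t key) {
        const std::uint64_t h = mix(key);
        std::uint32_t daf = dist_inc | static_cast<std::uint32_t>(h & 0xFFU); // distance 1 + fingerprint
        std::size_t idx = static_cast<std::size_t>(h >> 32U) & (buckets_.size() - 1); // home bucket

        // Walk past buckets whose packed value is >= ours, checking for the key.
        while (daf <= buckets_[idx].dist_and_fingerprint) {
            if (daf == buckets_[idx].dist_and_fingerprint && values_[buckets_[idx].value_idx] == key) {
                return false;
            }
            daf += dist_inc; // one step further from home
            idx = next(idx);
        }

        // Keep at least one empty bucket so every probe loop terminates.
        if (values_.size() + 1 >= buckets_.size()) throw std::length_error("toy set is full");
        values_.emplace_back(key); // the data itself is simply appended

        // Robin-hood: take this bucket and shift the displaced buckets up by
        // one slot each until an empty bucket is reached.
        Bucket carried{daf, static_cast<std::uint32_t>(values_.size() - 1)};
        while (buckets_[idx].dist_and_fingerprint != 0) {
            std::swap(carried, buckets_[idx]);
            carried.dist_and_fingerprint += dist_inc;
            idx = next(idx);
        }
        buckets_[idx] = carried;
        return true;
    }

    std::size_t size() const { return values_.size(); }
    const std::vector<std::uint64_t>& values() const { return values_; } // contiguous, insertion order

private:
    static constexpr std::uint32_t dist_inc = 1U << 8U;

    static std::uint64_t mix(std::uint64_t k) { return k * 0x9E3779B97F4A7C15ULL; } // assumed hash
    std::size_t next(std::size_t i) const { return (i + 1) & (buckets_.size() - 1); }

    std::vector<std::uint64_t> values_;
    std::vector<Bucket> buckets_;
};
```

This sketch compares whole packed values, so distance decides first and the fingerprint breaks ties; as long as lookups use the same ordering, the two stay consistent.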

Lookups

The key is hashed, and the bucket array is probed, starting at the hashed location, for an entry with a matching fingerprint. When one is found, the key stored in the data vector is compared, and if it is equal, the value is returned.
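A lookup can be sketched as a free function over the two arrays. The assumptions here are for illustration only: the caller passes the precomputed hash, the home bucket comes from the upper 32 bits of the hash, the fingerprint from its lowest byte, and a packed value of 0 marks an empty bucket. find_value_idx is a hypothetical name. The data vector is touched only when distance and fingerprint both match, and the probe stops at the first bucket whose packed value is smaller than the one being searched for.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical bucket layout matching the description above.
struct Bucket {
    std::uint32_t dist_and_fingerprint; // upper 3 bytes: distance (0 = empty), low byte: fingerprint
    std::uint32_t value_idx;            // index into the key vector
};

// Sketch of a robin-hood lookup (not the library's code).
// Returns the index into `keys`, or std::nullopt if `key` is absent.
std::optional<std::size_t> find_value_idx(const std::vector<Bucket>& buckets,
                                          const std::vector<std::uint64_t>& keys,
                                          std::uint64_t key, std::uint64_t hash) {
    constexpr std::uint32_t dist_inc = 1U << 8U;
    std::uint32_t daf = dist_inc | static_cast<std::uint32_t>(hash & 0xFFU); // distance 1 + fingerprint
    std::size_t idx = static_cast<std::size_t>(hash >> 32U) & (buckets.size() - 1);
    while (true) {
        const Bucket& b = buckets[idx];
        if (b.dist_and_fingerprint == daf) {
            // Distance and fingerprint match: only now compare the real key.
            if (keys[b.value_idx] == key) return b.value_idx;
        } else if (b.dist_and_fingerprint < daf) {
            // Empty bucket, or a resident closer to its home than we would be:
            // robin-hood insertion would have placed the key before this point.
            return std::nullopt;
        }
        daf += dist_inc;
        idx = (idx + 1) & (buckets.size() - 1);
    }
}
```

Keeping the fingerprint in the bucket lets most mismatches be rejected without reading the data vector at all.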

Removals

Since all data is stored in a vector, removals are a bit more complicated:

  1. First, look up the element to delete in the bucket array.
  2. When found, replace that element in the vector with the last element in the vector.
  3. Update two locations in the bucket array: first, remove the bucket of the removed element (using backward shift deletion).
  4. Then, update the value_idx in the bucket of the moved element. This requires another lookup.
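The removal steps can be sketched in a self-contained toy map, again assuming std::uint64_t keys, int values, a fixed power-of-two bucket count with no rehashing, and an illustrative multiplicative hash. TinyDenseMap and its members are hypothetical, not the library's API. This sketch removes the bucket first (with backward shift deletion) and then moves the last element into the hole, which is where the second lookup happens.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical toy map (not the library's code): uint64_t keys, int values,
// fixed power-of-two bucket count, no rehashing.
struct Bucket {
    std::uint32_t dist_and_fingerprint; // upper 3 bytes: distance (0 = empty), low byte: fingerprint
    std::uint32_t value_idx;            // index into the value vector
};

class TinyDenseMap {
public:
    explicit TinyDenseMap(std::size_t bucket_count_pow2) : buckets_(bucket_count_pow2) {}

    bool insert(std::uint64_t key, int value) {
        if (find_bucket(key) != npos) return false;
        if (values_.size() + 1 >= buckets_.size()) throw std::length_error("toy map is full");
        const std::uint64_t h = mix(key);
        std::uint32_t daf = start_daf(h);
        std::size_t idx = home(h);
        while (daf <= buckets_[idx].dist_and_fingerprint) { // skip residents at least as far from home
            daf += dist_inc;
            idx = next(idx);
        }
        values_.emplace_back(key, value);
        Bucket carried{daf, static_cast<std::uint32_t>(values_.size() - 1)};
        while (buckets_[idx].dist_and_fingerprint != 0) { // robin-hood: shift displaced buckets up
            std::swap(carried, buckets_[idx]);
            carried.dist_and_fingerprint += dist_inc;
            idx = next(idx);
        }
        buckets_[idx] = carried;
        return true;
    }

    const int* find(std::uint64_t key) const {
        const std::size_t b = find_bucket(key);
        return b == npos ? nullptr : &values_[buckets_[b].value_idx].second;
    }

    bool erase(std::uint64_t key) {
        // First lookup: the bucket of the element to delete.
        std::size_t idx = find_bucket(key);
        if (idx == npos) return false;
        const std::uint32_t removed_idx = buckets_[idx].value_idx;

        // Backward shift deletion: pull following buckets that are not in their
        // home slot (distance >= 2) back by one, then clear the last one moved from.
        std::size_t nxt = next(idx);
        while (buckets_[nxt].dist_and_fingerprint >= 2 * dist_inc) {
            buckets_[idx] = {buckets_[nxt].dist_and_fingerprint - dist_inc, buckets_[nxt].value_idx};
            idx = nxt;
            nxt = next(nxt);
        }
        buckets_[idx] = Bucket{};

        // Move the last element into the hole; a second lookup finds the bucket
        // that still points at the old position and updates its value_idx.
        const auto last_idx = static_cast<std::uint32_t>(values_.size() - 1);
        if (removed_idx != last_idx) {
            values_[removed_idx] = std::move(values_.back());
            std::size_t j = home(mix(values_[removed_idx].first));
            while (buckets_[j].value_idx != last_idx) j = next(j);
            buckets_[j].value_idx = removed_idx;
        }
        values_.pop_back();
        return true;
    }

    std::size_t size() const { return values_.size(); }

private:
    static constexpr std::uint32_t dist_inc = 1U << 8U;
    static constexpr std::size_t npos = static_cast<std::size_t>(-1);

    static std::uint64_t mix(std::uint64_t k) { return k * 0x9E3779B97F4A7C15ULL; } // assumed hash
    static std::uint32_t start_daf(std::uint64_t h) { return dist_inc | static_cast<std::uint32_t>(h & 0xFFU); }
    std::size_t home(std::uint64_t h) const { return static_cast<std::size_t>(h >> 32U) & (buckets_.size() - 1); }
    std::size_t next(std::size_t i) const { return (i + 1) & (buckets_.size() - 1); }

    std::size_t find_bucket(std::uint64_t key) const {
        const std::uint64_t h = mix(key);
        std::uint32_t daf = start_daf(h);
        for (std::size_t idx = home(h);; idx = next(idx), daf += dist_inc) {
            const Bucket& b = buckets_[idx];
            if (b.dist_and_fingerprint == daf && values_[b.value_idx].first == key) return idx;
            if (b.dist_and_fingerprint < daf) return npos; // empty, or key would have been placed earlier
        }
    }

    std::vector<std::pair<std::uint64_t, int>> values_;
    std::vector<Bucket> buckets_;
};
```

Because the last element is moved into the hole, the value vector stays dense, at the cost of the extra probe needed to find and update the moved element's bucket.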

About

License: MIT License


Languages

C++ 97.0%, Python 1.9%, Meson 0.9%, CMake 0.3%