michaeljclark / ethical_hashmap

Fast open addressing hash table

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

ethical hash map

High performance, compact and ethical C++ hash sets and maps:

hash_set

  • Fast open addressing hash set tuned for small sets.

hash_map

  • Fast open addressing hash map tuned for small maps.

linked_hash_set

  • Fast open addressing hash set with link list tuned for small maps with predictable iteration order.

linked_hash_map

  • Fast open addressing hash map with link list tuned for small maps with predictable iteration order.

These hash sets and maps are open-addressing hashtables similar to google/dense_hash_map, but they use tombstone bitmaps to eliminate the necessity for empty or deleted key sentinels.

These containers implement most of the C++ unordered associative container requrements thus can be substituted for unordered_map and typical use cases including C++11 for-loop.

There is currently no equivalent to linked_hash_map in the STL so it is perhaps somewhat like an ordered_map that uses the pos iterator hint in the first argument of insert to set the position of a new value inserted into the linked list, while find uses the hashed key index to locate key value pairs in the hash table.

Design

These maps are designed for a small memory footprint. There is one structure and one heap allocation which is divided between an array of tuples composed of (Key, Value) pairs and a tombstone bitmap. Key is hashed to index into the open addressing array, and collisions are handled by skipping to the next available slot.

Deletions requires the use of tombstones. Sentinel key values occupy key-space, and make containers incompatible with associative container requirements, thus a 2-bit tombstone array is used. 2-bits are used to distinguish available, occupied, and deleted states.

    enum bitmap_state {
        available = 0, occupied = 1, deleted = 2, recycled = 3
    };

linked_hash_map adds (next, prev) indices to the array tuple, (head, tail) indices to the structure.

Memory usage

The follow table shows memory usage for the default 16 slot map (table units are in bytes):

map struct malloc
hash_set 40 68
hash_set 40 132
hash_map<u32,u32> 40 132
hash_map<u64,u64> 40 260
linked_hash_set<u32,i32> 48 198
linked_hash_set<u64,i64> 56 388
linked_hash_map<u32,u32,i32> 48 260
linked_hash_map<u64,u64,i64> 56 516

The following table shows the structure size in bytes on x86_64:

map size
sizeof(hash_map) 40
sizeof(linked_hash_map) 48
sizeof(absl::flat_hash_map) 48
sizeof(std::unordered_map) 56
sizeof(tsl::robin_map) 80
sizeof(google::dense_hash_map) 88

linked_hash_map by default uses 32-bit integers for indices, limiting it to 2^31 entries, however, it can be instantiated with alternative integer types for indices.

Build Instructions

cmake -B build -G Ninja -DCMAKE_BUILD_TYPE=RelWithDebInfo
cmake --build build

About

Fast open addressing hash table


Languages

Language:C++ 91.3%Language:C 8.0%Language:CMake 0.7%