Examples/Tests |
---|
BGHT is a collection of high-performance static GPU hash tables. BGHT contains hash tables that use three different probing schemes 1) bucketed cuckoo, 2) power-of-two, 3) iceberg hashing. Our bucketed static cuckoo hash table is the state-of-art static hash table. For more information, please check our paper:
Better GPU Hash Tables
Muhammad A. Awad, Saman Ashkiani, Serban D. Porumbescu, MartĂn Farach-Colton, and John D. Owens
- State-of-the-art static GPU hash tables
- Device and host side APIs
- Support for different types of keys and values
- Standard-like APIs
BGHT is a header-only library. To use the library, you can add it as a submodule or use CMake Package Manager (CPM) to fetch the library into your CMake-based project (complete example).
cmake_minimum_required(VERSION 3.8 FATAL_ERROR)
CPMAddPackage(
NAME bght
GITHUB_REPOSITORY owensgroup/BGHT
GIT_TAG main
OPTIONS
"build_tests OFF"
"build_benchmarks OFF"
)
target_link_libraries(my_library PRIVATE bght)
All the data structures follow the C++ standard hash map (std::unordered_map
) APIs closely. An example APIs for BCHT is shown below:
template <class Key,
class T,
class Hash = bght::universal_hash<Key>,
class KeyEqual = bght::equal_to<Key>,
cuda::thread_scope Scope = cuda::thread_scope_device,
class Allocator = bght::cuda_allocator<char>,
int B = 16> class bcht;
// Constructor
bcht(std::size_t capacity,
Key sentinel_key,
T sentinel_value,
Allocator const& allocator = Allocator{});
// Host-side APIs
template <typename InputIt>
bool insert(InputIt first, InputIt last, cudaStream_t stream = 0);
template <typename InputIt, typename OutputIt>
void find(InputIt first, InputIt last, OutputIt output_begin, cudaStream_t stream = 0);
// Device-side APIs
template <typename tile_type>
__device__ bool insert(value_type const& pair, tile_type const& tile);
template <typename tile_type>
__device__ mapped_type find(key_type const& key, tile_type const& tile);
Member type Definition
key_type Key
mapped_type T
value_type bght::pair<Key, T>
allocator_type Allocator
bucket_size Bucket size for device-side APIs cooperative groups tile construction
// Example using host-side APIs
#include <bcht.hpp>
int main(){
using key_type = uint32_t;
using value_type = uint32_t;
using pair_type = bght::pair<key_type, value_type>;
std::size_t capacity = 128; std::size_t num_keys = 64;
key_type invalid_key = 0; value_type invalid_value = 0; // sentinel key and value
bght::bcht<key_type, value_type> table(capacity, invalid_key, invalid_value); //ctor
pair_type* pairs; // input pairs
// ... allocate pairs
bool success = table.insert(pairs, pairs + num_keys);
assert(success);
key_type* queries; // query keys
value_type* results; // query result
// ... allocate queries and results
table.find(queries, queries + num_keys, results);
}
// Example using device-side APIs
template<class HashMap>
__global__ void kernel(HashMap table){
// construct tile
auto block = cooperative_groups::this_thread_block();
auto tile = cooperative_groups::tiled_partition<HashMap::bucket_size>(block);
pair_type pair{...};
table.insert(pair, tile);
pair_type query{..};
query.second = talbe.find(query.first, tile);
}
int main(){
// Call the hash table constructor on the CPU
bght::bcht<key_type, value_type> table(...);
// Pass the hash table to a GPU kernel
kernel<<<...>>>(table);
}
Please create an issue if you face challenges with any of the following limitations and requirements.
- C++17/CUDA C++17
- NVIDIA Volta GPU or later microarchitectures
- CMake 3.8 or later
- CUDA 11.5 or later
- Currently hash tables based on cuckoo hashing do not support concurrent insertion and queries. IHT and P2BHT support concurrent insertions and queries.
- Keys must be unique
- Construction of the data structures offered may fail. In these scenarios, reconstructing the table using a larger capacity or a lower load factor should be considered. Our paper offers recommended hash table load factors (for uniformly distributed unsigned keys) to achieve at least a 99% success rate (See Fig. 2). For example, BCHT will offer a 100% success rate for up to 0.991 load factor. Please create an issue if you encounter any problems with different key distributions.
To reproduce the results, follow the following steps. You can also view our results here. If you find any mismatch (either faster or slower) between the results offered in the repository or the paper, please create an issue, and we will investigate the performance changes.
Please check our paper for comprehensive analysis and benchmarks. Also, see the following steps to reproduce the results.
An additional comparison of our BCHT to cucCollection
's cuco::static_map
is shown below. The comparison is between our BCHT with B = 16 (default configuration) and cuco::static_map
. Input keys (50 million pairs) are uniformly distributed unsigned keys. The benchmarking was performed on an NVIDIA Titan V GPU (higher is better):
Please create an issue. We will welcome any contributions that improve the usability and quality of our repository.
@article{Awad:2021:BGH,
title = {Better {GPU} Hash Tables},
author = {Muhammad A. Awad and Saman Ashkiani and Serban D.
Porumbescu and Mart{\'{i}}n Farach-Colton and John
D. Owens},
year = 2021,
month = aug,
primaryclass = {cs.DS},
journal = {CoRR},
volume = {abs/2108.07232},
archiveprefix = {arXiv},
number = {2108.07232},
eprint = {2108.07232},
nonrefereed = {true},
code = {https://github.com/owensgroup/BGHT}
}
The structure and organization of the repository were inspired by NVIDIA's cuCollection and RXMesh.