[FEATURE]: Add `open_addressing_ref_impl` ctor overload taking custom key equal and probing scheme
PointKernel opened this issue
Is your feature request related to a problem? Please describe.
A common hash-based algorithm in libcudf uses a one-table comparator/hasher to insert into the hash table and a separate two-table comparator/hasher to perform query operations like `contains`, `count`, etc. Cuco hash tables take a key comparator type as a class template parameter, but very often the query data type and the two-table comparator type are not known when the hash table is constructed. The current workaround is to use a helper struct that wraps both the one-table and two-table comparators as the class `KeyEqual` template parameter, but this doesn't work for use cases like hash join, where it's desired to build the table once and query it multiple times with different data.
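To make the workaround concrete, here is a minimal host-only sketch of a wrapper comparator that overloads `operator()` for both the one-table and the two-table case. Every name in it (`build_row`, `probe_row`, `combined_key_equal`) is hypothetical and is not a cuco or libcudf API; plain `std::vector<int>` columns stand in for real tables.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical strong types that distinguish row indices of the two tables.
struct build_row { std::size_t index; };  // row of the table inserted into the hash table
struct probe_row { std::size_t index; };  // row of the table used for queries

// Hypothetical wrapper bundling both comparisons into a single KeyEqual-like type.
struct combined_key_equal {
  std::vector<int> const* build;  // build-side data
  std::vector<int> const* probe;  // probe-side data, already required at construction

  // One-table comparison: build row vs. build row (used while inserting).
  bool operator()(build_row lhs, build_row rhs) const
  {
    assert(lhs.index < build->size() && rhs.index < build->size());
    return (*build)[lhs.index] == (*build)[rhs.index];
  }

  // Two-table comparison: probe row vs. build row (used while querying).
  bool operator()(probe_row lhs, build_row rhs) const
  {
    assert(lhs.index < probe->size() && rhs.index < build->size());
    return (*probe)[lhs.index] == (*build)[rhs.index];
  }
};
```

Because the wrapper stores a pointer to the probe data, the probe table has to exist before the comparator (and therefore the hash table) is constructed. That is the limitation described above for hash join.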
Describe the solution you'd like
For hash table query operations, cuco should provide an overload that takes a custom probing scheme and key equality. This requires an overload of the `open_addressing_ref_impl` ctor that takes `QueryKeyEqual` and `QueryProbingScheme`, like:
/**
 * @brief Constructs open_addressing_ref_impl.
 *
 * @note: It's the users' responsibility to make sure the custom probing
 * scheme and equality can properly probe the hash table.
 *
 * @tparam QueryKeyEqual Query key equal type
 * @tparam QueryProbingScheme Query probing scheme type
 *
 * @param empty_slot_sentinel Sentinel indicating an empty slot
 * @param predicate Query key equality binary callable
 * @param probing_scheme Query probing scheme
 * @param storage_ref Non-owning ref of slot storage
 */
template <class QueryKeyEqual, class QueryProbingScheme>
__host__ __device__ explicit constexpr open_addressing_ref_impl(
  value_type empty_slot_sentinel,
  QueryKeyEqual const& predicate,
  QueryProbingScheme const& probing_scheme,
  storage_ref_type storage_ref) noexcept;
Describe alternatives you've considered
Normally users would want to customize only the hasher and key equality, so we may need the interface below instead:
template <class ProbeKeyEqual, class ProbeHash>
__host__ __device__ explicit constexpr open_addressing_ref_impl(
  value_type empty_slot_sentinel,
  ProbeKeyEqual const& probing_predicate,
  ProbeHash const& probing_hash,
  storage_ref_type storage_ref) noexcept;
Additional context
Ideally, this feature should be supported ASAP so that we can apply it in `static_multiset` operations like `retrieve` and `count`.
Instead of introducing those ref ctors, what we really need is a helper function on the hash table ref side that can make a copy of the current ref with a new hasher or key comparator. #467 adds those helpers.
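As a rough illustration of the helper idea, the host-only sketch below shows a toy non-owning ref that can hand out copies of itself with a different key comparator or hasher while sharing the same slot storage. It is an assumption-laden stand-in, not the cuco implementation: `toy_ref`, `with_key_eq`, `with_hash`, and the linear probing in `contains` are all hypothetical, and the helper names actually added by #467 may differ.

```cpp
#include <cassert>
#include <cstddef>

// Toy, host-only stand-in for a non-owning hash table ref (hypothetical, not cuco).
template <class KeyEqual, class Hash>
struct toy_ref {
  int const* slots;      // non-owning view of the slot storage
  std::size_t capacity;  // number of slots
  int empty_sentinel;    // value marking an empty slot
  KeyEqual key_eq;       // callable: key_eq(query_key, slot_value)
  Hash hash;             // callable: hash(query_key) -> std::size_t

  // Copy of this ref over the same storage, using a new key comparator.
  template <class NewKeyEqual>
  toy_ref<NewKeyEqual, Hash> with_key_eq(NewKeyEqual new_eq) const
  {
    return {slots, capacity, empty_sentinel, new_eq, hash};
  }

  // Copy of this ref over the same storage, using a new hasher.
  template <class NewHash>
  toy_ref<KeyEqual, NewHash> with_hash(NewHash new_hash) const
  {
    return {slots, capacity, empty_sentinel, key_eq, new_hash};
  }

  // Linear-probing lookup; stops at the first empty slot.
  template <class ProbeKey>
  bool contains(ProbeKey const& key) const
  {
    assert(capacity > 0);
    std::size_t idx = hash(key) % capacity;
    for (std::size_t i = 0; i < capacity; ++i) {
      int const slot = slots[idx];
      if (slot == empty_sentinel) { return false; }
      if (key_eq(key, slot)) { return true; }
      idx = (idx + 1) % capacity;
    }
    return false;
  }
};
```

With this shape, the table can be built once with the one-table comparator and hasher, and each query batch can derive its own ref via `ref.with_key_eq(query_eq).with_hash(query_hash)`. As with the ctor proposal, it remains the caller's responsibility that the query hasher sends matching keys to the same probe sequence the build hasher used.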