NVIDIA / cuCollections

[FEATURE]: Add `open_addressing_ref_impl` ctor overload taking custom key equal and probing scheme

PointKernel opened this issue

Is your feature request related to a problem? Please describe.

A common hash-based algorithm in libcudf uses a one-table comparator/hasher to insert into the hash table and a separate two-table comparator/hasher to perform query operations such as contains and count. cuco hash tables take the key comparator type as a class template parameter, but the query data type and the two-table comparator type are often not known when the hash table is constructed. The current workaround is to pass a helper struct that wraps both the one-table and two-table comparators as the `KeyEqual` class template parameter. This does not work for use cases like hash join, where the table should be built once and then queried multiple times with different data.
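To make the workaround concrete, here is a host-only C++ sketch. All names (`build_row`, `probe_row`, `self_equal`, `two_table_equal`, `combined_equal`) are hypothetical and are not libcudf's actual row comparators. The wrapper overloads `operator()` so that one `KeyEqual` type serves both insertion and queries, but the probe data it compares against is fixed when the wrapper, and therefore the hash table, is constructed.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical row handles: distinct types let one functor overload on
// "build row vs build row" and "probe row vs build row".
struct build_row { std::int32_t index; };
struct probe_row { std::int32_t index; };

// One-table comparator, used while inserting the build table.
struct self_equal {
  std::vector<int> const* build;
  bool operator()(build_row lhs, build_row rhs) const {
    return (*build)[lhs.index] == (*build)[rhs.index];
  }
};

// Two-table comparator, used by queries such as contains or count.
struct two_table_equal {
  std::vector<int> const* build;
  std::vector<int> const* probe;
  bool operator()(probe_row lhs, build_row rhs) const {
    return (*probe)[lhs.index] == (*build)[rhs.index];
  }
};

// The workaround: one KeyEqual type wrapping both comparators. The probe data
// referenced by query_eq is fixed when this object (and so the table) is built.
struct combined_equal {
  self_equal insert_eq;
  two_table_equal query_eq;
  bool operator()(build_row lhs, build_row rhs) const { return insert_eq(lhs, rhs); }
  bool operator()(probe_row lhs, build_row rhs) const { return query_eq(lhs, rhs); }
};
```

Querying a second probe table would require a new `combined_equal` baked into the table, which is exactly what the build-once, query-many hash join pattern wants to avoid.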

Describe the solution you'd like

For hash table query operations, cuco should provide overloads that take a custom probing scheme and key equality. This requires an `open_addressing_ref_impl` constructor overload that takes a `QueryKeyEqual` and a `QueryProbingScheme`, for example:

  /**
   * @brief Constructs open_addressing_ref_impl.
   *
   * @note It is the user's responsibility to make sure the custom probing
   * scheme and equality can properly probe the hash table.
   *
   * @tparam QueryKeyEqual Query key equal type
   * @tparam QueryProbingScheme Query probing scheme type
   *
   * @param empty_slot_sentinel Sentinel indicating an empty slot
   * @param predicate Query key equality binary callable
   * @param probing_scheme Query probing scheme
   * @param storage_ref Non-owning ref of slot storage
   */
  template<class QueryKeyEqual, class QueryProbingScheme>
  __host__ __device__ explicit constexpr open_addressing_ref_impl(
    value_type empty_slot_sentinel,
    QueryKeyEqual const& predicate,
    QueryProbingScheme const& probing_scheme,
    storage_ref_type storage_ref) noexcept;
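A host-only C++ sketch of what such an overload enables. `toy_ref`, `toy_linear_probing`, `toy_storage_ref`, and the hasher/comparator structs are hypothetical stand-ins, not cuco's `open_addressing_ref_impl`. The constructor mirrors the proposed parameter order, and class template argument deduction picks up the query-side predicate and probing scheme types from the arguments, so a query ref can be created over the same non-owning storage as the build ref.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

struct probe_row { int index; };  // hypothetical probe-side row handle

// Toy linear probing scheme parameterized by a hasher (not a cuco type).
template <class Hash>
struct toy_linear_probing {
  Hash hash;
  // Slot to inspect on the i-th probe for key k in a table of `capacity` slots.
  template <class K>
  std::size_t slot_index(K const& k, std::size_t i, std::size_t capacity) const {
    return (hash(k) + i) % capacity;
  }
};

struct toy_storage_ref {
  int* slots;  // non-owning pointer to the slot array (stores build-row indices)
  std::size_t capacity;
};

template <class KeyEqual, class ProbingScheme>
class toy_ref {
 public:
  // Same parameter order as the proposed overload; CTAD deduces both types.
  explicit constexpr toy_ref(int empty_slot_sentinel, KeyEqual const& predicate,
                             ProbingScheme const& probing_scheme,
                             toy_storage_ref storage_ref) noexcept
    : empty_{empty_slot_sentinel}, predicate_{predicate},
      probing_scheme_{probing_scheme}, storage_{storage_ref} {}

  // Inserts a build-row index; returns false on a duplicate or a full table.
  bool insert(int key) const {
    for (std::size_t i = 0; i < storage_.capacity; ++i) {
      int& stored = storage_.slots[probing_scheme_.slot_index(key, i, storage_.capacity)];
      if (stored == empty_) { stored = key; return true; }
      if (predicate_(key, stored)) { return false; }
    }
    return false;
  }

  // Probes with any query type accepted by both the hasher and the predicate.
  template <class Query>
  bool contains(Query const& query) const {
    for (std::size_t i = 0; i < storage_.capacity; ++i) {
      int const stored = storage_.slots[probing_scheme_.slot_index(query, i, storage_.capacity)];
      if (stored == empty_) { return false; }
      if (predicate_(query, stored)) { return true; }
    }
    return false;
  }

 private:
  int empty_;
  KeyEqual predicate_;
  ProbingScheme probing_scheme_;
  toy_storage_ref storage_;
};

// Build side (one table): hash and compare build rows by their values.
struct build_hash {
  std::vector<int> const* build;
  std::size_t operator()(int row) const { return std::hash<int>{}((*build)[row]); }
};
struct build_equal {
  std::vector<int> const* build;
  bool operator()(int lhs, int rhs) const { return (*build)[lhs] == (*build)[rhs]; }
};

// Query side (two tables): hash a probe row's value the same way as build_hash.
struct probe_hash {
  std::vector<int> const* probe;
  std::size_t operator()(probe_row r) const { return std::hash<int>{}((*probe)[r.index]); }
};
struct probe_equal {
  std::vector<int> const* build;
  std::vector<int> const* probe;
  bool operator()(probe_row lhs, int rhs) const { return (*probe)[lhs.index] == (*build)[rhs]; }
};
```

A build ref such as `toy_ref{-1, build_equal{&build}, toy_linear_probing<build_hash>{build_hash{&build}}, storage}` inserts row indices, and a second `toy_ref` constructed with `probe_equal` and `toy_linear_probing<probe_hash>` over the same `storage` answers `contains(probe_row{...})`. The lookup only works because `probe_hash` hashes a probe row's value exactly as `build_hash` hashes a build row's value.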

Describe alternatives you've considered

In most cases, users only want to customize the hasher and key equality, so the following interface may be more appropriate:

  template<class ProbeKeyEqual, class ProbeHash>
  __host__ __device__ explicit constexpr open_addressing_ref_impl(
    value_type empty_slot_sentinel,
    ProbeKeyEqual const& probing_predicate,
    ProbeHash const& probing_hash,
    storage_ref_type storage_ref) noexcept;
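The hasher-only variant hinges on the same constraint as the `@note` on the first constructor: a probe hasher must send each query to the same starting slot that the build hasher used for every key the probe key equality considers equal. The host-only sketch below uses hypothetical functions and functors with toy linear probing (not cuco's probing schemes) to show what happens when that constraint is violated. It assumes build values are non-negative.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Build side: slots hold build-row indices; each row is hashed by its value.
void build_insert(std::vector<int>& slots, int empty_sentinel,
                  std::vector<int> const& build, int row) {
  std::size_t const n = slots.size();
  std::size_t const start = static_cast<std::size_t>(build[row]) % n;
  for (std::size_t i = 0; i < n; ++i) {
    int& slot = slots[(start + i) % n];
    if (slot == empty_sentinel) { slot = row; return; }
  }
}

// Lookup with a caller-supplied ProbeKeyEqual and ProbeHash. It can only find a
// match if probe_hash(q) equals the build hash of every key equal to q.
template <class Query, class ProbeKeyEqual, class ProbeHash>
bool probe_contains(std::vector<int> const& slots, int empty_sentinel, Query const& q,
                    ProbeKeyEqual const& probe_equal, ProbeHash const& probe_hash) {
  std::size_t const n = slots.size();
  std::size_t const start = probe_hash(q) % n;
  for (std::size_t i = 0; i < n; ++i) {
    int const slot = slots[(start + i) % n];
    if (slot == empty_sentinel) { return false; }  // an empty slot ends the probe sequence
    if (probe_equal(q, slot)) { return true; }
  }
  return false;
}

// Query side: compare a probe-row index with a build-row index by value.
struct value_probe_equal {
  std::vector<int> const* build;
  std::vector<int> const* probe;
  bool operator()(int probe_row, int build_row) const {
    return (*probe)[probe_row] == (*build)[build_row];
  }
};
struct value_probe_hash {  // consistent with build_insert: hashes the row's value
  std::vector<int> const* probe;
  std::size_t operator()(int probe_row) const {
    return static_cast<std::size_t>((*probe)[probe_row]);
  }
};
struct index_probe_hash {  // inconsistent: hashes the row index, not its value
  std::size_t operator()(int probe_row) const { return static_cast<std::size_t>(probe_row); }
};
```

With `build = {3, 9, 5}` inserted into 8 slots and `probe = {5, 7, 9}`, `value_probe_hash` finds probe rows 0 and 2. With `index_probe_hash`, the lookup for probe row 0 (value 5) starts at slot 0, which is empty, so it reports a miss even though 5 is stored: a silent false negative rather than an error.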

Additional context

Ideally, this feature should be supported as soon as possible so that we can use it in `static_multiset` operations such as `retrieve` and `count`.

Instead of introducing these ref constructors, what we really need are helper functions on the hash table ref side that make a copy of the current ref with a new hasher or key comparator. #467 adds those helpers.
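The helper-based approach can be sketched with a toy host-only ref. `toy_ref`, `copy_with_key_eq`, and `copy_with_hash` are hypothetical names chosen for illustration; this issue does not show the names or signatures of the helpers added in #467. Because a ref is non-owning, copying it with a different comparator or hasher is cheap, and the copy shares the original slot storage.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy non-owning ref (hypothetical; not cuco's API). It holds a pointer to the
// slot array plus a key comparator and hasher, so copies are cheap.
template <class KeyEqual, class Hash>
class toy_ref {
 public:
  toy_ref(std::vector<int>* slots, int empty_sentinel, KeyEqual key_eq, Hash hash)
    : slots_{slots}, empty_{empty_sentinel}, key_eq_{key_eq}, hash_{hash} {}

  // Hypothetical helpers: copy this ref with a new comparator or hasher.
  // The slot storage is shared with the original ref, not copied.
  template <class NewKeyEqual>
  toy_ref<NewKeyEqual, Hash> copy_with_key_eq(NewKeyEqual key_eq) const {
    return {slots_, empty_, key_eq, hash_};
  }
  template <class NewHash>
  toy_ref<KeyEqual, NewHash> copy_with_hash(NewHash hash) const {
    return {slots_, empty_, key_eq_, hash};
  }

  // Linear-probing insert; returns false on a duplicate or a full table.
  bool insert(int key) {
    std::size_t const n = slots_->size();
    for (std::size_t i = 0; i < n; ++i) {
      int& slot = (*slots_)[(hash_(key) + i) % n];
      if (slot == empty_) { slot = key; return true; }
      if (key_eq_(key, slot)) { return false; }
    }
    return false;
  }

  // Linear-probing lookup with any query type the hasher and comparator accept.
  template <class Query>
  bool contains(Query const& query) const {
    std::size_t const n = slots_->size();
    for (std::size_t i = 0; i < n; ++i) {
      int const slot = (*slots_)[(hash_(query) + i) % n];
      if (slot == empty_) { return false; }
      if (key_eq_(query, slot)) { return true; }
    }
    return false;
  }

 private:
  std::vector<int>* slots_;
  int empty_;
  KeyEqual key_eq_;
  Hash hash_;
};
```

A build ref created with one-table lambdas can be turned into a query ref with `build_ref.copy_with_key_eq(probe_eq).copy_with_hash(probe_hash)`, so the table is built once and each new probe table only needs a new, cheap ref copy.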