NVIDIA / cuCollections

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

[FEATURE]: Add multiset host-bulk count APIs

PointKernel opened this issue · comments

Is your feature request related to a problem? Please describe.

Add multiset host-bulk count APIs

Describe the solution you'd like

The basic API must have:

  /**
   * @brief Counts the occurrences of keys in `[first, last)` contained in the multiset.
   *
   * @tparam Input Device accessible input iterator
   * 
   * @param first Beginning of the sequence of keys to count
   * @param last End of the sequence of keys to count
   * @param stream CUDA stream used for count
   * 
   * @return The sum of total occurrences of all keys in `[first, last)`
   */
  template <typename InputIt>
  size_type count(InputIt first,
                  InputIt last,
                  cuda_stream_ref stream = {}) const;

Note: There is no restriction for the iterator value_type due to heterogeneous lookup support.

More specifically, libcudf requires the below count variants

  /**
   * @brief Counts the occurrences of keys in `[first, last)` contained in the multiset.
   *
   * @tparam Input Device accessible input iterator
   * @tparam ProbeKeyEqual Binary callable
   * @tparam ProbeHash Unary hash callable
   * 
   * @param first Beginning of the sequence of keys to count
   * @param last End of the sequence of keys to count
   * @param probe_key_equal Binary function to compare two keys for equality
   * @param probe_hash Unary callable to hash a given key
   * @param stream CUDA stream used for count
   * 
   * @return The sum of total occurrences of all keys in `[first, last)`
   */
  template <typename InputIt, typename ProbeKeyEqual, typename ProbeHash>
  size_type count(InputIt first,
                  InputIt last,
                  ProbeKeyEqual probe_key_equal,
                  ProbeHash probe_hash, 
                  cuda_stream_ref stream = {}) const;

  /**
   * @brief Counts the occurrences of keys in `[first, last)` contained in the multiset.
   * 
   * @note: If a given key has no matches, its occurrence is 1.
   * 
   * @tparam Input Device accessible input iterator
   * @tparam ProbeKeyEqual Binary callable
   * @tparam ProbeHash Unary hash callable
   * 
   * @param first Beginning of the sequence of keys to count
   * @param last End of the sequence of keys to count
   * @param probe_key_equal Binary function to compare two keys for equality
   * @param probe_hash Unary callable to hash a given key
   * @param stream CUDA stream used for count
   * 
   * @return The sum of total occurrences of all keys in `[first, last)` where keys have no matches
   * are considered to have a single occurrence.
   */
  template <typename InputIt, typename ProbeKeyEqual, typename ProbeHash>
  size_type count_outer(InputIt first,
                        InputIt last,
                        ProbeKeyEqual probe_key_equal,
                        ProbeHash probe_hash, 
                        cuda_stream_ref stream = {}) const;

Describe alternatives you've considered

Can the _outer variant be represented in a more natural expression?

Could we use cub::DeviceReduce?

Additional context

  • All count APIs are synchronous thus count_ async/count_outer_async don't make sense
  • Conditional count APIs like count_if and count_outer_if that takes probe keyequal/hash are also desired. Good to have but not necessary for the current scope.
  • Hashers must be default constructible and can be constructed from an integer value (required by double hashing)take

Closed by #490