rapidsai / raft

RAFT contains fundamental, widely used algorithms and primitives for machine learning and information retrieval. The algorithms are CUDA-accelerated and form building blocks for writing high-performance applications more easily.

Home Page: https://docs.rapids.ai/api/raft/stable/

ANN subsample dataset: use mdspan input

tfeher opened this issue

Currently, neighbors::detail::utils::subsample takes the dataset input as a plain pointer.

The input should be replaced with an mdspan. This was not done in #2077 because the following question needs to be clarified first:

What is the right way to wrap a raw pointer for the RAFT mdspan API if I do not know (and do not care) whether it points to host or device memory?

One way to do that is to query the pointer attributes and build the matching view:

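  // Ask the CUDA runtime where `input` lives.
  cudaPointerAttributes attr;
  RAFT_CUDA_TRY(cudaPointerGetAttributes(&attr, input));
  // devicePointer is null when the memory cannot be accessed directly
  // from the current device.
  const T* ptr = static_cast<const T*>(attr.devicePointer);
  if (ptr != nullptr) {
    // Device-accessible: wrap it as a device matrix view.
    auto dataset = raft::make_device_matrix_view<const T, IdxT>(ptr, n_samples, n_dim);
    my_function(res, dataset);
  } else {
    // Host-only: wrap the original pointer as a host matrix view.
    auto dataset = raft::make_host_matrix_view<const T, IdxT>(input, n_samples, n_dim);
    my_function(res, dataset);
  }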

But if my_function only passes the arrays on to a third function, then I would need a plain mdspan without any host or device annotation. Should we work with plain std::experimental::mdspan, or do we want to allow a host_device_accessor that carries no information about where the data is accessible?
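
To make the pass-through case concrete, here is a minimal sketch in plain C++ with no CUDA or RAFT dependency. Everything in it is an assumption made for illustration: plain_matrix_view is a hypothetical stand-in for an mdspan that carries no host or device annotation, and sum_row is a hypothetical leaf function that, in this sketch only, assumes host-accessible memory. It is not RAFT's API and does not answer which mdspan type RAFT should adopt; it only shows that the intermediate layer can forward the view without ever needing to know where the data lives.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an mdspan with no host/device annotation.
// It records only the pointer and the extents (row-major layout assumed).
template <typename T, typename IdxT>
struct plain_matrix_view {
  T* data;
  IdxT n_rows;
  IdxT n_cols;

  // Pure pointer arithmetic; the memory itself is never dereferenced here.
  T* row_ptr(IdxT row) const { return data + row * n_cols; }
};

// Hypothetical leaf function: the only place that touches the memory.
// In this sketch it assumes the data is host-accessible and sums one row.
template <typename T, typename IdxT>
T sum_row(plain_matrix_view<const T, IdxT> view, std::size_t row)
{
  const T* p = view.row_ptr(static_cast<IdxT>(row));
  T acc      = T{0};
  for (IdxT j = 0; j < view.n_cols; ++j) {
    acc += p[j];
  }
  return acc;
}

// Pass-through layer, mirroring my_function from the issue: it forwards the
// view unchanged and needs no knowledge of where the data is accessible.
template <typename T, typename IdxT>
T my_function(plain_matrix_view<const T, IdxT> dataset, std::size_t row)
{
  return sum_row(dataset, row);
}
```

With a host std::vector<float> holding a 2 x 3 matrix, a caller would build plain_matrix_view<const float, std::int64_t>{vec.data(), 2, 3} and pass it to my_function. Only the leaf function would have to change (or dispatch on the pointer attributes as shown earlier) to support device memory.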