rapidsai / node

GPU-accelerated data science and visualization in node

Home Page: https://rapidsai.github.io/node/

[FEA] Use CUDA Arrow IPC primitives to read and write DFs in GPU memory

trxcllnt opened this issue

The Arrow IPC primitives support reading and writing Tables and Columns in GPU memory. We should add support for reading the Arrow IPC format when the input data is a CUDA buffer, as well as for writing DFs to CUDA buffers in the Arrow IPC format.

This would allow us to easily serialize a DataFrame to GPU memory, share that memory with multiple processes (via CUDA IPC), and allow those processes to zero-copy read the Arrow Table from the shared memory pointer and use its buffers as the backing storage for a DataFrame.
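To make the zero-copy idea concrete, here is a rough host-memory analogy, not the actual CUDA or Arrow API. It assumes a plain `ArrayBuffer` standing in for the shared CUDA allocation, and the names `packColumns`, `viewColumns`, and `ColumnLayout` are made up for illustration. The writer copies each column once into one contiguous buffer, padding each to a multiple of 8 bytes as the Arrow IPC format does, and the readers only create views at the recorded offsets.

```typescript
// Host-memory analogy only: the ArrayBuffer stands in for a CUDA allocation
// that would be shared between processes via CUDA IPC.
// All names below are hypothetical and not part of any library.

interface ColumnLayout {
  name: string;
  byteOffset: number; // where the column's data starts in the shared buffer
  length: number;     // number of Int32 elements
}

const ALIGNMENT = 8; // Arrow IPC pads buffers to a multiple of 8 bytes

function alignUp(n: number): number {
  return Math.ceil(n / ALIGNMENT) * ALIGNMENT;
}

// "Writer": copy every column once into a single contiguous buffer
// and record where each one landed.
function packColumns(
  columns: { name: string; data: Int32Array }[]
): { buffer: ArrayBuffer; layout: ColumnLayout[] } {
  const layout: ColumnLayout[] = [];
  let offset = 0;
  for (const col of columns) {
    layout.push({ name: col.name, byteOffset: offset, length: col.data.length });
    offset = alignUp(offset + col.data.byteLength);
  }
  const buffer = new ArrayBuffer(offset);
  columns.forEach((col, i) => {
    new Int32Array(buffer, layout[i].byteOffset, layout[i].length).set(col.data);
  });
  return { buffer, layout };
}

// "Reader": build typed views over the shared buffer. No bytes are copied;
// each view uses the shared buffer as its backing storage.
function viewColumns(
  buffer: ArrayBuffer,
  layout: ColumnLayout[]
): Record<string, Int32Array> {
  const views: Record<string, Int32Array> = {};
  for (const col of layout) {
    views[col.name] = new Int32Array(buffer, col.byteOffset, col.length);
  }
  return views;
}
```

Two calls to `viewColumns` on the same buffer return views that alias the same bytes, which is the property the multi-process case relies on. In the real feature, the layout would come from the Arrow IPC message metadata rather than a side table.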

cuDF Python has support for zero-copy reading the Arrow IPC format stored in a CUDA buffer [1][2], with a bit of help from libcudf [3]. It doesn't support writing the Arrow IPC format to a CUDA buffer, but we should be able to use the reading logic as a guide.

  1. GpuArrowReader in python/cudf/cudf/comm/gpuarrow.py
  2. CudaRecordBatchStreamReader in python/cudf/cudf/_lib/gpuarrow.pyx
  3. CudaMessageReader in cpp/src/comms/ipc/ipc.cpp

Done in #250