rapidsai / node

GPU-accelerated data science and visualization in node

Home Page: https://rapidsai.github.io/node/

[FIX] Handle sending of empty dataframes differently when using SQL file table creation

matekdev opened this issue

https://github.com/rapidsai/node/blob/main/modules/sql/src/cluster.ts#L205-L231

When running a multi-worker query on files, we distribute the files among the workers. A worker may not receive a file (e.g. if there are fewer .csv files than workers), in which case we have to send it an empty dataframe. The current logic for sending an empty dataframe needs work: we should avoid generating a message and calling send(...).

Possible solutions

  1. Send the file paths to the workers that can receive one, then call broadcast() once if there are leftover workers that need empty DFs.
  2. The message_id can be generated from a random number.
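
To make the two ideas above concrete, here is a rough sketch of the planning step only. It is not taken from cluster.ts: `planFileDistribution`, `FilePlan`, and `randomMessageId` are hypothetical names, the round-robin assignment is an assumption about how files are split, and the actual send/broadcast calls are left out because their signatures are not shown in this issue.

```typescript
// Hypothetical sketch: none of these names exist in rapidsai/node.
interface FilePlan {
  assignments: Map<number, string[]>;  // worker index -> file paths it should read
  idleWorkers: number[];               // workers that received no file
}

// Assign files round-robin (an assumption) and record which workers got nothing.
function planFileDistribution(filePaths: string[], workerCount: number): FilePlan {
  if (!Number.isInteger(workerCount) || workerCount < 1) {
    throw new RangeError('workerCount must be a positive integer');
  }
  const assignments = new Map<number, string[]>();
  filePaths.forEach((path, i) => {
    const worker = i % workerCount;
    const paths  = assignments.get(worker) ?? [];
    paths.push(path);
    assignments.set(worker, paths);
  });
  const idleWorkers: number[] = [];
  for (let w = 0; w < workerCount; ++w) {
    if (!assignments.has(w)) { idleWorkers.push(w); }
  }
  return {assignments, idleWorkers};
}

// Option 2: derive a message id from a random number.
// The string format is an assumption; collisions are unlikely but not impossible.
function randomMessageId(): string {
  return `msg_${Math.floor(Math.random() * Number.MAX_SAFE_INTEGER)}`;
}

// Example: two files, four workers, so workers 2 and 3 are left over.
const plan = planFileDistribution(['a.csv', 'b.csv'], 4);
console.log(plan.idleWorkers);   // the workers that would need an empty dataframe
console.log(randomMessageId());  // a single id that one broadcast could reuse
```

With a plan like this, the workers in `assignments` would each get their file paths, and a single broadcast (with one message id) would cover everything in `idleWorkers`, rather than one generated message and send(...) per empty dataframe.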