NVIDIA / cuCollections

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

[ENHANCEMENT]: Place the existing key to the right-hand side during equality checks

PointKernel opened this issue · comments

Is your feature request related to a problem? Please describe.

cuco hash tables always place the slot key on the left-hand side for key equality checks:

* @note cuCollections data structures always place the slot keys on the left-hand side when
* invoking the key comparison predicate, i.e., `pred(slot_key, query_key)`. Order-sensitive
* `KeyEqual` should be used with caution.

This was a completely random choice when I started the open-addressing refactoring and I thought it didn't matter and was wrong.

Generally speaking, when we want to check if two variables are identical, we put the query value on the left-hand side and the "reference" or the existing value on the right-hand side. e.g. we do

if (idx == 0) { ... }

instead of

if (0 == idx) { ... }

The new cuco data structures are unfortunately following the latter pattern.

This works fine until we meet the hash join use case where the build table is the right table and the probe table is the left table. As a result, the left table is always on the right when doing comparisons in cuco while the right table is always on the left. In many places across libcudf, build_table/right_table and probe_table/left_table are interchangeable terms thus for a function join_func expecting the first argument to be the build table and the second argument to be the probe table, we may have to invoke it awkwardly:

void join_func(right_table, left_table, ...)

This must be stopped.

Describe the solution you'd like

Always place the existing value (either the sentinel value or the slot key) on the right side for equality checks.

e.g.

this->predicate_.operator()<is_insert::YES>(this->extract_key(slot_content), key);

We should do the following instead

this->predicate_.operator()<is_insert::YES>(key, this->extract_key(slot_content));