owensgroup / SlabHash

A warp-oriented dynamic hash table for GPUs

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Race during insertion of key/values

AKKamath opened this issue · comments

I noticed that there were races in the code.
For example, in line 45 of src/concurrent_map/warp:

uint32_t src_unit_data = (next == SlabHashT::A_INDEX_POINTER)
? *(getPointerFromBucket(src_bucket, laneId))
: *(getPointerFromSlab(next, laneId));

Here a weak load operation is used to read data from the bucket.
Later on (line 65/line 83), an atomicCAS is performed to the same location.

Different threads may read and atomicCAS on the same location, creating a race.

@AKKamath did you see other races of this type? Thank you for this report!

This seems repeated in many locations.
For example src/concurrent_set/cset_warp_operations.cuh has a similar code snippet at lines 41 - 43.

Hi @AKKamath, although the bucket read operation doesn't perform any synchronization, the atomicCAS operation (where we perform the write) will only succeed if the pointer points to an EMPTY_PAIR_64 (i.e., it will fail if we read an empty value, but before the atomicCAS operation a different thread modified that memory location). If different threads try to perform a CAS operation, only one thread will succeed.

Could you explain what scenario would produce a race condition?

I'm also curious about how you found these multiple race conditions.

Since the read operation is "weak" it has no consistency guarantees.
While I don't see any obvious race conditions, there has been prior work that shows data races in CUDA code breaks assumptions.
In the same work, they change a weak store to an atomic operation as it was racing with an atomicCAS.

A simple fix would be changing the weak loads to a "strong" operation by typecasting them to volatile, or using an atomic operation instead, e.g. atomicAdd(getPointerFromBucket(src_bucket, laneId), 0)

Thanks for sharing that paper. I read it before.
But as I mentioned, even though the read operation is weak, the operation that matters is the atomic CAS operation.

So by definition (https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#conflict-and-data-races), there is a data race between the weak load and atomic.
I guess it could be a benign data race in this case.

I continue to be interested in getting to the bottom of this. @maawad is there a data race / is it benign, or is this not a race?