Race during insertion of key/values

Question

Race during insertion of key/values

AKKamath opened this issue 3 years ago · comments

I noticed that there were races in the code.
For example, in line 45 of src/concurrent_map/warp:

uint32_t src_unit_data = (next == SlabHashT::A_INDEX_POINTER)
? *(getPointerFromBucket(src_bucket, laneId))
: *(getPointerFromSlab(next, laneId));

Here a weak load operation is used to read data from the bucket.
Later on (line 65/line 83), an atomicCAS is performed to the same location.

Different threads may read and atomicCAS on the same location, creating a race.

John Owens · Answer 1 · Tue Aug 24 2021 20:57:29 GMT+0800 (China Standard Time)

@sashkiani @maawad

John Owens · Answer 2 · Tue Aug 24 2021 20:58:00 GMT+0800 (China Standard Time)

@AKKamath did you see other races of this type? Thank you for this report!

Aditya K Kamath · Answer 3 · Tue Aug 24 2021 21:03:31 GMT+0800 (China Standard Time)

This seems repeated in many locations.
For example src/concurrent_set/cset_warp_operations.cuh has a similar code snippet at lines 41 - 43.

Muhammad Awad · Answer 4 · Tue Aug 24 2021 21:31:57 GMT+0800 (China Standard Time)

Hi @AKKamath, although the bucket read operation doesn't perform any synchronization, the atomicCAS operation (where we perform the write) will only succeed if the pointer points to an EMPTY_PAIR_64 (i.e., it will fail if we read an empty value, but before the atomicCAS operation a different thread modified that memory location). If different threads try to perform a CAS operation, only one thread will succeed.

Could you explain what scenario would produce a race condition?

I'm also curious about how you found these multiple race conditions.

Aditya K Kamath · Answer 5 · Tue Aug 24 2021 22:11:03 GMT+0800 (China Standard Time)

Since the read operation is "weak" it has no consistency guarantees.
While I don't see any obvious race conditions, there has been prior work that shows data races in CUDA code breaks assumptions.
In the same work, they change a weak store to an atomic operation as it was racing with an atomicCAS.

A simple fix would be changing the weak loads to a "strong" operation by typecasting them to volatile, or using an atomic operation instead, e.g. atomicAdd(getPointerFromBucket(src_bucket, laneId), 0)

Muhammad Awad · Answer 6 · Tue Aug 24 2021 22:42:22 GMT+0800 (China Standard Time)

Thanks for sharing that paper. I read it before.
But as I mentioned, even though the read operation is weak, the operation that matters is the atomic CAS operation.

Aditya K Kamath · Answer 7 · Tue Aug 24 2021 23:42:56 GMT+0800 (China Standard Time)

So by definition (https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#conflict-and-data-races), there is a data race between the weak load and atomic.
I guess it could be a benign data race in this case.

John Owens · Answer 8 · Sun Aug 29 2021 02:11:09 GMT+0800 (China Standard Time)

I continue to be interested in getting to the bottom of this. @maawad is there a data race / is it benign, or is this not a race?