tactcomplabs / circustent

Memory system characterization benchmarks using atomic operations

OpenACC will only run on CPU with GCC

jeffhammond opened this issue

The OpenACC backend uses the GNU __atomic builtins rather than #pragma acc atomic, and the CMake configuration assumes GCC's OpenACC implementation.

I will try to fix this.

@jeffhammond, thanks for the find!

Agreed. Thanks for the find! I have been giving this some thought as well.

To my thinking, it would be ideal (from a portability standpoint) to decouple CircusTent from the GNU-specific atomics altogether. Currently, the pthreads, OpenMP, and OpenACC backends all use them.

For pthreads we could use the C11 standard, but I am not entirely sure if that equates to a step forward or backward with respect to portability.

For OpenMP & OpenACC, we should be able to replace the atomic add operations with #pragma directive versions, but I'm not sure the compare-and-swap operations are doable in the same manner. If not, we could just omit the CAS variations from these backends. Alternatively, it looks like the "capture" clause in these models might support an unconditional atomic swap.
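As a rough illustration of the directive-based idea, here is a minimal sketch in OpenMP. The function names `omp_atomic_add` and `omp_atomic_swap` are hypothetical and not taken from CircusTent. It assumes a compiler whose OpenMP support accepts the `{ v = x; x = expr; }` form of `atomic capture` (OpenMP 4.0 or later, as far as I know). OpenACC has an analogous `#pragma acc atomic` directive with `update` and `capture` clauses. Built without OpenMP enabled, the pragmas are ignored and the code runs serially.

```cpp
#include <cassert>
#include <cstdint>

// Atomic add expressed with a directive instead of __atomic_fetch_add.
// Returns the final value of the shared counter (hypothetical helper).
std::uint64_t omp_atomic_add(int iters) {
    assert(iters >= 0);
    std::uint64_t counter = 0;  // shared across the parallel loop
    #pragma omp parallel for
    for (int i = 0; i < iters; ++i) {
        #pragma omp atomic update
        counter += 1;
    }
    return counter;
}

// Unconditional atomic swap using "capture": read the old value and
// store the new one as a single atomic operation. This is a swap,
// not a compare-and-swap; nothing is compared.
std::uint64_t omp_atomic_swap(std::uint64_t *target, std::uint64_t desired) {
    assert(target != nullptr);
    std::uint64_t old;
    #pragma omp atomic capture
    { old = *target; *target = desired; }
    return old;
}
```

Calling `omp_atomic_add(1000)` should yield 1000 regardless of thread count, and `omp_atomic_swap(&x, 9)` returns the previous contents of `x` while leaving 9 in place.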

@jeffhammond @jleidel What do you guys think?

I think you want to keep Pthreads + GCC intrinsics as a generic implementation on CPUs.

What I'd add is a true C++11 (std::thread plus std::atomic) or C++17/20 (std::for_each(std::execution::par_unseq... plus std::atomic_ref) implementation. The latter will work on CPU and GPU (once std::atomic_ref is available). If you do the C++11 one, you can do the C11 one too, with minimal additional effort.

Granted, I find std::thread annoying relative to Pthreads, and std::atomic or _Atomic means you need an array of those, unlike intrinsics or std::atomic_ref, so maybe you just want to do the C++17/20 port.
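For reference, a C++11 version along these lines might look like the sketch below. The names `cxx11_atomic_add` and `cxx11_cas` are made up for illustration and are not CircusTent's API. It shows the point about needing an array whose elements are themselves `std::atomic`, unlike the intrinsics, which operate on plain integers.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Each thread performs `iters` atomic adds on every element of a shared
// array. The element type must be std::atomic, not a plain integer.
// Returns the sum over all elements (hypothetical helper).
std::uint64_t cxx11_atomic_add(unsigned num_threads, std::size_t len,
                               std::uint64_t iters) {
    assert(num_threads > 0 && len > 0);
    std::vector<std::atomic<std::uint64_t>> data(len);
    for (auto &d : data) d.store(0, std::memory_order_relaxed);

    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&data, iters] {
            for (std::uint64_t i = 0; i < iters; ++i)
                for (auto &d : data)
                    d.fetch_add(1, std::memory_order_relaxed);
        });
    }
    for (auto &w : workers) w.join();

    std::uint64_t sum = 0;
    for (auto &d : data) sum += d.load();
    return sum;
}

// Compare-and-swap: returns true if `target` held `expected` and was
// replaced by `desired`; otherwise leaves `target` unchanged.
bool cxx11_cas(std::atomic<std::uint64_t> &target, std::uint64_t expected,
               std::uint64_t desired) {
    return target.compare_exchange_strong(expected, desired);
}
```

With 4 threads, 8 elements, and 1000 iterations, the returned sum should be 4 * 8 * 1000 = 32000. On some toolchains this needs `-pthread` at link time.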

I haven't read enough on OpenACC atomics to be sure about CAS. They used to not have it, but I think they do now.

In any case, you can decouple the implementation of atomics from the implementation of threads pretty easily. For example, look at https://github.com/jeffhammond/Quicksilver/blob/cxx20-atomics/src/AtomicMacro.hh.
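To make the decoupling idea concrete, here is a minimal sketch of a macro layer. It is not the contents of the linked AtomicMacro.hh; the macro names `CT_ATOMIC_ADD` and `CT_USE_OMP_ATOMICS` and the function `kernel_add` are hypothetical. The kernel code calls one macro, and a compile-time switch picks the underlying atomic implementation independently of how threads are launched. The default branch assumes a GCC- or Clang-compatible compiler.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical switch: define CT_USE_OMP_ATOMICS to use the OpenMP
// directive; otherwise fall back to the GNU __atomic builtin.
#if defined(CT_USE_OMP_ATOMICS)
  #define CT_ATOMIC_ADD(x, v) \
      do { _Pragma("omp atomic update") (x) += (v); } while (0)
#else
  #define CT_ATOMIC_ADD(x, v) \
      (void)__atomic_fetch_add(&(x), (v), __ATOMIC_RELAXED)
#endif

// Kernel body written once against the macro. Whatever threading model
// invokes it (Pthreads, OpenMP, std::thread) does not need to change
// when the atomic implementation does.
std::uint64_t kernel_add(std::uint64_t *counter, std::uint64_t iters) {
    assert(counter != nullptr);
    for (std::uint64_t i = 0; i < iters; ++i) {
        CT_ATOMIC_ADD(*counter, 1);
    }
    return *counter;
}
```

A further branch using `std::atomic_ref` could be added in the same way once a C++20 toolchain is available.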

@BrodyWilliams C11 threads used to be way less portable than Pthreads, but glibc finally got with the program in 2018 (https://sourceware.org/legacy-ml/libc-alpha/2018-08/msg00003.html).

These issues should be addressed by #17 (C++11 implementation added; OpenACC & OpenMP+Target revised).