Addition of driver-level parallel looping pragma in SCCAnnotate is fragile
skarppinen opened this issue
Hello, filing an issue by request of @reuterbal.
The issue is that in annotate_driver of SCCAnnotate, the following code is used to place the driver-level looping pragma (i.e. "acc parallel loop gang vector_length(...)"):
loki/transformations/transformations/single_column_coalesced.py
Lines 657 to 671 in 5c0f821
The condition of the IF statement can easily be missed. For example, I ran into this issue when I used the driver loop as a guide for adding two pragmas ($loki data and $loki update_host, for purposes of DataOffloadTransformation and GlobalVarOffloadTransformation) just above the driver loop. Consequently, these pragmas ended up in driver_loop.pragma, with len(driver_loop.pragma) != 1, and thus no parallel loop pragma appeared as a result of the SCC transformation.
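To illustrate the failure mode, here is a minimal standalone sketch. It does not use Loki's actual classes or reproduce the referenced lines. Pragma, Loop, annotate_strict, annotate_tolerant and the "driver-loop" marker string are all hypothetical stand-ins, and the strict guard is only my assumption of the condition's shape, based on the len(driver_loop.pragma) != 1 observation above. The tolerant variant shows one possible way to make the placement less fragile: search the attached pragmas for the marker instead of requiring exactly one pragma.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Pragma:
    """Hypothetical stand-in for a pragma node (not Loki's class)."""
    keyword: str   # e.g. "loki" or "acc"
    content: str   # e.g. "data" or "update_host"


@dataclass
class Loop:
    """Hypothetical stand-in for a loop node carrying attached pragmas."""
    pragma: List[Pragma] = field(default_factory=list)


def annotate_strict(loop: Loop, directive: str) -> bool:
    """Assumed shape of the fragile guard: act only on exactly one pragma."""
    if len(loop.pragma) == 1 and loop.pragma[0].keyword == "loki":
        loop.pragma = [Pragma("acc", directive)]
        return True
    # Any extra pragma attached to the loop makes this a silent no-op.
    return False


def annotate_tolerant(loop: Loop, directive: str,
                      marker: str = "driver-loop") -> bool:
    """Replace only the marker pragma and keep unrelated pragmas intact.

    The marker string is an assumption made for this sketch.
    """
    for i, pragma in enumerate(loop.pragma):
        if pragma.keyword == "loki" and pragma.content.startswith(marker):
            loop.pragma[i] = Pragma("acc", directive)
            return True
    return False
```

With three pragmas attached to the loop (data, update_host and the marker), annotate_strict returns False and leaves the loop unchanged, whereas annotate_tolerant replaces only the marker and leaves the other two in place. Returning a bool also gives the caller a chance to warn instead of skipping silently.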