rise-lang / shine

The Shine compiler for the RISE language

Home Page: https://rise-lang.org

Ensuring that barriers are encountered by all work-items

Bastacyclop opened this issue

Issue related to #18 and remaining after #80.

In some cases, the generated barriers might not be encountered by all work-items. Example low-level Rise program and its generated OpenCL code:

 λin : n.m.o.p.f32. (
   in ▷ mapWorkGroup(1) (mapWorkGroup(0) (
     mapLocal(1) (
     mapLocal(0) (λx. x) ▷
     toMem(local) ▷
     slide 3 1 ▷
     mapLocal(0) sum
 ))))
for (int wg0 = get_group_id(1); wg0 < n; wg0 += get_num_groups(1)) {
  for (int wg1 = get_group_id(0); wg1 < m; wg1 += get_num_groups(0)) {
-   for (int l0 = get_local_id(1); l0 < o; l0 += get_local_size(1)) {
+   for (int l0 = get_local_id(1); l0 < ctt(o); l0 += get_local_size(1)) {
+     if (l0 < o) {
      for (int l1 = get_local_id(0); l1 < p; l1 += get_local_size(0)) {
        [...] // read from global input; write to local memory
      }
+     }
      barrier(CLK_LOCAL_MEM_FENCE);
+     if (l0 < o) {
      for (int l2 = get_local_id(0); l2 < p-2; l2 += get_local_size(0)) {
        [...] // read from local memory; write to global output
      }
+     }
      barrier(CLK_LOCAL_MEM_FENCE);
    }
  }
}

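Some work-items of a work-group might not enter the loop on line 3 of the OpenCL code (the l0 loop), for example when their get_local_id(1) is not smaller than o, and others might run fewer iterations than their neighbours when o is not a multiple of get_local_size(1). As a result, the barriers inside this loop may be reached by only some of the work-items, which leads to undefined behaviour.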
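To make the divergence concrete, here is a small host-side model in plain C (not OpenCL, and not part of Shine). It counts how many iterations of the strided l0 loop each work-item would execute. The helper names strided_iterations and barriers_are_uniform are hypothetical and only exist for this illustration.

```c
#include <assert.h>

/* Plain C model of the generated loop
 *   for (l0 = get_local_id(1); l0 < bound; l0 += get_local_size(1))
 * for a single work-item whose local id is `lid`.
 * Returns the number of iterations, i.e. how often that work-item
 * would reach the barriers inside the loop body. */
static int strided_iterations(int lid, int bound, int local_size) {
    assert(local_size > 0 && lid >= 0 && lid < local_size);
    int count = 0;
    for (int l0 = lid; l0 < bound; l0 += local_size) {
        count++;
    }
    return count;
}

/* Returns 1 if every work-item of the group executes the same number of
 * iterations (so each barrier is reached by all of them), 0 otherwise. */
static int barriers_are_uniform(int bound, int local_size) {
    assert(local_size > 0);
    int expected = strided_iterations(0, bound, local_size);
    for (int lid = 1; lid < local_size; lid++) {
        if (strided_iterations(lid, bound, local_size) != expected) {
            return 0;
        }
    }
    return 1;
}
```

With o = 6 and a local size of 4, work-items 0 and 1 run two iterations while work-items 2 and 3 run only one, so barriers_are_uniform(6, 4) returns 0. With o = 8 every work-item runs two iterations and the check returns 1.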
Previous work on Lift suffered from similar limitations and implemented a mix of compile-time and runtime checks to report the issue to the user. For Shine, an additional imperative DPIA pass could be implemented to fix the code as illustrated above:

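  • (-): the buggy code that would be generated by both Lift and Shine.
  • (+): a potential fix, where the ctt function rounds a number up to a multiple of the number of work-items involved, so that all work-items run the same number of iterations and the added if guards skip the out-of-range ones.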
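As a rough sketch of what the (+) variant relies on, the following plain C model shows one possible rounding helper and the guarded loop for a single work-item. The name ceil_to_multiple is a hypothetical stand-in for ctt, and the assumption that the multiple is get_local_size(1) comes from reading the loop stride, not from an existing Shine implementation.

```c
#include <assert.h>

/* Hypothetical stand-in for `ctt`: rounds n up to the next multiple of
 * `step` (assumed here to be the local size of the dimension involved). */
static int ceil_to_multiple(int n, int step) {
    assert(n >= 0 && step > 0);
    return ((n + step - 1) / step) * step;
}

/* Plain C model of the fixed loop for the work-item with local id `lid`.
 * `iterations` counts loop iterations (each one reaches both barriers);
 * `body_runs` counts iterations where the `if (l0 < o)` guard holds and
 * the memory accesses would actually be performed. */
static void fixed_loop(int lid, int o, int local_size,
                       int *iterations, int *body_runs) {
    *iterations = 0;
    *body_runs = 0;
    int bound = ceil_to_multiple(o, local_size);
    for (int l0 = lid; l0 < bound; l0 += local_size) {
        if (l0 < o) {
            (*body_runs)++;   /* guarded reads and writes */
        }
        (*iterations)++;      /* barriers sit outside the guard */
    }
}
```

For o = 6 and a local size of 4 the bound becomes 8, so work-item 0 and work-item 2 both run two iterations and reach the same barriers, but work-item 2 only performs the guarded work once (for l0 = 2, not for l0 = 6).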