moderngpu / moderngpu

Patterns and behaviors for GPU computing

Home Page: http://moderngpu.github.io/moderngpu

Wrong tid in cta_launch with non-power-of-two NT

christiankerl opened this issue

Using cta_launch with an NT that is not a power of two results in wrong tid values being passed to the lambda. Simple test case:

static const int NT = 96;
mgpu::cta_launch<NT>([=] MGPU_DEVICE (const int tid, const int block) {
      printf("thread %d %d %d\n", threadIdx.x, tid, threadIdx.x & (NT - 1));
}, 1, ctx);

For NT = 32, 64 and 128 this works fine. However, setting NT to any multiple of 32 should be valid, right?
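
The third value printed by the test case, threadIdx.x & (NT - 1), equals threadIdx.x only when NT is a power of two. The sketch below reproduces that arithmetic in plain host-side C++, without moderngpu. The helper names masked_tid and count_mismatches are hypothetical, and whether the library computed tid this way internally is an assumption; the thread does not confirm it.

```cpp
// Hypothetical helper: the mask expression from the test case's printf.
// It matches `thread % nt` only when nt is a power of two.
constexpr int masked_tid(int thread, int nt) {
    return thread & (nt - 1);
}

// Counts the threads in [0, nt) whose masked index differs from the
// real thread index.
int count_mismatches(int nt) {
    int bad = 0;
    for (int t = 0; t < nt; ++t) {
        if (masked_tid(t, nt) != t) ++bad;
    }
    return bad;
}
```

With NT = 96 the mask is 95 (binary 1011111), which clears bit 5, so threads 32 through 63 map onto 0 through 31 while all other threads are unaffected. With NT = 32, 64 or 128 the mask keeps every bit below NT and no thread is remapped.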

That's a good point. I should definitely change that. I started off only supporting powers of two because I wasn't sure if all the algorithms I wrote would work on those odd CTA sizes. But I should relax that restriction for cta_launch, transform, etc., and static_assert inside non-supported functions if I need to. Will roll that fix out in the next update. Thx.
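
A compile-time guard of the kind mentioned above could look roughly like the following. This is only a sketch: pow2_only_steps is a hypothetical function standing in for any routine that still needs a power-of-two CTA size, and it is not taken from moderngpu.

```cpp
// Hypothetical routine that is only valid for power-of-two CTA sizes.
// Instantiating it with, say, nt = 96 fails at compile time with the
// message below instead of silently producing wrong results.
template<int nt>
int pow2_only_steps() {
    static_assert(nt > 0 && (nt & (nt - 1)) == 0,
                  "this routine requires a power-of-two CTA size");
    // Number of halving steps needed to go from nt down to 1, i.e. log2(nt).
    int steps = 0;
    for (int n = nt; n > 1; n >>= 1) ++steps;
    return steps;
}
```

Launch helpers without the guard would accept any NT, while routines that carry it keep rejecting unsupported sizes at build time.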

Fixed in the 2.10 push. Let me know if you have problems with it.

Works fine now, thanks for the fix.