openai / triton

Development repository for the Triton language and compiler

Home Page: https://triton-lang.org/

How to implement blockwise fill?

void-main opened this issue · comments

Hi team, I'm writing a kernel that has the following requirement:

I have a range tensor of row offsets with values [0, 10, 20, 30] and another range tensor of column offsets with values [0, 1, 2, 3, 4, 5, 6, 7]. The expected result is a flat tensor containing every row offset plus every column offset, in row-major order: [0, 1, 2, 3, 4, 5, 6, 7, 10, 11, 12, 13, 14, 15, 16, 17, 20, 21, 22, 23, 24, 25, 26, 27, 30, 31, 32, 33, 34, 35, 36, 37].

I've tried the following code:

row_idxes = tl.arange(0, 4)
col_idxes = tl.arange(0, 8)
block_offs = row_idxes[:, None] * 10 + col_idxes[None, :] * 1
block_offs = tl.view(block_offs, [BLOCK_N])

Logically the code is correct, but the tl.view operation does not seem to work as expected (similar to #2210 and #2157). How can I fix this?
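
For reference, here is a minimal host-side sketch in plain Python of the offsets I expect. It is not Triton code. The names ROWS, COLS, ROW_STRIDE and both helper functions are hypothetical, and it assumes BLOCK_N = 32 (4 rows x 8 columns), which the snippet above does not define. It shows that the broadcast-then-flatten result equals a pure 1D computation using integer division and modulo on a flat index:

```python
# Hypothetical sizes taken from the example in this issue.
ROWS, COLS, ROW_STRIDE = 4, 8, 10
BLOCK_N = ROWS * COLS  # assumed to be 32; not defined in the original snippet


def broadcast_then_flatten(rows=ROWS, cols=COLS, row_stride=ROW_STRIDE):
    """Mirror row_idxes[:, None] * 10 + col_idxes[None, :], flattened row-major."""
    return [r * row_stride + c for r in range(rows) for c in range(cols)]


def flat_index_offsets(rows=ROWS, cols=COLS, row_stride=ROW_STRIDE):
    """Compute the same offsets from one flat index, with no reshape step."""
    return [(i // cols) * row_stride + (i % cols) for i in range(rows * cols)]


if __name__ == "__main__":
    print(flat_index_offsets())
```

Both functions return the same list, so the same arithmetic applied to a single tl.arange(0, BLOCK_N) might avoid the reshape entirely. I have not verified that inside a kernel.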