pytorch-labs / gpt-fast

Simple and efficient pytorch-native transformer text generation in <1000 LOC of python.

Too-long input texts cause "device-side assert triggered"

li-aolong opened this issue · comments

```
<frozen importlib._bootstrap_external>:843: _call_with_frames_removed: block: [8,0,0], thread: [58,0,0] Assertion `index out of bounds: 0 <= tmp68 < 1504` failed.
<frozen importlib._bootstrap_external>:843: _call_with_frames_removed: block: [8,0,0], thread: [59,0,0] Assertion `index out of bounds: 0 <= tmp68 < 1504` failed.
<frozen importlib._bootstrap_external>:843: _call_with_frames_removed: block: [8,0,0], thread: [60,0,0] Assertion `index out of bounds: 0 <= tmp68 < 1504` failed.
<frozen importlib._bootstrap_external>:843: _call_with_frames_removed: block: [8,0,0], thread: [61,0,0] Assertion `index out of bounds: 0 <= tmp68 < 1504` failed.
<frozen importlib._bootstrap_external>:843: _call_with_frames_removed: block: [8,0,0], thread: [62,0,0] Assertion `index out of bounds: 0 <= tmp68 < 1504` failed.
<frozen importlib._bootstrap_external>:843: _call_with_frames_removed: block: [8,0,0], thread: [63,0,0] Assertion `index out of bounds: 0 <= tmp68 < 1504` failed.
...
RuntimeError: CUDA error: device-side assert triggered
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
```

When I use llama-2-7b-hf and feed it ten samples, each around 2048 tokens after tokenization, I still hit the error above, even though I have set the block size to 4096.

If I shorten the inputs, it works.
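For what it's worth, the assert reads like an out-of-bounds write into a KV cache that was allocated for 1504 positions. Below is a minimal standalone sketch of that failure mode; the 1504 bound is taken from the log above, while the head count, head dim, and write position are made-up illustrative values:

```python
import torch

# Hypothetical repro of the failure mode: a KV cache allocated for
# max_seq_length positions is written at a position >= max_seq_length.
# On CUDA this trips an asynchronous device-side assert, matching the
# "index out of bounds: 0 <= tmp68 < 1504" lines above.
max_seq_length = 1504  # cache capacity, taken from the assert bound
k_cache = torch.zeros(1, 32, max_seq_length, 128, device="cuda")

input_pos = torch.tensor([2048], device="cuda")  # position past the cache end
k_val = torch.randn(1, 32, 1, 128, device="cuda")

k_cache[:, :, input_pos] = k_val  # out-of-bounds scatter -> device-side assert
torch.cuda.synchronize()          # the assert only surfaces at the next sync
```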

Any idea how to use this for long-context examples? It seems that max_seq_len > 2048 triggers the error above.
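For reference, the cache sizing in gpt-fast's generate path looks roughly like the sketch below. The helper `plan_cache` is hypothetical and only mirrors the `min(prompt_len + max_new_tokens, block_size)` logic from generate.py; as far as I can tell, `block_size` defaults to 2048 in model.py's `ModelArgs` and the Llama-2-7B entry does not override it, so anything past 2048 gets silently clipped while decoding positions keep counting upward:

```python
def plan_cache(prompt_len: int, max_new_tokens: int, block_size: int = 2048) -> int:
    """Simplified mirror of the KV-cache sizing in gpt-fast's generate.py."""
    return min(prompt_len + max_new_tokens, block_size)

# With a ~2048-token prompt the cache is capped at block_size, so positions
# written during decoding run past the end of the allocated cache:
print(plan_cache(prompt_len=2048, max_new_tokens=200))  # -> 2048, not 2248
```

If that reading is right, `block_size` has to be raised in the model config itself (not just at the call site) before prompts past 2048 can fit.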