bigscience-workshop / Megatron-DeepSpeed

Ongoing research training transformer language models at scale, including: BERT & GPT-2

Is this assertion for mask wrong?

yinfangchen opened this issue

When training GPT-2 with examples/pretrain_gpt.sh, I get "AssertionError: Mask is silently ignored due to the use of a custom kernel".

This line leads to the assertion error:

assert mask is None, "Mask is silently ignored due to the use of a custom kernel"

Is this assertion necessary? And is it even correct?
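
For context, here is a minimal sketch of what a guard like this usually protects against. It assumes the assertion sits in a code path where a fused kernel applies its own causal mask, so any mask tensor passed in would have no effect. The function causal_softmax below is a hypothetical pure-Python stand-in written for illustration, not the repository's kernel.

```python
import math


def causal_softmax(scores, mask=None):
    """Hypothetical stand-in for a fused causal-softmax kernel.

    scores is a square list of lists (query x key) of floats.
    The causal (lower-triangular) mask is built into the computation,
    so an explicit mask would be ignored; the guard fails loudly instead.
    """
    assert mask is None, "Mask is silently ignored due to the use of a custom kernel"
    out = []
    for i, row in enumerate(scores):
        visible = row[: i + 1]  # query i may only attend to keys 0..i
        peak = max(visible)  # subtract the max for numerical stability
        exps = [math.exp(x - peak) for x in visible]
        total = sum(exps)
        # Future positions receive probability 0.
        out.append([e / total for e in exps] + [0.0] * (len(row) - i - 1))
    return out


scores = [[0.0, 1.0], [0.0, 0.0]]
probs = causal_softmax(scores)  # no explicit mask: accepted

try:
    # Passing any mask trips the guard, which matches the reported error.
    causal_softmax(scores, mask=[[False, True], [False, False]])
    raised = False
except AssertionError:
    raised = True
```

If this reading is right, the assertion is a deliberate safety check, and the error would mean the caller builds and passes an attention mask even though the causal path does not use one. Whether the caller should pass None or the check should be relaxed is the open question.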

I have the same question.