salesforce / CodeGen

CodeGen is a family of open-source model for program synthesis. Trained on TPU-v4. Competitive with OpenAI Codex.

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Mismatch in attention weights for causal masked tokens vs attention masked tokens

LakshyAAAgrawal opened this issue · comments

attention scores corresponding to the tokens that are masked out using attention_mask get a value of -1e4 as per https://github.com/salesforce/CodeGen/blob/main/jaxformer/hf/codegen/modeling_codegen.py#L439, whereas the attention scores masked out using causal_mask get a value of -1*1e9. This leads to a discrepancy between the pre-softmax attention scores for causally masked tokens and padded tokens to be different. This causes inference outputs from individual sequences to inference outputs to be different.