htqin / BiBERT

This project is the official implementation of our accepted ICLR 2022 paper BiBERT: Accurate Fully Binarized BERT.

Question about Bi-Attention

TIEHua opened this issue

Thanks for sharing the source code. I have a question about the Bi-Attention structure.
When computing attention score × value, the source code binarizes the attention score to 0 or 1. In the paper, I noticed that you propose a new bitwise operation to support computation between the binarized attention weight bool(A) and the binarized value during inference, but I couldn't find this part in the source code.
Are the attention scores binarized to 0 and 1 in both the training and testing phases in the source code?
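For reference, here is a minimal sketch of what the bool(·) binarization of attention scores described in the question could look like in PyTorch. This is a hypothetical illustration based on the paper's description (step function to {0, 1} in the forward pass, with a straight-through-style gradient for training), not the repository's exact implementation; the class name and the clipping range of the estimator are assumptions.

```python
import torch

class BinaryAttnScore(torch.autograd.Function):
    """Hypothetical sketch: binarize attention scores to {0, 1}.

    Forward applies bool(x) = 1 if x >= 0 else 0, as described in the
    BiBERT paper; backward uses a straight-through-style estimator so
    the binarization remains trainable. Not the repo's exact code.
    """

    @staticmethod
    def forward(ctx, scores):
        ctx.save_for_backward(scores)
        # bool(x): 1 where x >= 0, else 0
        return (scores >= 0).to(scores.dtype)

    @staticmethod
    def backward(ctx, grad_output):
        (scores,) = ctx.saved_tensors
        # Pass gradients through only where |x| <= 1 (assumed clip range)
        return grad_output * (scores.abs() <= 1).to(grad_output.dtype)


# Usage: the binarized scores act as a {0, 1} mask over the values.
scores = torch.randn(2, 4, 8, 8, requires_grad=True)  # (batch, heads, q, k)
values = torch.randn(2, 4, 8, 16)                     # (batch, heads, k, d)
attn = BinaryAttnScore.apply(scores)                  # entries in {0, 1}
out = attn @ values
```

Since bool(A) is in {0, 1} and the binarized values are in {-1, +1}, the product could in principle be computed at inference with bitwise masking and popcount rather than a floating-point matmul, which is the bitwise operation the question asks about.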