Problem in bert
zetaodu opened this issue
zetaodu commented
I find that thop does not count the parameters in BertEmbedding, and if I define two self_attention blocks in one layer, it counts only one of them.
Ivan Stepanov commented
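The second self_attention block also has to be called in the forward method. thop gathers its counts while the model runs, so a block that is defined but never called in forward is not counted.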
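To illustrate the point about forward, here is a minimal sketch in plain PyTorch. It does not use thop itself; it assumes that a profiler counts a module only when that module's forward hook fires. The class `TwoBlockLayer`, the helper `params_seen_in_forward`, and the `nn.Linear` stand-ins for the attention blocks are all hypothetical names chosen for the example.

```python
import torch
import torch.nn as nn


class TwoBlockLayer(nn.Module):
    """Hypothetical layer: two blocks are defined, but forward calls only the first."""

    def __init__(self, hidden=8):
        super().__init__()
        # nn.Linear is a stand-in for a real self-attention block.
        self.self_attention_1 = nn.Linear(hidden, hidden)
        self.self_attention_2 = nn.Linear(hidden, hidden)

    def forward(self, x):
        # self_attention_2 is never called here.
        return self.self_attention_1(x)


def params_seen_in_forward(model, example_input):
    """Count parameters only for direct children whose forward hook fires."""
    seen = {}

    def make_hook(name):
        def hook(module, inputs, output):
            seen[name] = sum(p.numel() for p in module.parameters())
        return hook

    handles = [
        child.register_forward_hook(make_hook(name))
        for name, child in model.named_children()
    ]
    with torch.no_grad():
        model(example_input)
    for handle in handles:
        handle.remove()  # leave the model without hooks afterwards
    return seen


model = TwoBlockLayer()
seen = params_seen_in_forward(model, torch.randn(1, 8))
total_params = sum(p.numel() for p in model.parameters())

print(seen)          # only the block that forward actually called
print(total_params)  # every parameter registered on the model
```

In this sketch the hook-based count covers only `self_attention_1`, while `model.parameters()` reports both blocks. Comparing a profiler's result against `sum(p.numel() for p in model.parameters())` is a quick way to spot modules that were skipped.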