Question about pooler_num_attention_heads

Question

yonghee12 opened this issue 4 years ago · comments

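Hello, and thank you for sharing the model and hyperparameters.
I'm planning to train a model based on them, but the two options below don't appear in the Huggingface BertConfig documentation, so I'd like to ask about them.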

"pooler_size_per_head": 128,
"pooler_num_attention_heads": 12,

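As I understand it, the pooler in BERT usually applies a fully connected (FC) layer after the Transformer Encoder output to project it for the downstream task.
However, the BertConfig you posted contains options that suggest the pooler goes through a multi-head attention layer. I couldn't find them in https://huggingface.co/transformers/model_doc/bert.html#transformers.BertConfig or in any other documentation, which is why I'm asking.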

A similar question seems to have been raised in the Huggingface issue huggingface/transformers#788, but it never received an answer.

Junbum Lee · Answer 1 · Tue Oct 06 2020 10:57:44 GMT+0800 (China Standard Time)

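Hello,
Huggingface Transformers does not use these options,
but they are required by the BERT Config used in Google BERT ( https://github.com/google-research/bert ).

The model was trained with the Google BERT code above, so these entries were filled in with the default values :)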

Yonghee Cheon · Answer 2 · Tue Oct 06 2020 13:31:39 GMT+0800 (China Standard Time)

@Beomi

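Thank you for the quick reply :)

Even after looking through the BERT Config in the official Google repo you mentioned, I couldn't find any attention heads option for the pooler, so I'm asking again.
Searching the repo for "attention" only turns up the attention heads option for the self-attention inside the Encoder, and the only pooling-related code I can see is the Dense pooling over the [CLS] token: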
self.pooled_output = tf.layers.dense(
    first_token_tensor,
    config.hidden_size,
    activation=tf.tanh,
    kernel_initializer=create_initializer(config.initializer_range))

If you happen to know where the default value for pooler_num_attention_heads is defined in Google BERT, could I kindly ask you to share it? :)

Thanks again for your kind answers.
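
For reference, the Dense pooling described above can be sketched roughly as follows. This is a minimal NumPy illustration, not the Google BERT code itself: the function name bert_pooler, the tensor shapes, and the random weights are all assumptions made for the example. It only shows that the pooled output depends on the first ([CLS]) token passed through one dense layer with tanh, with no attention heads involved.

```python
import numpy as np


def bert_pooler(sequence_output, weight, bias):
    """Dense + tanh over the first ([CLS]) token of each sequence.

    sequence_output: (batch, seq_len, hidden) encoder output
    weight:          (hidden, hidden) dense kernel (hypothetical values)
    bias:            (hidden,) dense bias
    """
    # Keep only the first token of every sequence: (batch, hidden).
    first_token = sequence_output[:, 0, :]
    # Single fully connected layer followed by tanh; no attention here.
    return np.tanh(first_token @ weight + bias)


# Illustrative sizes only; real models use a much larger hidden size.
rng = np.random.default_rng(0)
batch, seq_len, hidden = 2, 5, 8
sequence_output = rng.normal(size=(batch, seq_len, hidden))
weight = rng.normal(scale=0.02, size=(hidden, hidden))
bias = np.zeros(hidden)

pooled = bert_pooler(sequence_output, weight, bias)
print(pooled.shape)  # (2, 8)
```

Note that nothing in this sketch would read a value such as pooler_num_attention_heads, which matches what I see in the repo.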

Junbum Lee · Answer 3 · Mon Mar 15 2021 00:30:01 GMT+0800 (China Standard Time)

It looks like I never closed this issue.
These options are indeed the ones Google BERT uses when fine-tuning :)

KcBERT is distributed in a form converted to PyTorch for Huggingface, so it is expected to work without them.
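
As a rough illustration of why extra entries are harmless, the sketch below splits a config JSON into the keys a model actually reads and the keys it never touches. Everything here is an assumption for the example: USED_KEYS is a hypothetical subset, split_config is not a function from any library, and the hidden_size and num_attention_heads values are illustrative rather than taken from the KcBERT config.

```python
import json

# Hypothetical subset of config keys that a model implementation reads.
USED_KEYS = {
    "vocab_size",
    "hidden_size",
    "num_hidden_layers",
    "num_attention_heads",
    "intermediate_size",
}


def split_config(raw_json):
    """Split a config JSON string into (used, unused) dictionaries."""
    config = json.loads(raw_json)
    used = {k: v for k, v in config.items() if k in USED_KEYS}
    unused = {k: v for k, v in config.items() if k not in USED_KEYS}
    return used, unused


# Only the two pooler_* entries come from this issue; the rest is illustrative.
EXAMPLE_CONFIG = """
{
  "hidden_size": 768,
  "num_attention_heads": 12,
  "pooler_size_per_head": 128,
  "pooler_num_attention_heads": 12
}
"""

used, unused = split_config(EXAMPLE_CONFIG)
print(sorted(unused))
```

Keys that land in unused are simply carried along in the file; removing them would not change how such a model is built.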