For BinaryBERT, are 2 epochs used to fine-tune both the ternary BERT and binary BERT with data augmentation?

Question

Phuoc-Hoan-Le opened this issue a year ago

Phuoc-Hoan Charles Le commented a year ago

For BinaryBERT, are 2 epochs used to fine-tune both the ternary BERT and the binary BERT with data augmentation? The paper says 1 epoch is used for both. However, in the shell script the number of epochs is set to 2, with two-stage distillation used for both of them.