For BinaryBERT, are 2 epochs used to fine-tune both the ternary BERT and binary BERT with data augmentation?
Phuoc-Hoan Charles Le commented
For BinaryBERT, are 2 epochs used to fine-tune both the ternary BERT and the binary BERT with data augmentation? The paper says 1 epoch is used for both, but in the shell script the number of epochs is set to 2, with two-stage distillation used for both of them.