TinyBERT-6 w/o GD results
apple024 opened this issue
ting commented
Hello, thank you for sharing this nice work.
I am wondering how GD influences the performance of TinyBERT-6, which does not seem to be mentioned in the paper. Could you share the scores of TinyBERT-6 trained w/o GD on all GLUE tasks?
Thank you so much!