[One latest paper] Rethinking Kullback-Leibler Divergence in Knowledge Distillation for Large Language Models
wutaiqiang opened this issue
Nice work!
One missing related work:
Rethinking Kullback-Leibler Divergence in Knowledge Distillation for Large Language Models
https://arxiv.org/abs/2404.02657
Great work!
Thanks. We have added it and will update the paper in a future version :)
This paper is under review at COLM, not yet accepted.
Also, this paper rethinks forward KL (FKL) and reverse KL (RKL) in logit-based distillation and proposes an adaptive KL (AKL); see the sketch below.
A related blog post (in Chinese): https://zhuanlan.zhihu.com/p/690748958
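For context, here is a minimal sketch (assuming PyTorch and per-token logits; this is not the paper's implementation, and the AKL combination rule itself is left to the paper) of what the FKL and RKL terms look like in logit-based distillation:

```python
# Illustrative sketch only: forward vs. reverse KL on one token's logits.
# `teacher_logits` / `student_logits` are assumed tensors of shape [vocab_size].
# AKL adaptively combines these two terms; that rule is not reproduced here.
import torch

def forward_kl(teacher_logits: torch.Tensor, student_logits: torch.Tensor) -> torch.Tensor:
    # FKL: KL(teacher || student)
    p = torch.softmax(teacher_logits, dim=-1)
    log_p = torch.log_softmax(teacher_logits, dim=-1)
    log_q = torch.log_softmax(student_logits, dim=-1)
    return (p * (log_p - log_q)).sum(dim=-1)

def reverse_kl(teacher_logits: torch.Tensor, student_logits: torch.Tensor) -> torch.Tensor:
    # RKL: KL(student || teacher)
    q = torch.softmax(student_logits, dim=-1)
    log_q = torch.log_softmax(student_logits, dim=-1)
    log_p = torch.log_softmax(teacher_logits, dim=-1)
    return (q * (log_q - log_p)).sum(dim=-1)
```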
Thanks for your effort~
Thanks for your reminders. We have corrected it.
We added this paper to the "Feature" and "Divergence and Similarity" categories.