[One latest paper] Rethinking Kullback-Leibler Divergence in Knowledge Distillation for Large Language Models
wutaiqiang opened this issue
Nice work!
One missing related work:
Rethinking Kullback-Leibler Divergence in Knowledge Distillation for Large Language Models
https://arxiv.org/abs/2404.02657
Great work!
Thanks. We have added it and will update the paper in a future version :)
This paper is under review at COLM, not yet accepted.
Also, this paper rethinks forward KL (FKL) and reverse KL (RKL) in logit-based distillation and proposes an adaptive KL (AKL); see the sketch below.
A related blog post (in Chinese): https://zhuanlan.zhihu.com/p/690748958
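For context, here is a minimal sketch (assuming PyTorch and per-token logits; this is not the paper's implementation, and the AKL combination rule itself is left to the paper) of what the FKL and RKL terms look like in logit-based distillation:

```python
# Illustrative sketch only: forward vs. reverse KL on one token's logits.
# `teacher_logits` / `student_logits` are assumed tensors of shape [vocab_size].
# AKL adaptively combines these two terms; that rule is not reproduced here.
import torch

def forward_kl(teacher_logits: torch.Tensor, student_logits: torch.Tensor) -> torch.Tensor:
    # FKL: KL(teacher || student)
    p = torch.softmax(teacher_logits, dim=-1)
    log_p = torch.log_softmax(teacher_logits, dim=-1)
    log_q = torch.log_softmax(student_logits, dim=-1)
    return (p * (log_p - log_q)).sum(dim=-1)

def reverse_kl(teacher_logits: torch.Tensor, student_logits: torch.Tensor) -> torch.Tensor:
    # RKL: KL(student || teacher)
    q = torch.softmax(student_logits, dim=-1)
    log_q = torch.log_softmax(student_logits, dim=-1)
    log_p = torch.log_softmax(teacher_logits, dim=-1)
    return (q * (log_q - log_p)).sum(dim=-1)
```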
Thanks for your effort~
Thanks for your reminders. We have corrected it.
We added this paper to the "Feature" and "Divergence and Similarity" categories.