Clarifications Needed on KVCache Compression and Matrix Operations in MLA KVCache
hxer7963 opened this issue
In MLA, the KVCache compresses
However, according to equation (17):
Appendix B mentions that by applying the associative law of matrix multiplication,
Questions:
- Given that $W^Q \in \mathbb{R}^{d_h n_h \times d}$ and $W^{UK} \in \mathbb{R}^{d_h n_h \times d_c}$, how are these matrices multiplied to derive $W^{UQ}$?
- How are the values for the matrices $W^{DKV}, W^{UK}, W^{KR}$ computed? Appendix B suggests that these are calculated offline once and not during training as part of the low-rank matrix values.
Any insights or detailed explanations regarding these points would be highly appreciated.
Here's a recommended blog for you: https://spaces.ac.cn/archives/10091 @hxer7963
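On the first question, here is a minimal NumPy sketch of one way to read the "associative law" remark. It assumes the shapes quoted in the question, ignores the decoupled RoPE part ($W^{KR}$), and uses toy sizes and variable names that are purely illustrative (not from the paper or the official code). For head $i$, the score is $(W^Q_i h_t)^\top (W^{UK}_i c^{KV}) = h_t^\top \big((W^Q_i)^\top W^{UK}_i\big) c^{KV}$, so the two matrices combine per head as $(W^Q_i)^\top W^{UK}_i \in \mathbb{R}^{d \times d_c}$, and the key never has to be up-projected from the cached latent.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes, chosen only for illustration (hypothetical values).
d, d_c, d_h, n_h = 16, 4, 8, 2

# Shapes as quoted in the question.
W_Q = rng.standard_normal((n_h * d_h, d))     # query projection
W_UK = rng.standard_normal((n_h * d_h, d_c))  # key up-projection

h_t = rng.standard_normal(d)     # hidden state of the query token
c_kv = rng.standard_normal(d_c)  # cached compressed latent of one key token

# Split the stacked projections into per-head blocks.
W_Q_heads = W_Q.reshape(n_h, d_h, d)
W_UK_heads = W_UK.reshape(n_h, d_h, d_c)

# Naive path: materialize per-head q and k, then take dot products.
q = W_Q_heads @ h_t    # shape (n_h, d_h)
k = W_UK_heads @ c_kv  # shape (n_h, d_h)
scores_naive = np.einsum("hi,hi->h", q, k)

# Absorbed path: precompute (W_Q_i)^T W_UK_i once per head, shape (d, d_c),
# then score directly against the cached latent.
W_absorbed = np.einsum("hid,hic->hdc", W_Q_heads, W_UK_heads)
scores_absorbed = np.einsum("d,hdc,c->h", h_t, W_absorbed, c_kv)

max_abs_diff = float(np.max(np.abs(scores_naive - scores_absorbed)))
print(W_absorbed.shape, max_abs_diff)
```

Under this reading, the individual matrices would still be learned during training like any other weights, and only the product is precomputed once for inference. That is an interpretation of the Appendix B remark, not something confirmed in this thread.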