prompt sharing for faster paged attn
mmoskal opened this issue
Michał Moskal commented
Right now (validate this!) the paged attn kernel doesn't seem to take advantage of the fact that a significant part of the prompt may be shared between many queries. The kernel could probably be modified to read these shared prompt entries only once and reuse them for every query that shares them.
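To make the idea concrete, here is a rough pure-Python sketch (not the actual kernel, and not validated against it). It assumes attention for each query can be split into a pass over the shared prompt K/V and a pass over that query's own K/V, with the two partial softmax results merged afterwards by rescaling to a common max. All function names here are made up for illustration, and each query is assumed to have at least one non-shared entry.

```python
import math

def partial_attention(q, keys, values):
    """Softmax-attention state for one query over one K/V chunk.

    Returns (max_score, sum_of_exp, unnormalized_weighted_values).
    """
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    acc = [sum(w * v[d] for w, v in zip(weights, values))
           for d in range(len(values[0]))]
    return m, sum(weights), acc

def merge(state_a, state_b):
    """Combine two partial states by rescaling both to a common max."""
    m_a, d_a, acc_a = state_a
    m_b, d_b, acc_b = state_b
    m = max(m_a, m_b)
    s_a, s_b = math.exp(m_a - m), math.exp(m_b - m)
    acc = [x * s_a + y * s_b for x, y in zip(acc_a, acc_b)]
    return m, d_a * s_a + d_b * s_b, acc

def finalize(state):
    """Normalize the accumulated values by the softmax denominator."""
    _, denom, acc = state
    return [x / denom for x in acc]

def attention_with_shared_prefix(queries, prefix_k, prefix_v,
                                 suffix_ks, suffix_vs):
    # Phase 1: every query attends to the shared prompt entries.
    # A real kernel would load each shared K/V block once here and
    # apply it to all queries; this loop only models the arithmetic.
    prefix_states = [partial_attention(q, prefix_k, prefix_v)
                     for q in queries]
    # Phase 2: each query attends to its own (non-shared) entries,
    # then the two partial results are merged.
    outputs = []
    for q, p_state, sk, sv in zip(queries, prefix_states,
                                  suffix_ks, suffix_vs):
        s_state = partial_attention(q, sk, sv)
        outputs.append(finalize(merge(p_state, s_state)))
    return outputs

def reference_attention(q, keys, values):
    """Plain attention over the full K/V list, for comparison."""
    return finalize(partial_attention(q, keys, values))
```

The merged result should match `reference_attention` run over the concatenated prompt and per-query entries, up to floating-point error. Whether this actually saves memory traffic depends on how the real kernel schedules blocks, so that part still needs checking.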