support SPM mode for FIM prompts
erfanium opened this issue
Erfan commented
From the FIM paper (https://arxiv.org/pdf/2207.14255.pdf), section 3.1: SPM mode can be used to reuse the KV cache across completion requests.
SPM mode can enable further latency optimization (which is very important for code completion tools). Is there any reason that StarCoder models use the normal PSM mode?
Loubna Ben Allal commented
We train with both modes (50% PSM and 50% SPM), similarly to StarCoder (cf. paper), so you can also try SPM mode for inference.
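For anyone who wants to try it, here is a minimal sketch of how the two prompt layouts might be built. It assumes StarCoder-style sentinel tokens (`<fim_prefix>`, `<fim_suffix>`, `<fim_middle>`) and the SPM ordering in which both sentinels come first, followed by the suffix, the middle sentinel, and then the prefix. The helper names are hypothetical, and the exact layout should be checked against the tokenizer and training code of the model you use. The character-level comparison is only a rough proxy for how many tokens a KV cache could keep between two requests.

```python
# Assumed StarCoder-style sentinel tokens; verify against your tokenizer.
FIM_PREFIX = "<fim_prefix>"
FIM_SUFFIX = "<fim_suffix>"
FIM_MIDDLE = "<fim_middle>"


def build_psm_prompt(prefix: str, suffix: str) -> str:
    """PSM layout: prefix, then suffix; the model generates the middle."""
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"


def build_spm_prompt(prefix: str, suffix: str) -> str:
    """SPM layout (assumed variant): suffix first, prefix last."""
    return f"{FIM_PREFIX}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}{prefix}"


def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading characters two prompts have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


if __name__ == "__main__":
    suffix = "\n\nprint(add(1, 2))\n"
    before = "def add(a, b):\n    return a"
    after = before + " +"  # the user typed two more characters

    for name, build in (("PSM", build_psm_prompt), ("SPM", build_spm_prompt)):
        old, new = build(before, suffix), build(after, suffix)
        reused = shared_prefix_len(old, new)
        print(f"{name}: {reused} of {len(old)} characters of the old prompt are reusable")
```

In SPM the old prompt is a strict prefix of the new one when the user only appends to the code before the cursor, so everything already computed for it could in principle be kept. In PSM the prompts diverge at the point where the prefix grows, so the suffix part has to be recomputed.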
Erfan commented
Got it, thanks!