Further reduction of pod cold start time
yunfeng-scale opened this issue
I believe we could get pod cold start time under 20s for Llama 2 7B models.
Scale LLM Engine public repository