llm-edge / hal-9100

Edge full-stack LLM platform. Written in Rust


[optimization] caching requests, etc.

louis030195 opened this issue · comments

https://github.com/zilliztech/GPTCache

GPTCache only caches the retrieval part.

In Assistants we could cache:

  • function calls
  • retrieval
  • actions
  • code interpreter

in Redis, for example
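A minimal sketch of the idea: key each expensive operation (function call, retrieval, code-interpreter run) by a hash of its request and return the stored response on a hit. Everything here is hypothetical, not hal-9100's actual API, and an in-memory `HashMap` stands in for Redis (in production the `get`/`insert` pair would be Redis `GET`/`SET`, typically with a TTL):

```rust
use std::collections::HashMap;
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hypothetical cache keyed by a hash of the request payload.
struct ResponseCache {
    entries: HashMap<u64, String>,
}

impl ResponseCache {
    fn new() -> Self {
        Self { entries: HashMap::new() }
    }

    fn key(request: &str) -> u64 {
        let mut h = DefaultHasher::new();
        request.hash(&mut h);
        h.finish()
    }

    /// Return the cached response, or run `compute` (the expensive LLM,
    /// retrieval, or code-interpreter call) and store its result.
    fn get_or_compute(&mut self, request: &str, compute: impl FnOnce() -> String) -> String {
        let k = Self::key(request);
        if let Some(hit) = self.entries.get(&k) {
            return hit.clone();
        }
        let response = compute();
        self.entries.insert(k, response.clone());
        response
    }
}
```

One caveat this sketch ignores: exact-match keys only help for identical requests, whereas GPTCache does semantic (embedding-based) matching, so the two approaches are complementary.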

i mean there are thousands of ways to slash latency and cost, it's not a very difficult problem