serving-infrastructure

There are 0 repository under serving-infrastructure topic.

ksm26 / Efficiently-Serving-LLMs
Learn the ins and outs of efficiently serving Large Language Models (LLMs). Dive into optimization techniques, including KV caching and Low Rank Adapters (LoRA), and gain hands-on experience with Predibase’s LoRAX framework inference server.
batch-processing deep-learning-techniques inference-optimization machine-learning-operations model-acceleration model-inference-service model-serving optimization-techniques performance-enhancement scalability-strategies server-optimization text-generation large-scale-deployment serving-infrastructure
Language:Jupyter Notebook 2