Repositories under the kv-cache topic:
[NeurIPS'23] H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models.
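H2O reduces KV-cache memory by evicting cached positions that receive little accumulated attention, keeping the "heavy hitters". A toy sketch of that scoring idea, assuming a precomputed attention matrix (the function name and shapes are illustrative, not the paper's API; the full method also retains a window of recent tokens):

```python
import numpy as np

def h2o_heavy_hitters(attn_scores, budget):
    """Toy H2O-style selection: keep the `budget` cached positions with the
    highest accumulated attention mass across queries.
    attn_scores: array of shape (num_queries, num_cached_positions)."""
    accumulated = attn_scores.sum(axis=0)      # total attention each cached position received
    keep = np.argsort(accumulated)[-budget:]   # indices of the heavy hitters
    return np.sort(keep)                       # return kept positions in order
```

For example, with two queries attending mostly to position 1, a budget of 2 keeps positions 1 and 2 and evicts position 0.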
Notes about the LLaMA 2 model
This repository contains an implementation of the LLaMA 2 (Large Language Model Meta AI) model, a Generative Pretrained Transformer (GPT) variant. The implementation focuses on the model architecture and the inference process. The code is restructured and heavily commented to make the key parts of the architecture easy to understand.
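A central piece of such an inference implementation is the KV cache: during autoregressive decoding, each layer's keys and values for past positions are stored so that only the new token's projections are computed per step. A minimal sketch of that data structure, assuming per-layer arrays of shape (seq_len, head_dim) (the class and method names are illustrative, not this repository's API):

```python
import numpy as np

class KVCache:
    """Minimal per-layer key/value cache for autoregressive decoding (illustrative)."""

    def __init__(self, n_layers):
        self.keys = [None] * n_layers
        self.values = [None] * n_layers

    def append(self, layer, k, v):
        """Append new-position keys/values (shape (seq_new, head_dim)) for one layer
        and return the full cached sequences for use in attention."""
        if self.keys[layer] is None:
            self.keys[layer], self.values[layer] = k, v
        else:
            self.keys[layer] = np.concatenate([self.keys[layer], k], axis=0)
            self.values[layer] = np.concatenate([self.values[layer], v], axis=0)
        return self.keys[layer], self.values[layer]
```

At each decoding step the model computes k, v for the single new token, calls `append`, and attends over the returned full-length keys and values, avoiding recomputation for earlier positions.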
Fine-tuned Mistral 7B Persian large language model (LLM) / Persian Mistral 7B
This is a minimal implementation of a GPT model with some advanced features, such as temperature, top-k, and top-p sampling, and a KV cache.
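The sampling features listed above combine straightforwardly: temperature rescales the logits, top-k masks all but the k highest, and top-p (nucleus) keeps the smallest set of tokens whose cumulative probability reaches p. A self-contained sketch of one decoding step, assuming raw logits as input (function name and defaults are illustrative):

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, top_k=0, top_p=1.0, rng=None):
    """Sample a token id from logits with temperature, top-k, and top-p filtering."""
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=np.float64) / temperature  # temperature scaling
    if top_k > 0:
        kth = np.sort(logits)[-top_k]                  # k-th largest logit
        logits = np.where(logits < kth, -np.inf, logits)
    probs = np.exp(logits - logits.max())              # stable softmax
    probs /= probs.sum()
    if top_p < 1.0:
        order = np.argsort(probs)[::-1]                # tokens by descending probability
        cum = np.cumsum(probs[order])
        cutoff = np.searchsorted(cum, top_p) + 1       # smallest nucleus covering top_p
        mask = np.zeros_like(probs)
        mask[order[:cutoff]] = probs[order[:cutoff]]
        probs = mask / mask.sum()                      # renormalize over the nucleus
    return int(rng.choice(len(probs), p=probs))
```

With `top_k=1` the call degenerates to greedy decoding, which makes the behavior easy to verify; larger k and p trade determinism for diversity.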