This repository contains the code and models used in the paper "Cost-Optimal Grouped-Query Attention for Long-Context LLMs".
The code for the paper "Cost-Optimal Grouped-Query Attention for Long-Context Modeling"
https://arxiv.org/abs/2503.09579
Repository on GitHub: https://github.com/thunlp/cost-optimal-gqa