ggerganov / llama.cpp

LLM inference in C/C++


support long context llama 3 models

bachittle opened this issue · comments

Any plans to support a model like this? https://huggingface.co/gradientai/Llama-3-8B-Instruct-Gradient-1048k

Since the original Llama 3 only has an 8192-token context window, I'm wondering how feasible it would be to run these models?

This model is a fine-tune of Llama 3, so it is already supported (there is no change in architecture). They just increased the RoPE theta and trained with that.
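To illustrate what changing the RoPE theta does (a rough sketch, not llama.cpp code; the specific theta values below are illustrative, not the ones Gradient used): raising theta lowers the per-dimension rotation frequencies, so positional rotations vary more slowly and remain distinguishable over a much longer sequence.

```python
# Sketch of RoPE frequency computation. Each pair of head dimensions i
# rotates at frequency theta^(-2i/d); a larger theta shrinks every
# frequency, stretching the effective wavelengths for long contexts.
def rope_frequencies(theta: float, head_dim: int) -> list[float]:
    return [theta ** (-2 * i / head_dim) for i in range(head_dim // 2)]

base = rope_frequencies(500_000.0, 128)    # stock Llama 3 theta
long = rope_frequencies(4_000_000.0, 128)  # illustrative larger theta

# Every frequency is smaller (or equal, for i = 0) with the larger theta.
assert all(l <= b for b, l in zip(base, long))
```

In llama.cpp the converted GGUF carries this value in its metadata, so a fine-tune like this runs without code changes.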

Some people were having issues, described here: https://huggingface.co/gradientai/Llama-3-8B-Instruct-262k/discussions/13, but I expect those are the same issues the base Llama 3 models had until they were fixed.

You should have followed the enhancement template you were given. Did you try running the model? If so, what happened?