Support long-context Llama 3 models
bachittle opened this issue · comments
Any plans to support a model like this? https://huggingface.co/gradientai/Llama-3-8B-Instruct-Gradient-1048k
Since the base Llama 3 only has an 8K (8192) token context window, I'm wondering how feasible it would be to run these models.
This model is a fine-tune of Llama 3, so it is already supported (no change in architecture). They just increased the RoPE theta and trained with that.
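To illustrate why changing RoPE theta extends the usable context: theta sets the base of the rotary position frequencies, and a larger theta stretches the rotation wavelengths so distant positions stay distinguishable. A minimal sketch, using Llama 3's published base theta of 500,000 and an illustrative (not the model card's exact) larger value for the long-context fine-tune:

```python
import math

def rope_frequencies(head_dim: int, theta: float) -> list[float]:
    # Standard RoPE per-pair rotation frequencies:
    #   freq_i = theta ** (-2i / head_dim),  i = 0 .. head_dim/2 - 1
    return [theta ** (-2 * i / head_dim) for i in range(head_dim // 2)]

# Llama 3 base config uses rope_theta = 500000.
base = rope_frequencies(128, 500_000.0)

# Long-context fine-tunes raise theta; 3e9 here is purely illustrative.
long_ctx = rope_frequencies(128, 3_000_000_000.0)

# Larger theta -> smaller frequencies -> longer wavelengths, so token
# positions hundreds of thousands apart still map to distinct rotations.
# (freq_0 is 1.0 for any theta, so compare from index 1 onward.)
print(all(l < b for l, b in zip(long_ctx[1:], base[1:])))
```

Since the tensor layout is unchanged, an inference engine that reads `rope_theta` from the model config needs no code changes to run these fine-tunes.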
Some people were having issues, described here: https://huggingface.co/gradientai/Llama-3-8B-Instruct-262k/discussions/13, but I expect those are the same issues the base Llama 3 models had until they were fixed.
You should have followed the enhancement template you were given. Did you try running the model? If so, what happened?