enricoros / big-AGI

Generative AI suite powered by state-of-the-art models and providing advanced AI/AGI functions. It features AI personas, AGI functions, multi-model chats, text-to-image, voice, response streaming, code highlighting and execution, PDF import, presets for developers, and much more. Deploy on-prem or in the cloud.

Home Page: https://big-agi.com

[BUG] ollama models context size not properly imported/reflected

XReyRobert opened this issue · comments

Describe the bug
The context size of Ollama models is not properly imported or reflected in the UI.

Where is it happening?
To Reproduce
Import a 128K-context Ollama model (e.g. yarn-mistral:7b-128k), then open the model details and check the max model tokens shown in the UI.

Expected behavior

Screenshots / context

Screenshot 2023-12-27 at 15:01:21

Thanks @XReyRobert. Unfortunately, Ollama does not usually provide the context size, so it's assumed to be 4k across the board.

The /models API does not provide it, and neither does the models list.

In your particular case, the name of the model has the context size, but that's a rarity.

What's the best way to deal with this, or to get context sizes for all models?
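For the rare case mentioned above where the model name carries the context size, a rough fallback could look like the sketch below. The function name `context_from_name` and the rule "a tag such as 128k means 128 * 1024 tokens" are assumptions made for illustration; they are not part of Ollama or big-AGI, and the heuristic will miss models whose names do not follow this pattern.

```python
import re
from typing import Optional

# Matches a standalone "<digits>k" token such as "128k" in "7b-128k".
# The lookarounds keep it from matching inside longer alphanumeric runs.
_CTX_TAG = re.compile(r"(?<![0-9a-z])(\d+)k(?![0-9a-z])", re.IGNORECASE)


def context_from_name(model_name: str) -> Optional[int]:
    """Guess a context size in tokens from a model name, or return None.

    Assumption: "<N>k" in the name means N * 1024 tokens.
    """
    match = _CTX_TAG.search(model_name)
    if match is None:
        return None
    return int(match.group(1)) * 1024


guess = context_from_name("yarn-mistral:7b-128k")
no_guess = context_from_name("mistrallite:latest")
```

Because most model names carry no such tag, this would only be useful as a last resort behind a more reliable source of the context size.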

Hi @enricoros,

There's a "show" endpoint that returns additional parameters when they are available.
For example, mistrallite:latest and yarn-mistral:7b-128k both report a "num_ctx" parameter:

curl http://localhost:11434/api/show -d '{
  "name": "mistrallite:latest"
}' | jq
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   992  100   958  100    34   656k  23876 --:--:-- --:--:-- --:--:--  968k
{
  "modelfile": "# Modelfile generated by \"ollama show\"\n# To build a new Modelfile based on this one, replace the FROM line with:\n# FROM mistrallite:latest\n\nFROM /usr/share/ollama/.ollama/models/blobs/sha256:fcfc737faf6b2bb5050752602ca341e92ec4d8208f2b5762bd656d447be9910e\nTEMPLATE \"\"\"<|prompter|>{{ .System }} {{ .Prompt }}</s><|assistant|>\n\"\"\"\nPARAMETER num_ctx 32768\nPARAMETER stop \"<|prompter|>\"\nPARAMETER stop \"<|assistant|>\"\nPARAMETER stop \"</s>\"",
  "parameters": "num_ctx                        32768\nstop                           <|prompter|>\nstop                           <|assistant|>\nstop                           </s>",
  "template": "<|prompter|>{{ .System }} {{ .Prompt }}</s><|assistant|>\n",
  "details": {
    "format": "gguf",
    "family": "llama",
    "families": null,
    "parameter_size": "7B",
    "quantization_level": "Q4_0"
  }
}

curl http://localhost:11434/api/show -d '{
  "name": "yarn-mistral:7b-128k"
}' | jq
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   568  100   532  100    36   423k  29315 --:--:-- --:--:-- --:--:--  554k
{
  "modelfile": "# Modelfile generated by \"ollama show\"\n# To build a new Modelfile based on this one, replace the FROM line with:\n# FROM yarn-mistral:7b-128k\n\nFROM /usr/share/ollama/.ollama/models/blobs/sha256:14f2e225961b80d791d14c88def05fca31abc44ab1a7a12ba8e8f2365442e6e6\nTEMPLATE \"\"\"{{ .Prompt }}\"\"\"\nPARAMETER num_ctx 131072",
  "parameters": "num_ctx                        131072",
  "template": "{{ .Prompt }}",
  "details": {
    "format": "gguf",
    "family": "llama",
    "families": null,
    "parameter_size": "7B",
    "quantization_level": "Q4_0"
  }
}
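
As a minimal sketch of how a client could read the context size from the responses above: the "parameters" field is plain text with one "name value" pair per line, so `num_ctx` can be picked out line by line. The helper names `parse_num_ctx` and `context_window` are hypothetical, and the 4096 fallback simply mirrors the "assumed to be 4k" default mentioned earlier in this thread.

```python
from typing import Optional

# Assumption: fall back to 4k when the model does not report num_ctx.
DEFAULT_CONTEXT_TOKENS = 4096


def parse_num_ctx(parameters: Optional[str]) -> Optional[int]:
    """Return num_ctx from the "parameters" text of /api/show, or None."""
    if not parameters:
        return None
    for line in parameters.splitlines():
        parts = line.split()
        # Expect exactly "num_ctx <integer>", separated by whitespace.
        if len(parts) == 2 and parts[0] == "num_ctx" and parts[1].isdigit():
            return int(parts[1])
    return None


def context_window(show_response: dict) -> int:
    """Context size for a parsed /api/show response, with the 4k fallback."""
    return parse_num_ctx(show_response.get("parameters")) or DEFAULT_CONTEXT_TOKENS


# The "parameters" value from the mistrallite:latest response above.
mistrallite_params = (
    "num_ctx                        32768\n"
    "stop                           <|prompter|>\n"
    "stop                           <|assistant|>\n"
    "stop                           </s>"
)
mistrallite_ctx = context_window({"parameters": mistrallite_params})
```

Models whose response has no "parameters" field, or no `num_ctx` line in it, would fall back to the default in this sketch.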

I confirm the bug. Also, for what it's worth, this Ollama release changelog specifies how to pass a 32k context window to Mixtral (and I suppose other models as well). https://github.com/jmorganca/ollama/releases/tag/v0.1.19

Thanks! I'll prioritize this issue. I can quickly fix the part about detecting the context size.

For the "32k Mixtral" case, the odd part is that the developer should not have to tell the API what the context window is; it should be the other way around. APIs commonly take a "max_tokens" parameter as a hard limit on the response length. I'm sure the Ollama folks will make the API more standard; their recent /chat endpoint shows that they're on a good path.

Prioritized.

@XReyRobert Implemented; it will be released in 3 hours in 1.12.0. The context size is now inferred from num_ctx where available and set correctly. Please refer to Ollama / Jeffrey's post (https://github.com/jmorganca/ollama/releases/tag/v0.1.19) to change it in your Ollama model files.
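
For readers who want to override the context window per request rather than in the model file, one option is the `options.num_ctx` field of Ollama's /api/generate endpoint. The sketch below only builds the request; whether this matches the exact procedure described in the linked release notes is not confirmed here. The function name `build_generate_request`, the "mixtral" model name, and the prompt are illustrative assumptions.

```python
import json
import urllib.request

# Same local address as the curl calls earlier in this thread.
OLLAMA_URL = "http://localhost:11434"


def build_generate_request(model: str, prompt: str, num_ctx: int) -> urllib.request.Request:
    """Build (but do not send) a /api/generate request that overrides num_ctx."""
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"num_ctx": num_ctx},
    }
    return urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_generate_request("mixtral", "Hello", 32768)
# Sending it requires a running Ollama server:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response)["response"])
```

The persistent alternative is a `PARAMETER num_ctx <value>` line in the Modelfile, as seen in the "modelfile" output earlier in the thread.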