Rate Limit error on locally deployed model
V4G4X opened this issue
V4G4X commented
I am trying to run StarCoder locally through Ollama, and I want to get code auto-completion like in the README gif.
But I keep getting the following error after every debounce: [LLM] inference api error: Rate limit reached. Please log in or use your apiToken
local llm = require('llm')
llm.setup({
    api_token = nil, -- cf Install paragraph
    model = "bigcode/starcoder", -- the model ID, behavior depends on backend
    url = "http://localhost:11434/api/generate", -- the http url of the backend
    tokens_to_clear = { "<|endoftext|>" }, -- tokens to remove from the model's output
    -- parameters that are added to the request body; values are arbitrary,
    -- you can set any field:value pair here and it will be passed as is to the backend
    request_body = {
        parameters = {
            temperature = 0.1,
        },
    },
    -- set this if the model supports fill-in-the-middle
    fim = {
        enabled = true,
        prefix = "<fim_prefix>",
        middle = "<fim_middle>",
        suffix = "<fim_suffix>",
    },
    debounce_ms = 1000,
    context_window = 8192, -- max number of tokens for the context window
    tokenizer = { -- cf Tokenizer paragraph
        repository = "bigcode/starcoder",
    },
})
Am I wrong in understanding that this repo can give Copilot/Tabnine-like autocomplete with locally deployed models?
Please let me know what my next steps should be.
V4G4X commented
If it helps, this is how I've installed the plugin in Lazy.
{ 'huggingface/llm.nvim', opts = {}, }, -- Auto Generation using Open LLMs
Luc Georges commented
Hi @V4G4X, sorry for the very late reply. You have to set backend = "ollama" for this to work.
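For anyone landing here, a minimal sketch of the corrected setup, assuming the rest of the configuration stays as posted above. Without a backend set, llm.nvim appears to fall back to the Hugging Face Inference API, which rate-limits unauthenticated requests; that would explain the rate-limit error even though the model runs locally.

local llm = require('llm')
llm.setup({
    backend = "ollama", -- send requests to the local Ollama server instead of the Hugging Face Inference API
    api_token = nil, -- no token is needed for a local backend
    model = "bigcode/starcoder",
    url = "http://localhost:11434/api/generate",
    tokens_to_clear = { "<|endoftext|>" },
    fim = {
        enabled = true,
        prefix = "<fim_prefix>",
        middle = "<fim_middle>",
        suffix = "<fim_suffix>",
    },
    debounce_ms = 1000,
    context_window = 8192,
    tokenizer = {
        repository = "bigcode/starcoder",
    },
    -- request_body and the other options from the config above can be kept unchanged
})

With the Lazy spec shown earlier, the same table can go in opts instead of an explicit llm.setup call, e.g. { 'huggingface/llm.nvim', opts = { backend = "ollama" } }, since Lazy passes opts to the plugin's setup function.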