Rate Limit error on locally deployed model
V4G4X opened this issue
V4G4X commented
I am trying to run StarCoder locally through Ollama, and I want to get code auto-completion like in the README gif.
But I keep getting the following error after every debounce: [LLM] inference api error: Rate limit reached. Please log in or use your apiToken
local llm = require('llm')
llm.setup({
    api_token = nil, -- cf Install paragraph
    model = "bigcode/starcoder", -- the model ID, behavior depends on backend
    url = "http://localhost:11434/api/generate", -- the http url of the backend
    tokens_to_clear = { "<|endoftext|>" }, -- tokens to remove from the model's output
    -- parameters that are added to the request body; values are arbitrary,
    -- you can set any field:value pair here and it will be passed as is to the backend
    request_body = {
        parameters = {
            temperature = 0.1,
        },
    },
    -- set this if the model supports fill-in-the-middle
    fim = {
        enabled = true,
        prefix = "<fim_prefix>",
        middle = "<fim_middle>",
        suffix = "<fim_suffix>",
    },
    debounce_ms = 1000,
    context_window = 8192, -- max number of tokens for the context window
    tokenizer = { -- cf Tokenizer paragraph
        repository = "bigcode/starcoder",
    },
})
Am I wrong in understanding that this repo can give Copilot/Tabnine-like autocomplete with locally deployed models?
Please let me know what my next steps should be.
V4G4X commented
If it helps, this is how I've installed the plugin in Lazy.
{ 'huggingface/llm.nvim', opts = {}, }, -- Auto Generation using Open LLMs
Luc Georges commented
Hi @V4G4X, sorry for the very late reply. You have to set backend = "ollama" for this to work.
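For anyone landing here, a minimal sketch of the corrected setup, assuming the rest of the configuration stays as posted above. Without a backend set, llm.nvim appears to fall back to the Hugging Face Inference API, which rate-limits unauthenticated requests; that would explain the rate-limit error even though the model runs locally.

local llm = require('llm')
llm.setup({
    backend = "ollama", -- send requests to the local Ollama server instead of the Hugging Face Inference API
    api_token = nil, -- no token is needed for a local backend
    model = "bigcode/starcoder",
    url = "http://localhost:11434/api/generate",
    tokens_to_clear = { "<|endoftext|>" },
    fim = {
        enabled = true,
        prefix = "<fim_prefix>",
        middle = "<fim_middle>",
        suffix = "<fim_suffix>",
    },
    debounce_ms = 1000,
    context_window = 8192,
    tokenizer = {
        repository = "bigcode/starcoder",
    },
    -- request_body and the other options from the config above can be kept unchanged
})

With the Lazy spec shown earlier, the same table can go in opts instead of an explicit llm.setup call, e.g. { 'huggingface/llm.nvim', opts = { backend = "ollama" } }, since Lazy passes opts to the plugin's setup function.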