cpacker / MemGPT

Create LLM agents with long-term memory and custom tools πŸ“šπŸ¦™

Home Page: https://memgpt.readme.io

[Feature Request] Support for local LLMs like Ollama

mmmeff opened this issue Β· comments

The ability to use local LLMs would be great.

added to the roadmap!

Using it with LM Studio's local API server would be great. I spent half the day trying to get it to connect, but alas, no good results. It should be possible, since the server is supposed to be a drop-in replacement for the OpenAI API: https://lmstudio.ai/

LM Studio is interesting but in keeping with the spirit of open-source, a better solution would be https://github.com/go-skynet/LocalAI, a fully open drop-in OpenAI API replacement that includes support for functions. I am cloning memGPT now and have a localAI installation so perhaps I can see this weekend what would be required.

Support for local LLMs would be a game changer, in particular being able to use Mistral 7B

Using it with LM Studio's local API server would be great. I spent half the day trying to get it to connect, but alas, no good results. It should be possible, since the server is supposed to be a drop-in replacement for the OpenAI API: https://lmstudio.ai/

Any luck on running it with LM Studio?

I have LM Studio and I'm trying to figure this out, but it's so confusing. If anyone out there has been able to get any Llama version to run with MemGPT, that would be helpful.

added to the roadmap!

Thank you!

We are actively working on this (allowing pointing MemGPT at your own hosted LLM backend that supports function calling), more updates to come soon.

Sorry for asking obvious questions, but isn't it possible to just start a local OpenAI API following the llama.cpp Python bindings documentation, or bring it up with LM Studio, and override the OPENAI_API_ENDPOINT environment variable or something like that?

We are actively working on this (allowing pointing MemGPT at your own hosted LLM backend that supports function calling), more updates to come soon.

Can't wait to see what comes of your proposed Mistral 7B fine-tune. I hope its intended use is to allow for system AI interdependence and release the constraint of external OpenAI processing... I imagine a model that could call a subject-matter-expert model into VRAM for specific questioning, or just be able to conduct web research and put together reports of its own accord. It could organize the data into its own fine-tune safetensor or LoRA, depending on your AI core update interval... the future is coming.

Ok, so the idea is not to fine-tune some model to be more aligned with calling MemGPT functions? Where did you get that info? Please share.

@d0rc

We are doing both:

  1. Adding official support for using your own LLM backend that supports function calling (this can be as simple as setting the openai.api_base property to point towards your server if the backend is configured properly, but we want to add better support for this with examples and some reference models; see the sketch below). This will also make it easier for the community to try new function-calling LLMs with MemGPT (since new ones are getting released quite frequently) to see which work best.
  2. Working on our own finetuned models that are finetuned specifically for MemGPT functions (with the idea that these should hopefully perform better than open models finetuned on general function call data, and thus help approach the performance of MemGPT + GPT-4).

This issue is for tracking (1), and discussion for (2) is here: #67 (though the content of the two threads is overlapping).
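To make (1) concrete, here is a minimal sketch of what pointing openai.api_base at your own server looks like. It assumes the pre-1.0 openai Python package and an OpenAI-compatible backend already listening on localhost:8080; the URL, key, and model name are placeholders, not MemGPT defaults:

import openai

# Point the client at a local OpenAI-compatible server instead of api.openai.com.
openai.api_base = "http://localhost:8080/v1"
openai.api_key = "sk-local"  # most local servers ignore the key, but one must be set

response = openai.ChatCompletion.create(
    model="local-model",
    messages=[{"role": "user", "content": "Say hello."}],
)
print(response["choices"][0]["message"]["content"])

MemGPT additionally relies on function calling, so for the agent loop to work the backend also has to accept a functions parameter and return function_call responses.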

made a PR for this: #86

OPENAI_API_BASE=http://localhost:8080/v1 python main.py --persona syn.txt --model wizardcoder-python-34b.gguf
Running... [exit by typing '/exit']
Warning - you are running MemGPT with wizardcoder-python-34b.gguf, which is not officially supported (yet). Expect bugs!
πŸ’­ Bootup sequence complete. Persona activated. Testing messaging functionality.
Hit enter to begin (will request first MemGPT message)hello!
πŸ’­ None
πŸ€– Hello, Chad! I'm Synthia. How can I assist you today?
Hi Syn, I am Matt.
and so on...

Hahaha, fantastic! Yeah, I'm using LocalAI (a single Docker command, and I happen to have models lying all over the place, but if that weren't the case, the LocalAI project can pull them automagically at runtime from HuggingFace or from the Model Gallery they have set up).

I started off a little rocky, as I spent the majority of my time getting MemGPT going on FreeBSD (I will file a PR if I can't get it working), but I moved to a Linux box to see some forward motion and to check whether one can indeed just change the endpoint on a properly configured backend and sail away. Yes, you sure can! On my first try or two I didn't have a large enough context window, typo'd my model template, etc., but once I stopped spazzing out, it fired right up and started working straight away. Yay! Nice project, kudos to you guys, and great paper btw. Congrats!

Here's a horrible first proof of life video before I chop it into an actual success video later:
http://demonix.io:9000/index.php?p=&view=memgpt-localai.mp4

Testing with LM Studio.

OPENAI_API_BASE=http://localhost:1234/v1 python3 main.py

[2023-10-22 11:50:02.528] [ERROR] Error: 'messages' array must only contain objects with a 'role' field that is either 'user', 'assistant', or 'system'.
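(A guess at the cause rather than anything confirmed here: MemGPT relies on OpenAI function calling, so its conversation history can include messages with role "function" holding function results, and an endpoint that only accepts user/assistant/system will reject the whole request. A hypothetical payload of the kind that would trip that check, written out in Python purely for illustration:

payload = {
    "model": "local-model",
    "messages": [
        {"role": "system", "content": "You are MemGPT."},
        {"role": "user", "content": "hello!"},
        # An assistant turn that invoked a function (the function name is illustrative).
        {"role": "assistant", "content": None,
         "function_call": {"name": "send_message", "arguments": "{\"message\": \"Hi!\"}"}},
        # This 'function' role is what LM Studio's validator appears to reject.
        {"role": "function", "name": "send_message", "content": "None"},
    ],
}

If that is the issue, LM Studio would need to accept, or at least ignore, the extra roles before MemGPT can talk to it directly.)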

I tried too, but I was unable to get it right:
'OPENAI_API_BASE' is not recognized as an internal or external command,
operable program or batch file.

Any assistance on getting this working with LM Studio would be great.
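(On Windows, the VAR=value prefix used in the examples above is bash syntax, which is why cmd.exe reports it as an unknown command. Setting the variable first should be equivalent, although untested here, and the URL below is just LM Studio's default port from the earlier comment: run set OPENAI_API_BASE=http://localhost:1234/v1 and then python main.py in the same window, or in PowerShell use $env:OPENAI_API_BASE = "http://localhost:1234/v1".)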

I'm on Mac OS 14.0 Sonoma with an M2.

I was able to get the llama.cpp server working with

  • the llama.cpp/examples/server/api_like_OAI.py file
  • the llama.cpp/server file

The problem I ran into was that I didn't find a model that supported function calling yet.

Some of the steps I took are:

  1. export OPENAI_API_KEY=123456
  2. export OPENAI_REVERSE_PROXY=http://127.0.0.1:8081/v1/chat/completions (maybe?)
  3. python api_like_OAI.py --api-key 123456 --host 127.0.0.1 --user-name "user" --system-name "assistant"
  4. ./server -c 4000 --host 0.0.0.0 -t 12 -ngl 1 -m models/airoboros-l2-13b-3.1.1.Q4_K_M.gguf --embedding --alias gpt-3.5-turbo -v
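(Judging from the PR example earlier in the thread, MemGPT reads OPENAI_API_BASE rather than OPENAI_REVERSE_PROXY, so the equivalent invocation against the api_like_OAI.py proxy from step 2 would presumably be OPENAI_API_BASE=http://127.0.0.1:8081/v1 python main.py, i.e. the base URL only, without the /chat/completions suffix.)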

Just a note to say that the OPENAI_API_BASE=host:port prefix is just a way to set an environment variable when you run the python command. MemGPT must check for it and swap the API base URL.
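(As far as I can tell, the pre-1.0 openai Python package itself reads OPENAI_API_BASE at import time, so the check can be as small as the following; this is a sketch, not MemGPT's actual code:

import os
import openai

# Honor OPENAI_API_BASE if it is set; otherwise keep the library default
# (the official OpenAI endpoint).
api_base = os.environ.get("OPENAI_API_BASE")
if api_base:
    openai.api_base = api_base
)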