Web-LLM
pacoccino opened this issue · comments
What about using Web-LLM instead of running an Ollama server?
https://github.com/mlc-ai/web-llm
Runs models in the browser via WASM/WebGPU
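For reference, loading and querying a model with WebLLM looks roughly like this (the model id is illustrative, and the API has changed across versions, so treat this as a sketch):

```typescript
import { CreateMLCEngine } from "@mlc-ai/web-llm";

// Downloads the model weights (cached by the browser) and compiles
// the WebGPU kernels on first run.
const engine = await CreateMLCEngine("Llama-3-8B-Instruct-q4f32_1-MLC", {
  initProgressCallback: (p) => console.log(p.text),
});

// OpenAI-style chat completion, entirely in the browser.
const reply = await engine.chat.completions.create({
  messages: [{ role: "user", content: "Hello!" }],
});
console.log(reply.choices[0].message.content);
```

The first load is the expensive part: the quantized weights (several GB for an 8B model) have to be fetched and cached before anything can run.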
Would be sweet! I tried it initially but had some technical issues around caching, plus initial load time was pretty rough.
Would love to revisit though, especially now that stuff like Gemma is making waves.
I've done a small experiment creating a Chrome extension that hosts and runs local models: https://github.com/pacoccino/ai-mask
What issues did you face with web-llm? I'll try forking your app and making it work with the extension.
Oh neat!! Had intended to try the same actually. Will check it out.
It was ~8 months ago now, but it had to do with a lack of caching/docs on how to set it up. Every time I refreshed the app it would redownload the model, until my computer ran out of memory.
I've drafted a PR just to show it works: #16
It works great, though I got some technical issues with the web worker. Caching works, and the extension stores the models once for any app that needs them!
AI-mask is an experiment, I'd like to know what you think about it and if it could be interesting to push it forward 🤔
Added separately! Thank you for the issue!
Re: AI-mask - I have been meaning to try building something similar myself for a long time now, and I think WebLLM is getting good enough that it's useful.
My thought would be to basically expose the equivalent of a LangServe endpoint in the Chrome extension:
https://github.com/langchain-ai/langserve
So then a web developer could use a remote runnable to build chains with the familiar invoke/batch/stream/streamLog API in LangChain.js:
https://js.langchain.com/docs/ecosystem/langserve
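To make the idea concrete, here's roughly what consuming such an endpoint would look like with LangChain.js's `RemoteRunnable` (the localhost URL is a stand-in for whatever address the extension would expose):

```typescript
import { RemoteRunnable } from "@langchain/core/runnables/remote";

// Hypothetical endpoint served by the extension, speaking the
// LangServe wire protocol.
const model = new RemoteRunnable({ url: "http://localhost:8000/chat" });

// The familiar runnable API, as if the model were any other chain:
const result = await model.invoke("What is WebGPU?");

// Streaming tokens works the same way:
for await (const chunk of model.stream("What is WebGPU?")) {
  console.log(chunk);
}
```

The appeal is that the web app doesn't need to know anything about the extension; it just talks to a runnable like any other.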
I don't expect you to add this to AI-mask but would definitely encourage you to keep it up! I think there's really something there.
I've opened a new PR #19 with better support for AI-Mask.
@jacoblee93 About your idea of exposing an equivalent of a LangServe endpoint from a Chrome extension, could you elaborate? What would be the difference for a web dev between using a RemoteRunnable and a lib like ChatAIMask, which I'm using now?
Ah cool! Will try to take a look this weekend.
In general, it'd allow for some common operations we've seen folks want when building with LangChain. But actually, now that I think about it, the only important one beyond simple `.invoke` is `.stream`, since this would just make the model available!
By implementing the web client to the extension (like I think you've done with `ChatAIMask`?) as a runnable, you'd basically be able to swap out e.g. `ChatOpenAI` for `ChatAIMask` and build things like this completely locally:
https://js.langchain.com/docs/use_cases/question_answering/
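A minimal sketch of that swap, assuming `ChatAIMask` implements the same chat model interface as `ChatOpenAI` (the `@ai-mask/langchain` import path and the model id are hypothetical):

```typescript
// import { ChatOpenAI } from "@langchain/openai"; // the hosted model it replaces
import { ChatAIMask } from "@ai-mask/langchain"; // hypothetical package name
import { ChatPromptTemplate } from "@langchain/core/prompts";
import { StringOutputParser } from "@langchain/core/output_parsers";

// Runs in the extension, so no API key and no data leaves the machine.
const model = new ChatAIMask({ modelId: "gemma-2b-it" }); // illustrative id

// The rest of the chain is unchanged from the hosted version:
const chain = ChatPromptTemplate.fromMessages([
  ["human", "Answer using the context:\n{context}\n\nQuestion: {question}"],
])
  .pipe(model)
  .pipe(new StringOutputParser());

for await (const token of await chain.stream({
  context: "WebLLM runs models in the browser via WebGPU.",
  question: "Where does the model run?",
})) {
  process.stdout.write(token);
}
```

Because the model is just another runnable, prompt templates, retrievers, and output parsers compose around it the same way they would around a hosted model.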
But yeah I think you are right in that the full LangServe suite is not necessary. Looking forward to digging in!