Web-LLM
pacoccino opened this issue · comments
What about using Web-LLM instead of running an Ollama server?
https://github.com/mlc-ai/web-llm
Runs models in the browser via WASM/WebGPU
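For reference, loading and querying a model with WebLLM looks roughly like this (the model id is illustrative, and the API has changed across versions, so treat this as a sketch):

```typescript
import { CreateMLCEngine } from "@mlc-ai/web-llm";

// Downloads the model weights (cached by the browser) and compiles
// the WebGPU kernels on first run.
const engine = await CreateMLCEngine("Llama-3-8B-Instruct-q4f32_1-MLC", {
  initProgressCallback: (p) => console.log(p.text),
});

// OpenAI-style chat completion, entirely in the browser.
const reply = await engine.chat.completions.create({
  messages: [{ role: "user", content: "Hello!" }],
});
console.log(reply.choices[0].message.content);
```

The first load is the expensive part: the quantized weights (several GB for an 8B model) have to be fetched and cached before anything can run.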
Would be sweet! I tried it initially but had some technical issues around caching, plus initial load time was pretty rough.
Would love to revisit though, especially now that stuff like Gemma is making waves.
I've done a small experiment creating a Chrome extension that hosts and runs local models: https://github.com/pacoccino/ai-mask
What issues did you face with web-llm? I'll try forking your app and making it work with the extension.
Oh neat!! Had intended to try the same actually. Will check it out.
It was ~8 months ago now, but it had to do with a lack of caching/docs on how to set it up. Every time I refreshed the app it would redownload the model, until my computer ran out of memory.
I've drafted a PR just to show it works: #16
It works great, though I got some technical issues with the web worker. Caching works, and the extension stores the models once for any app that needs them!
AI-mask is an experiment, I'd like to know what you think about it and if it could be interesting to push it forward 🤔
Added separately! Thank you for the issue!
Re: AI-mask - I have been meaning to try building something similar myself for a long time now, and I think WebLLM is getting good enough that it's useful.
My thought would be to basically expose the equivalent of a LangServe endpoint in the Chrome extension:
https://github.com/langchain-ai/langserve
So then a web developer could use a remote runnable to build chains with the familiar invoke/batch/stream/streamLog API in LangChain.js:
https://js.langchain.com/docs/ecosystem/langserve
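To make the idea concrete, here's roughly what consuming such an endpoint would look like with LangChain.js's `RemoteRunnable` (the localhost URL is a stand-in for whatever address the extension would expose):

```typescript
import { RemoteRunnable } from "@langchain/core/runnables/remote";

// Hypothetical endpoint served by the extension, speaking the
// LangServe wire protocol.
const model = new RemoteRunnable({ url: "http://localhost:8000/chat" });

// The familiar runnable API, as if the model were any other chain:
const result = await model.invoke("What is WebGPU?");

// Streaming tokens works the same way:
for await (const chunk of model.stream("What is WebGPU?")) {
  console.log(chunk);
}
```

The appeal is that the web app doesn't need to know anything about the extension; it just talks to a runnable like any other.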
I don't expect you to add this to AI-mask but would definitely encourage you to keep it up! I think there's really something there.
I've opened a new PR #19 with better support for AI-Mask.
@jacoblee93 About your idea of exposing an equivalent of a LangServe endpoint from a Chrome extension, could you elaborate? What would be the difference for a web dev between using a RemoteRunnable and a lib like ChatAIMask, which I'm using now?
Ah cool! Will try to take a look this weekend.
In general, it'd allow for some common operations we've seen folks want when building with LangChain. But actually, now that I think about it, the only important one beyond simple `.invoke` is `.stream`, since this would just make the model available!
By implementing the web client to the extension (like I think you've done with `ChatAIMask`?) as a runnable, you'd basically be able to swap out e.g. `ChatOpenAI` for `ChatAIMask` and build things like this completely locally:
https://js.langchain.com/docs/use_cases/question_answering/
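A minimal sketch of that swap, assuming `ChatAIMask` implements the same chat model interface as `ChatOpenAI` (the `@ai-mask/langchain` import path and the model id are hypothetical):

```typescript
// import { ChatOpenAI } from "@langchain/openai"; // the hosted model it replaces
import { ChatAIMask } from "@ai-mask/langchain"; // hypothetical package name
import { ChatPromptTemplate } from "@langchain/core/prompts";
import { StringOutputParser } from "@langchain/core/output_parsers";

// Runs in the extension, so no API key and no data leaves the machine.
const model = new ChatAIMask({ modelId: "gemma-2b-it" }); // illustrative id

// The rest of the chain is unchanged from the hosted version:
const chain = ChatPromptTemplate.fromMessages([
  ["human", "Answer using the context:\n{context}\n\nQuestion: {question}"],
])
  .pipe(model)
  .pipe(new StringOutputParser());

for await (const token of await chain.stream({
  context: "WebLLM runs models in the browser via WebGPU.",
  question: "Where does the model run?",
})) {
  process.stdout.write(token);
}
```

Because the model is just another runnable, prompt templates, retrievers, and output parsers compose around it the same way they would around a hosted model.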
But yeah I think you are right in that the full LangServe suite is not necessary. Looking forward to digging in!