jacoblee93 / fully-local-pdf-chatbot

Yes, it's another chat over documents implementation... but this one is entirely local!

Home Page: https://webml-demo.vercel.app

Web-LLM

pacoccino opened this issue

What about using Web-LLM instead of running an Ollama server?

https://github.com/mlc-ai/web-llm

Runs models in the browser via WASM/WebGPU

Would be sweet! I tried it initially but had some technical issues around caching - plus initial load time was pretty rough.

Would love to revisit it though, especially now that stuff like Gemma is making waves.

I've done a small experiment creating a Chrome extension that hosts and runs local models: https://github.com/pacoccino/ai-mask

What issues did you face with Web-LLM? I'll try forking your app and making it work with the extension.

Oh neat!! Had intended to try the same actually. Will check it out.

It was ~8 months ago now, but it had to do with the lack of caching (and of docs on how to set it up). Every time I refreshed the app it would redownload the model until my computer ran out of memory.
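That redownload-on-refresh problem is essentially a missing cache-aside layer in front of the weight download. Here's a minimal sketch of the pattern (illustrative names only, not web-llm's actual API; in a real browser app you'd back the store with the Cache API or IndexedDB so it survives page reloads, rather than an in-memory Map):

```typescript
// Cache-aside sketch: download a model artifact only if it isn't already stored.
// A Map only illustrates the control flow; persistent browser storage
// (Cache API / IndexedDB) is what actually prevents redownloads across reloads.

type Fetcher = (url: string) => Promise<Uint8Array>;

class ModelCache {
  private store = new Map<string, Uint8Array>();
  private fetcher: Fetcher;

  constructor(fetcher: Fetcher) {
    this.fetcher = fetcher;
  }

  async get(url: string): Promise<Uint8Array> {
    const hit = this.store.get(url);
    if (hit) return hit; // second and later loads skip the download entirely
    const bytes = await this.fetcher(url);
    this.store.set(url, bytes);
    return bytes;
  }
}
```

With this in place, refreshing the app would hit the cached copy instead of pulling gigabytes of weights again.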

I've drafted a PR just to show it works: #16

It works great, though I ran into some technical issues with the web worker. Caching works, and the extension stores the models once for any app that needs them!

AI-Mask is an experiment, and I'd like to know what you think about it and whether it could be interesting to push forward 🤔

Added separately! Thank you for the issue!

Re: AI-Mask - I have been meaning to try building something similar myself for a long time now, and I think WebLLM is getting good enough to be useful.

My thought would be to basically expose the equivalent of a LangServe endpoint in the Chrome extension:

https://github.com/langchain-ai/langserve

So then a web developer could use a RemoteRunnable to build chains with the familiar invoke/batch/stream/streamLog API in LangChain.js:

https://js.langchain.com/docs/ecosystem/langserve
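To make the idea concrete, the runnable surface being discussed can be sketched as a tiny interface (illustrative names only; LangChain.js's real `Runnable` and `RemoteRunnable` types are considerably richer, and `streamLog` is omitted here):

```typescript
// Minimal sketch of the invoke/batch/stream surface a LangServe-style
// endpoint exposes. An extension-backed model would implement this same
// interface, so web apps could call it like any other runnable.

interface Runnable<In, Out> {
  invoke(input: In): Promise<Out>;
  batch(inputs: In[]): Promise<Out[]>;
  stream(input: In): AsyncGenerator<Out>;
}

// Toy stand-in for a model served by the extension.
class EchoModel implements Runnable<string, string> {
  async invoke(input: string): Promise<string> {
    return `echo: ${input}`;
  }

  async batch(inputs: string[]): Promise<string[]> {
    return Promise.all(inputs.map((i) => this.invoke(i)));
  }

  async *stream(input: string): AsyncGenerator<string> {
    // A real model would yield tokens as they're generated;
    // here we just split the finished reply on spaces.
    for (const token of (await this.invoke(input)).split(" ")) {
      yield token;
    }
  }
}
```

The point of standardizing on this shape is that anything written against `Runnable` works the same whether the model lives on a server, in a worker, or inside an extension.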

I don't expect you to add this to AI-mask but would definitely encourage you to keep it up! I think there's really something there.

I've opened a new PR, #19, with better support for AI-Mask.

@jacoblee93 About your thought of exposing an equivalent of a LangServe endpoint from a Chrome extension, could you elaborate? What would the difference be for a web dev between using a RemoteRunnable and a lib like the ChatAIMask one I'm using now?

Ah cool! Will try to take a look this weekend.

In general, it'd allow for some common operations we've seen folks want when building with LangChain. But actually, now that I think about it, the only important one beyond simple .invoke is .stream, since this would just make the model available!

By implementing the web client to the extension (like I think you've done with ChatAIMask?) as a runnable, you'd basically be able to swap out e.g. ChatOpenAI for ChatAIMask and build things like this completely locally:

https://js.langchain.com/docs/use_cases/question_answering/
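The swap works because the chain only depends on a shared model interface, not on a concrete provider. A minimal sketch of that idea (hypothetical stand-in classes, not the real ChatOpenAI/ChatAIMask implementations):

```typescript
// Sketch: a chain written against a model interface can swap a remote
// provider for a local, extension-backed one without touching chain code.

interface ChatModel {
  invoke(prompt: string): Promise<string>;
}

// Stand-in for a remote provider like ChatOpenAI.
class FakeRemoteModel implements ChatModel {
  async invoke(prompt: string): Promise<string> {
    return `[remote] ${prompt}`;
  }
}

// Stand-in for a local, extension-backed model like ChatAIMask.
class FakeLocalModel implements ChatModel {
  async invoke(prompt: string): Promise<string> {
    return `[local] ${prompt}`;
  }
}

// A toy QA "chain": it only knows about the interface, so handing it a
// local model makes the whole pipeline run locally.
async function answerQuestion(
  model: ChatModel,
  context: string,
  question: string,
): Promise<string> {
  return model.invoke(`Context: ${context}\nQuestion: ${question}`);
}
```

Swapping `new FakeRemoteModel()` for `new FakeLocalModel()` changes nothing else in the calling code, which is exactly what makes a drop-in local chat model attractive for the RAG use case linked above.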

But yeah, I think you're right that the full LangServe suite isn't necessary. Looking forward to digging in!