This is a Chrome extension that lets you query llama-cpp-python models from the browser. It uses a local server to handle the queries and displays the results in a popup.

llama-cpp-python must be installed (`pip install llama-cpp-python`) and at least one model must be downloaded; see the llama-cpp-python documentation for details. Models are available for download from Hugging Face:
- I've been using TheBlokeAI/Llama-2-7B for my testing, but most GGUF models should work. Keep in mind that the bigger the model, the slower the query and the more RAM it will use.
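Once a model is downloaded, a query boils down to a single llama-cpp-python call. A minimal sketch (the model path below is a placeholder; point it at wherever you saved the GGUF file):

```python
def query_model(prompt: str,
                model_path: str = "models/llama-2-7b.Q4_K_M.gguf") -> str:
    """Run one prompt through llama-cpp-python and return the completion text.

    The model path is a placeholder; requires `pip install llama-cpp-python`.
    """
    from llama_cpp import Llama

    llm = Llama(model_path=model_path, n_ctx=2048, verbose=False)
    out = llm(prompt, max_tokens=256)
    return out["choices"][0]["text"]
```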
- Clone this repo.
- Open Chrome and go to `chrome://extensions/`.
- Enable developer mode.
- Click **Load unpacked** and select the folder where you cloned this repo.
- Start the server:

  ```
  python3 server.py
  ```

- Go to any page and click on the extension icon.
- Type in your query and press Enter.
- The results will be displayed in the popup.
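The server's core can be sketched with the standard library alone. This is a hypothetical sketch, not the actual `server.py`: it assumes the extension sends a JSON POST with a `query` field and expects a JSON `result` back, and the model call is stubbed out so the sketch runs standalone.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def run_model(prompt: str) -> str:
    # Stub: the real server would call llama-cpp-python here.
    return f"(stub completion for: {prompt})"


class QueryHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length) or b"{}")
        reply = {"result": run_model(body.get("query", ""))}
        data = json.dumps(reply).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        # The popup runs on a different origin, so CORS must be open.
        self.send_header("Access-Control-Allow-Origin", "*")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, fmt, *args):
        pass  # keep the demo quiet


if __name__ == "__main__":
    server = HTTPServer(("127.0.0.1", 8000), QueryHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    req = urllib.request.Request(
        "http://127.0.0.1:8000",
        data=json.dumps({"query": "hello"}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(json.loads(resp.read())["result"])
    server.shutdown()
```

The CORS header matters in practice: without it, Chrome blocks the popup's request to `localhost`.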
- add a server to handle the queries
- add a popup to display the results
- store and retrieve conversations
- clear saved conversations
- add a settings page
- add a way to change the server address
- add a way to change the model easily
- add a way to download models from Hugging Face
- add a way to start the server from the extension
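The conversation items above could be backed by a small JSON file on the server side. A hypothetical sketch of that design (the extension might instead keep conversations in `chrome.storage`):

```python
import json
from pathlib import Path


class ConversationStore:
    """Minimal sketch: save, list, and clear conversations in a JSON file."""

    def __init__(self, path: str = "conversations.json"):
        self.path = Path(path)

    def _load(self) -> list:
        if self.path.exists():
            return json.loads(self.path.read_text())
        return []

    def save(self, query: str, result: str) -> None:
        convs = self._load()
        convs.append({"query": query, "result": result})
        self.path.write_text(json.dumps(convs, indent=2))

    def all(self) -> list:
        return self._load()

    def clear(self) -> None:
        if self.path.exists():
            self.path.unlink()
```

A file-backed store keeps conversations across server restarts, which in-memory storage would not.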