This is a Chrome extension that lets you query llama-cpp-python models from the browser. It uses a local server to handle the queries and displays the results in a popup.

llama-cpp-python must be installed (`pip install llama-cpp-python`) and at least one model must be downloaded; see the llama-cpp-python documentation for details. Models are available for download from Hugging Face:
- I've been using TheBlokeAI/Llama-2-7B for my testing, but most GGUF models should work. Keep in mind that the bigger the model, the slower the query and the more RAM it will use.
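Once a model is downloaded, a query boils down to a single llama-cpp-python call. A minimal sketch (the model path below is a placeholder; point it at wherever you saved the GGUF file):

```python
def query_model(prompt: str,
                model_path: str = "models/llama-2-7b.Q4_K_M.gguf") -> str:
    """Run one prompt through llama-cpp-python and return the completion text.

    The model path is a placeholder; requires `pip install llama-cpp-python`.
    """
    from llama_cpp import Llama

    llm = Llama(model_path=model_path, n_ctx=2048, verbose=False)
    out = llm(prompt, max_tokens=256)
    return out["choices"][0]["text"]
```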
- Clone this repo.
- Open Chrome and go to `chrome://extensions/`.
- Enable developer mode.
- Click **Load unpacked** and select the folder where you cloned this repo.
- Start the server:

  ```
  python3 server.py
  ```

- Go to any page and click on the extension icon.
- Type in your query and press Enter.
- The results will be displayed in the popup.
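The server's core can be sketched with the standard library alone. This is a hypothetical sketch, not the actual `server.py`: it assumes the extension sends a JSON POST with a `query` field and expects a JSON `result` back, and the model call is stubbed out so the sketch runs standalone.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def run_model(prompt: str) -> str:
    # Stub: the real server would call llama-cpp-python here.
    return f"(stub completion for: {prompt})"


class QueryHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length) or b"{}")
        reply = {"result": run_model(body.get("query", ""))}
        data = json.dumps(reply).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        # The popup runs on a different origin, so CORS must be open.
        self.send_header("Access-Control-Allow-Origin", "*")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, fmt, *args):
        pass  # keep the demo quiet


if __name__ == "__main__":
    server = HTTPServer(("127.0.0.1", 8000), QueryHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    req = urllib.request.Request(
        "http://127.0.0.1:8000",
        data=json.dumps({"query": "hello"}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(json.loads(resp.read())["result"])
    server.shutdown()
```

The CORS header matters in practice: without it, Chrome blocks the popup's request to `localhost`.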
- add a server to handle the queries
- add a popup to display the results
- store and retrieve conversations
- clear saved conversations
- add a settings page
- add a way to change the server address
- add a way to change the model easily
- add a way to download models from Hugging Face
- add a way to start the server from the extension
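The conversation items above could be backed by a small JSON file on the server side. A hypothetical sketch of that design (the extension might instead keep conversations in `chrome.storage`):

```python
import json
from pathlib import Path


class ConversationStore:
    """Minimal sketch: save, list, and clear conversations in a JSON file."""

    def __init__(self, path: str = "conversations.json"):
        self.path = Path(path)

    def _load(self) -> list:
        if self.path.exists():
            return json.loads(self.path.read_text())
        return []

    def save(self, query: str, result: str) -> None:
        convs = self._load()
        convs.append({"query": query, "result": result})
        self.path.write_text(json.dumps(convs, indent=2))

    def all(self) -> list:
        return self._load()

    def clear(self) -> None:
        if self.path.exists():
            self.path.unlink()
```

A file-backed store keeps conversations across server restarts, which in-memory storage would not.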