CodeAKrome / local-llama-chrome-extension

A Chrome extension for querying a local LLM model using llama-cpp-python; includes a server.py to run alongside the extension.

Local LLama Chrome Extension

What is this?

This is a Chrome extension that lets you query llama-cpp-python models from the browser. A local server handles the queries, and the results are displayed in a popup.

Showcase

(showcase image)

Prerequisites

llama-cpp-python must be installed and at least one model must be downloaded. See llama-cpp-python for more information. Models are available for download from Hugging Face:

  • I've been using TheBlokeAI/Llama-2-7B for my testing, but most GGUF models should work. Obviously, the bigger the model, the slower the query and the more RAM it will use. A minimal loading sketch follows this list.
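As a hedged sketch (not code from this repo), loading a downloaded GGUF model with llama-cpp-python looks roughly like this; the model path is an assumption, so point it at whichever .gguf file you downloaded:

    # Minimal sketch: load a local GGUF model with llama-cpp-python.
    # The model path is an assumption; use the .gguf file you downloaded.
    from llama_cpp import Llama

    llm = Llama(model_path="./models/llama-2-7b.Q4_K_M.gguf")
    result = llm("Q: What is a Chrome extension? A:", max_tokens=64)
    print(result["choices"][0]["text"])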

How to use it?

  1. Clone this repo
  2. Open Chrome and go to chrome://extensions/
  3. Enable developer mode
  4. Click on Load unpacked and select the folder where you cloned this repo
  5. Start the server with python3 server.py (a sketch of such a server follows these steps)
  6. Go to any page and click on the extension icon
  7. Type in your query and press Enter
  8. The results will be displayed in the popup
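
For orientation, here is a hedged sketch of the kind of local server the extension queries; it is not the repo's server.py. The endpoint name, port, and JSON shape are assumptions for illustration, and a real setup may also need CORS handling (e.g. via flask-cors) so the popup can reach it:

    # Sketch of a local llama-cpp-python server of the kind the extension queries.
    # Endpoint name, port, and JSON shape are illustrative assumptions;
    # the repo's server.py is the actual interface.
    from flask import Flask, request, jsonify
    from llama_cpp import Llama

    app = Flask(__name__)
    llm = Llama(model_path="./models/llama-2-7b.Q4_K_M.gguf")  # assumed path

    @app.route("/query", methods=["POST"])
    def query():
        prompt = request.json.get("prompt", "")
        result = llm(prompt, max_tokens=256)
        return jsonify({"text": result["choices"][0]["text"]})

    if __name__ == "__main__":
        app.run(host="127.0.0.1", port=8000)

A quick way to sanity-check such a server outside the browser is to POST a prompt to it, for example with Python's requests: requests.post("http://127.0.0.1:8000/query", json={"prompt": "Hello"}).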

TODO

  • add a server to handle the queries
  • add a popup to display the results
  • store and retrieve conversations
  • clear saved conversations
  • add a settings page
  • add a way to change the server address
  • add a way to change the model easily
  • add a way to download models from huggingface
  • add a way to start the server from the extension
