llama.web

A simple inference web UI for llama.cpp / llama-cpp-python

What it is

This web frontend is intended to run inference against quantized ggml language models. It is deliberately simple and meant to run locally: administrative actions require no authentication or authorization. Use at your own risk.

How it works

Get a model from Hugging Face and convert it to a quantized ggml file
Install the Python packages with pip install -r requirements.txt
Adjust /app/configuration/default.py to your needs
Set environment variables if needed (see below)
Go to the app folder and run make start
Open a web browser and navigate to http://localhost:8123

Environment variables

Name	Purpose
CONFIGURATION	Specifies which configuration file from the configuration folder will be loaded (default.py if not set)
MODEL_FOLDER	Path to your LLMs. By default, the "models" folder in the project root is used
WS_URL	External URL for the websocket connection. It is rendered into the HTML/JavaScript. Default: ws://localhost. Override it when running behind a reverse proxy
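
As a rough illustration of how these variables and their defaults fit together, the sketch below resolves them in Python. The helper name resolve_settings and the returned keys are hypothetical, and treating CONFIGURATION as a file name inside the configuration folder is an assumption; the project's actual loading code may differ.

```python
import os


def resolve_settings(environ=None):
    """Return the three settings, falling back to the documented defaults."""
    env = os.environ if environ is None else environ
    return {
        # Assumption: the value is a file name inside the configuration folder.
        "configuration": env.get("CONFIGURATION", "default.py"),
        # Documented default: the "models" folder in the project root.
        "model_folder": env.get("MODEL_FOLDER", "models"),
        # Documented default; override when a reverse proxy exposes another URL.
        "ws_url": env.get("WS_URL", "ws://localhost"),
    }


if __name__ == "__main__":
    # Example: simulate running behind a reverse proxy (hypothetical host name).
    print(resolve_settings({"WS_URL": "wss://chat.example.org"}))
```

Passing a plain dict instead of reading os.environ directly keeps the sketch easy to try out without touching the real environment.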

Built-in Chat Commands

As I am too lazy to build a sophisticated UI, some options can only be accessed through chat commands. Type !help to get a list of available commands.

Command	Purpose
!models	List available models. Click a model to load it. Due to RAM constraints, changing the model applies to all current connections
!model	Show currently loaded model
!model (filename)	Load a different model
!stop	List the currently set stop words
!stop ['word1',...]	Assign new stop words. Format the stop words as a Python/JSON array
!system	System state (used/free CPU and RAM)

Tip: Server commands (and chat messages alike) can be sent either by clicking the "Ask the LLaMa" button or by pressing Ctrl + Enter.
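
To make the command syntax concrete, here is a minimal sketch of how a message such as !stop ['word1',...] could be split into a command name and an argument. It is not the project's own parser: the function name parse_command and its return shape are hypothetical, and it assumes the stop-word list is written as a Python list literal, as in the table above.

```python
import ast


def parse_command(message):
    """Split a chat message into (command, argument); plain chat gives (None, message)."""
    text = message.strip()
    if not text.startswith("!"):
        return None, text
    name, _, rest = text[1:].partition(" ")
    rest = rest.strip()
    if name == "stop" and rest:
        # literal_eval only accepts Python literals, so no code is executed.
        words = ast.literal_eval(rest)
        if not (isinstance(words, list) and all(isinstance(w, str) for w in words)):
            raise ValueError("stop words must be a list of strings")
        return name, words
    return name, rest or None


if __name__ == "__main__":
    for msg in ["!models", "!model some-model.bin", "!stop ['USER:', '###']", "Hello"]:
        print(parse_command(msg))
```

Using ast.literal_eval rather than eval is a deliberate choice here: the input comes straight from the chat box, so it should never be executed as code.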

Quick Prompt Templates

The web UI comes with three predefined prompt templates. To expand one, type its shortcut text and press either Tab or Ctrl + Enter.

Shortcut	Description
#vic	Helpful AI Vicuna 1.1 prompt template
#story	Storyteller Vicuna 1.1 prompt template
###	Instruct/Response prompt template

You can define your own templates in your configuration file:

PROMPT_TEMPLATES = [
    ["vic",   "You are a helpful AI assistant.\\n\\nUSER: \\n\\nASSISTANT:", 39],
    ["##",    "\\n\\n### RESPONSE:", 0],
    ["story", "You are a storyteller. Your writing is vivid, extensive and very detailed. Extract the character traits from the user's input but don't name them in your story directly. Instead weave them into the story.\\n\\nUSER: Write a story about \\n\\nASSISTANT:",  232]
]

Each template consists of an array with three items:

The chat shortcut (without the leading #)
The template. Note: Since this is Python code that is rendered into JavaScript code, escaped characters need to have their escape prefix escaped too (i.e. \n -> \\n)
The position of the cursor inside the template after auto-completion
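
Counting the cursor position by hand is error-prone, so a small helper can derive it. The sketch below is not part of the project; build_template and the "|" marker are hypothetical. It assumes the third item is a zero-based character offset into the template as the browser sees it (with each \n counted as one character), which is consistent with the value 39 in the vic entry above, placing the cursor right after "USER: ".

```python
def build_template(shortcut, text, marker="|"):
    """Return [shortcut, escaped_template, cursor] for a template containing one marker."""
    if text.count(marker) != 1:
        raise ValueError("template must contain exactly one cursor marker")
    cursor = text.index(marker)  # offset counted in the unescaped text
    plain = text.replace(marker, "")
    # Escape backslashes first, then newlines, so each newline becomes
    # the two characters backslash + n, as the configuration file expects.
    escaped = plain.replace("\\", "\\\\").replace("\n", "\\n")
    return [shortcut, escaped, cursor]


entry = build_template("vic", "You are a helpful AI assistant.\n\nUSER: |\n\nASSISTANT:")
print(entry)
```

The template is written with real newlines and a marker where the cursor should land; the helper removes the marker and produces the double-escaped string for the configuration file. A template that itself contains the marker character would need a different marker.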

About

MIT License