Repositories under the llama-cpp topic:
A self-hosted, offline, ChatGPT-like chatbot. Powered by Llama 2. 100% private, with no data leaving your device. New: Code Llama support!
A C#/.NET library to run LLMs (🦙LLaMA/LLaVA) on your local device efficiently.
Maid is a cross-platform Flutter app for interfacing with GGUF / llama.cpp models locally, and with Ollama and OpenAI models remotely.
Run AI models locally on your machine with Node.js bindings for llama.cpp, and enforce a JSON schema on the model output at the generation level.
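A minimal sketch of what schema-constrained generation could look like with node-llama-cpp. The API names (getLlama, createGrammarForJsonSchema, LlamaChatSession) follow the v3 documentation as best recalled and may differ between versions; the model path and prompt are placeholders.

```ts
// Sketch: constrain generation to a JSON schema with node-llama-cpp.
// Assumes a node-llama-cpp v3-style API; verify names against the installed version.
import {getLlama, LlamaChatSession} from "node-llama-cpp";

const llama = await getLlama();
const model = await llama.loadModel({modelPath: "models/example.Q4_K_M.gguf"}); // placeholder path
const context = await model.createContext();
const session = new LlamaChatSession({contextSequence: context.getSequence()});

// The grammar is compiled from a JSON schema, so every sampled token
// must keep the output parseable into the requested shape.
const grammar = await llama.createGrammarForJsonSchema({
    type: "object",
    properties: {
        title: {type: "string"},
        rating: {type: "number"}
    },
    required: ["title", "rating"]
});

const answer = await session.prompt("Summarize this repo as JSON.", {grammar});
console.log(grammar.parse(answer)); // a typed object rather than free-form text
```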
Build and run AI agents using Docker Compose. A collection of ready-to-use examples for orchestrating open-source LLMs, tools, and agent runtimes.
Self-evaluating interview for AI coders
Rust bindings for llama.cpp
Run larger LLMs with longer contexts on Apple Silicon by using differentiated precision for KV cache quantization. KVSplit enables 8-bit keys & 4-bit values, reducing memory by 59% with <1% quality loss. Includes benchmarking, visualization, and one-command setup. Optimized for M1/M2/M3 Macs with Metal support.
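A back-of-the-envelope sketch of where a figure like 59% comes from when keys stay at 8 bits and values drop to 4 bits. The model dimensions below are assumptions (Llama-2-7B-like: 32 layers, 32 KV heads, head dim 128), and quantization block-scale overhead is ignored, which is why the ideal number lands a bit above the reported 59%.

```ts
// Rough KV-cache sizing: FP16 keys+values vs. 8-bit keys + 4-bit values.
// Model dims are assumptions; per-block scale metadata is not counted.
const layers = 32, kvHeads = 32, headDim = 128, ctx = 8192;

const elemsPerSide = layers * kvHeads * headDim * ctx;    // elements in K (same count for V)
const bytes = (kBits: number, vBits: number) =>
    elemsPerSide * (kBits / 8) + elemsPerSide * (vBits / 8);

const fp16 = bytes(16, 16);
const split = bytes(8, 4);                                // K8 / V4 split
console.log(`FP16 KV cache : ${(fp16 / 2 ** 30).toFixed(2)} GiB`);   // 4.00 GiB
console.log(`K8/V4 KV cache: ${(split / 2 ** 30).toFixed(2)} GiB`);  // 1.50 GiB
console.log(`saving        : ${(100 * (1 - split / fp16)).toFixed(1)}%`); // 62.5% ideal; less once scales are counted
```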
This repo showcases how to run a model locally and offline, free of OpenAI dependencies.
Review/Check GGUF files and estimate the memory usage and maximum tokens per second.
A pure-Rust LLM inference engine (covering LLM-based multimodal models such as Spark-TTS), powered by the Candle framework.
Local ML voice chat using high-end models.
Booster - an open accelerator for LLMs. Better inference and debugging for AI hackers
Making offline AI models accessible to all types of edge devices.
LLaMA Server combines the power of LLaMA C++ with the beauty of Chatbot UI.
A custom ComfyUI node for MiniCPM vision-language models, supporting v4, v4.5, and v4 GGUF formats, enabling high-quality image captioning and visual analysis.
Your customized AI assistant - Personal assistants on any hardware! With llama.cpp, whisper.cpp, ggml, LLaMA-v2.
High-performance lightweight proxy and load balancer for LLM infrastructure. Intelligent routing, automatic failover and unified model discovery across local and remote inference backends.
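An illustrative sketch of the failover idea behind such a proxy, not the repository's actual routing logic. It assumes OpenAI-compatible /v1/chat/completions backends; the backend URLs and listen port are placeholders.

```ts
// Sketch: forward a request to the first healthy OpenAI-compatible backend.
// Backend URLs and the listen port are placeholders, not the repo's defaults.
import http from "node:http";

const backends = ["http://127.0.0.1:8080", "http://127.0.0.1:8081"]; // assumed local inference servers

async function forward(body: string): Promise<string> {
    let lastError: unknown;
    for (const base of backends) {                 // try backends in priority order
        try {
            const res = await fetch(`${base}/v1/chat/completions`, {
                method: "POST",
                headers: {"content-type": "application/json"},
                body,
            });
            if (res.ok) return await res.text();   // first healthy backend wins
            lastError = new Error(`upstream status ${res.status}`);
        } catch (err) {
            lastError = err;                       // connection refused, timeout, etc.
        }
    }
    throw lastError;
}

http.createServer((req, res) => {
    let body = "";
    req.on("data", (chunk) => (body += chunk));
    req.on("end", async () => {
        try {
            const upstream = await forward(body);
            res.writeHead(200, {"content-type": "application/json"}).end(upstream);
        } catch {
            res.writeHead(502).end("all backends failed");
        }
    });
}).listen(9000);                                   // arbitrary proxy port
```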
InsightSolver: Colab notebooks for exploring and solving operational issues using deep learning, machine learning, and related models.
Langport is a language model inference service
BabyAGI-🦙: enhanced for Llama models (running 100% locally) with persistent memory, smart internet search based on BabyCatAGI, and document embedding in LangChain based on privateGPT
llm-inference is a platform for publishing and managing LLM inference, providing a wide range of out-of-the-box features for model deployment, such as a UI, RESTful API, auto-scaling, computing-resource management, monitoring, and more.
📚 Local PDF-Integrated Chat Bot: Secure Conversations and Document Assistance with LLM-Powered Privacy
Local LLMs in your DAW!
A C++ implementation of Open Interpreter.
Run llama.cpp in a GPU accelerated Docker container
llama.cpp-gfx906
Unified management and routing for llama.cpp, MLX and vLLM models with web dashboard.
◉ Universal Intelligence: AI made simple.