ggerganov/llama.cpp
LLM inference in C/C++
Stargazers: 58441
Watchers: 505
Issues: 2993
Forks: 8286
ggerganov/llama.cpp Issues
EOT token incorrectly set for Mistral-v0.2 trained with added ChatML tokens (Updated 10 hours ago, 4 comments)
Improve and expand Wikipedia article about llama.cpp (Updated 10 hours ago, 2 comments)
Possible performance boost with 2-pass online softmax (Updated 10 hours ago, 1 comment)
ggml_validate_row_data finding nan value for IQ4_NL (Closed 10 hours ago, 2 comments)
Vulkan outputs gibberish using extended context with vram saturated (Updated 10 hours ago, 5 comments)
RPC issues and comments (Closed 10 hours ago, 8 comments)
[Android/Termux] Significantly higher RAM usage with Vulkan compared to CPU only (Updated 10 hours ago)
Description of "-t N" option for server is inaccurate (Updated 10 hours ago)
convert.py still fails on llama3 8B-Instruct downloaded directly from Meta (Huggingface works) (Updated 10 hours ago, 2 comments)
llama : save downloaded models to local cache (Updated 10 hours ago, 3 comments)
AMD ROCm: 8x22B Model Causes 100% GPU Utilization Stall (Updated 10 hours ago)
Flash attention implementations do not handle case where value vectors have different dimension from query vectors (Updated 10 hours ago, 1 comment)
Pretokenizer not supported by conversion script (Closed 10 hours ago, 2 comments)
Metal (iOS): Compute function exceeds available temporary registers (Updated a day ago, 2 comments)
Issues: Unable for multiuser prompt (Updated a day ago, 1 comment)
Segmentation Fault on GPU (Updated a day ago, 1 comment)
enable rpc for server (Closed a day ago)
Support Falcon2-11B (Closed a day ago, 2 comments)
Embedding server crashes when used with langchain openai embeddings (Updated a day ago, 3 comments)
llava surgery script for new llava-arch model from Intel (Updated a day ago)
Llama3-8b & Perplexity.exe Issue (Closed a day ago, 6 comments)
GGML_ASSERT(n_embd_gqa == n_embd_k_gqa) fails in models where key vector dimension is different from value vector dimension (Updated 2 days ago)
Add support for multilingual Viking models, please. (Updated 2 days ago, 1 comment)
support long context llama 3 models (Closed 2 days ago, 1 comment)
Support for IBM Granite models (Closed 2 days ago, 1 comment)
How to quantize fine-tune LLM into GGUF format (Updated 3 days ago, 2 comments)
relocation R_X86_64_32 against hidden symbol `__TMC_END__' can not be used when making a shared object (Updated 3 days ago)
Support request - Google MADLAD400-10B (Updated 3 days ago, 2 comments)
Llama-3 Instruct tokenizer_config.json changes in relation to the currently fetched llama-bpe configs. (Updated 3 days ago)
In my os, the @ symbol and spaces don’t play nicely in llama.cpp directory. (Closed 3 days ago, 4 comments)
Error when trying to convert a HF model which is a LORA PEFT fine tuned version of phi-128k (Closed 3 days ago, 2 comments)
Windows MSYS2 compilation error. [SOLVED] (Updated 3 days ago, 3 comments)
Does it make sense to optimize strlen in this function with for loops? (Updated 3 days ago, 4 comments)
Possible stopping issues and bad asterik tokenization? (GGUF related) (Closed 4 days ago, 3 comments)
Infinite update_slots issue on latest build (1265c67) (Updated 4 days ago)
/embeddings endpoint sometimes does not return embedding (Updated 4 days ago)
MPI issue on raspberry pi cluster (Updated 4 days ago, 3 comments)
ThunderKittens: a simple yet faster flashattention alternative (Updated 4 days ago, 1 comment)
repeatability problem with CUDA backend (Closed 4 days ago, 8 comments)
How to build the llamacpp's .so file separately and then pass it in the llama_cpp_python / wrapper libraries directly. (Updated 4 days ago, 1 comment)
Compilation error using HIP SDK on Windows (Updated 4 days ago, 1 comment)
Text Generation task (Closed 4 days ago, 2 comments)
Performance regression with CUDA after commit 9c67c277 (Closed 5 days ago, 8 comments)
Token generation speed reduces after GPU offloading (Updated 5 days ago)
while finetuning llama.cpp doesn't create .bin file... (Updated 5 days ago)
Server api not functioning with frontends (Closed 5 days ago, 3 comments)
CMakeLists bug in BLAS (Closed 6 days ago)
An error occurred while converting Sakura-14B-Qwen2beta-v0.10pre0 to gguf (Closed 6 days ago)
--cache-type-k q8_0 crashes server.exe after a while (Closed 6 days ago, 3 comments)
bf16 GGUF fails with GGML_ASSERT on CUDA (Closed 6 days ago, 2 comments)