ggerganov/llama.cpp
LLM inference in C/C++
Stargazers: 59903 · Watchers: 508 · Issues: 3172 · Forks: 8530
ggerganov/llama.cpp Issues
- convert-hf-to-gguf-update.py breaks (Updated 17 days ago, 15 comments)
- Server builds successfully but fails at runtime with `ggml_cuda_init: failed to initialize CUDA: unknown error` (Updated 18 days ago, 1 comment)
- Looking for help using llama.cpp with the Phi-3 model and LoRA (Updated 19 days ago, 6 comments)
- Embedding fails to run on Vulkan backend (Closed 22 days ago, 14 comments)
- How can I modify the settings to make it answer in Chinese by default? (Closed 21 days ago, 9 comments)
- ggml-cuda.cu:1278: to_fp32_cuda != nullptr (Updated 21 days ago, 8 comments)
- Native Intel IPEX-LLM support (Updated 23 days ago, 11 comments)
- convert-hf-to-gguf.py breaks on phi-2 (Updated 24 days ago, 11 comments)
- llama.cpp --prompt-cache-all: more than a year has passed and it is still not fully implemented (Updated 24 days ago, 1 comment)
- Should we add an autolabeler for PRs? (Updated 25 days ago, 2 comments)
- Selects too many cores by default on Orange Pi 5 (2x slower) (Closed 25 days ago, 3 comments)
- Server and Llama 3: escaping already-escaped characters leads to endless backslashes, newlines, etc. in the textarea (Updated a month ago, 5 comments)
- Question about high perplexity of llama3-8b-hf in GGUF format (Closed a month ago, 6 comments)
- Impact of bf16 on Llama 3 8B perplexity? (Updated a month ago, 2 comments)
- Is Infini-attention support possible? (Updated a month ago, 1 comment)
- Gibberish response from server, and main exits, on M1 Mac Studio Ultra with GPU (CPU is OK) (Closed a month ago, 4 comments)
- NKVO argument leads to huge compute buffers in full cuBLAS offload on a heterogeneous dual-GPU config (Closed a month ago, 1 comment)
- Build error in server.cpp: undefined reference to `json_schema_to_grammar` (Closed a month ago, 8 comments)
- How to quantize a fine-tuned llama3-8b the right way? (Closed a month ago, 10 comments)
- Add metadata override and generate a dynamic default filename when converting GGUF (Closed a month ago, 1 comment)
- BF16 prompt processing has half the performance of F16 and F32 on AMD Ryzen Embedded V3000 (Zen 3) (Updated a month ago, 1 comment)
- Assertion failure when quantizing Meta-Llama-3-70B-Instruct from f16 to various quantization types (Closed a month ago, 9 comments)
- Make -DLLAMA_HIP_UMA a dynamic setting (Updated a month ago)
- How to make the examples? (Updated a month ago, 1 comment)
- Abort in example server (/completions route) given string-type system_prompt (Closed a month ago, 3 comments)
- Server: completion_probabilities (tok_str and prob) seem to be broken (Closed a month ago, 8 comments)
- quantize: command not found (Closed a month ago, 2 comments)
- Support for Consistency Large Language Models? (Updated a month ago, 5 comments)
- Huge difference in performance between llama.cpp and llama-cpp-python (Closed a month ago, 1 comment)
- Expanding Swift Package functionality (Closed a month ago, 2 comments)
- LLaVA-NeXT-Video-34B (Updated a month ago)
- llama_model_load: error loading model: done_getting_tensors: wrong number of tensors; expected 149, got 147 (Updated a month ago)
- Can't run the program (Updated a month ago, 8 comments)
- Training for language translation (Updated a month ago, 2 comments)
- Third-party applications are overwhelmingly slow for subsequent prompt evaluation compared to examples/main and examples/server (Updated a month ago, 2 comments)
- Messy CUDA graph error output on Mixtral/MoE models (Closed a month ago, 6 comments)
- [SYCL] Implement Flash Attention (Updated a month ago, 2 comments)
- ggml-cuda.so is 90 MB with -arch=all (Updated a month ago, 1 comment)
- Add support for Mistral Dutch and Armenian models: Tweeties/tweety-7b-dutch-v24a and Tweeties/tweety-7b-armenian-v24a (Updated a month ago)
- llama : make vocabs LFS objects? (Updated a month ago, 6 comments)
- [Server] JSON outputs are not being enforced according to the JSON Schema (Closed a month ago, 3 comments)
- Segmentation fault in example server (/v1/chat/completions route) given incorrect JSON payload (Closed a month ago, 10 comments)
- error: implicit declaration of function ‘vld1q_s8_x4’; did you mean ‘vld1q_s8_x2’? (Updated a month ago)
- Server 'penalize_nl' parameter defaults to False? (Updated a month ago, 2 comments)
- Is it extending the pretrained model or fine-tuning it? (Updated a month ago)
- Original index.js and index.js.hpp file source? (Closed a month ago, 2 comments)
- common/build-info.cpp not properly updated (Updated a month ago)
- (Server) Strange behavior with JSON Schema with Llama 3 (Closed a month ago, 4 comments)
- Could we get Aryanne/Calypso-3B-alpha-v2-gguf added to the demo? (Updated a month ago, 2 comments)
- Minor improvement in CMake script for MSVC/clang-cl (Updated a month ago)