ggerganov/llama.cpp
LLM inference in C/C++
Stargazers: 59903 · Watchers: 508 · Issues: 3172 · Forks: 8530
ggerganov/llama.cpp Issues
- convert-hf-to-gguf-update.py breaks (Updated 17 days ago, 15 comments)
- Server builds successfully but fails at runtime with `ggml_cuda_init: failed to initialize CUDA: unknown error` (Updated 18 days ago, 1 comment)
- Looking for help using llama.cpp with the Phi-3 model and LoRA (Updated 19 days ago, 6 comments)
- Embedding fails to run on Vulkan backend (Closed 22 days ago, 14 comments)
- How can I modify the settings to make it answer in Chinese by default? (Closed 21 days ago, 9 comments)
- ggml-cuda.cu:1278: to_fp32_cuda != nullptr (Updated 21 days ago, 8 comments)
- Native Intel IPEX-LLM support (Updated 23 days ago, 11 comments)
- convert-hf-to-gguf.py breaks on phi-2 (Updated 24 days ago, 11 comments)
- llama.cpp --prompt-cache-all: more than a year has passed and it is still not fully implemented (Updated 24 days ago, 1 comment)
- Should we add an autolabeler for PRs? (Updated 25 days ago, 2 comments)
- Selects too many cores by default on Orange Pi 5 (2x slower) (Closed 25 days ago, 3 comments)
- Server and Llama 3: escaping already-escaped characters leads to endless backslashes, newlines, etc. in the textarea (Updated a month ago, 5 comments)
- Question about high perplexity of llama3-8b-hf in GGUF format (Closed a month ago, 6 comments)
- Impact of bf16 on Llama 3 8B perplexity? (Updated a month ago, 2 comments)
- Is Infini-attention support possible? (Updated a month ago, 1 comment)
- Gibberish response from server, and main exits, on M1 Mac Studio Ultra with GPU (CPU is OK) (Closed a month ago, 4 comments)
- NKVO argument leads to huge compute buffers in full cuBLAS offload on a heterogeneous dual-GPU config (Closed a month ago, 1 comment)
- Build error in server.cpp: undefined reference to `json_schema_to_grammar` (Closed a month ago, 8 comments)
- How to quantize a fine-tuned llama3-8b the right way? (Closed a month ago, 10 comments)
- Add metadata override and generate a dynamic default filename when converting GGUF (Closed a month ago, 1 comment)
- BF16 prompt processing has half the performance of F16 and F32 on AMD Ryzen Embedded V3000 (Zen 3) (Updated a month ago, 1 comment)
- Assertion failure when quantizing Meta-Llama-3-70B-Instruct from f16 to various quantization types (Closed a month ago, 9 comments)
- Make -DLLAMA_HIP_UMA a dynamic setting (Updated a month ago)
- How to make the examples? (Updated a month ago, 1 comment)
- Abort in example server (/completions route) given string-type system_prompt (Closed a month ago, 3 comments)
- Server: completion_probabilities (tok_str and prob) seem to be broken (Closed a month ago, 8 comments)
- quantize: command not found (Closed a month ago, 2 comments)
- Support for Consistency Large Language Models? (Updated a month ago, 5 comments)
- Huge difference in performance between llama.cpp and llama-cpp-python (Closed a month ago, 1 comment)
- Expanding Swift Package functionality (Closed a month ago, 2 comments)
- LLaVA-NeXT-Video-34B (Updated a month ago)
- llama_model_load: error loading model: done_getting_tensors: wrong number of tensors; expected 149, got 147 (Updated a month ago)
- Can't run the program (Updated a month ago, 8 comments)
- Training for language translation (Updated a month ago, 2 comments)
- Third-party applications are overwhelmingly slow for subsequent prompt evaluation compared to examples/main and examples/server (Updated a month ago, 2 comments)
- Messy CUDA graph error output on Mixtral/MoE models (Closed a month ago, 6 comments)
- [SYCL] Implement Flash Attention (Updated a month ago, 2 comments)
- ggml-cuda.so is 90 MB with -arch=all (Updated a month ago, 1 comment)
- Add support for Mistral Dutch and Armenian models: Tweeties/tweety-7b-dutch-v24a and Tweeties/tweety-7b-armenian-v24a (Updated a month ago)
- llama : make vocabs LFS objects? (Updated a month ago, 6 comments)
- [Server] JSON outputs are not being enforced according to the JSON Schema (Closed a month ago, 3 comments)
- Segmentation fault in example server (/v1/chat/completions route) given incorrect JSON payload (Closed a month ago, 10 comments)
- error: implicit declaration of function ‘vld1q_s8_x4’; did you mean ‘vld1q_s8_x2’? (Updated a month ago)
- Server 'penalize_nl' parameter defaults to False? (Updated a month ago, 2 comments)
- Is it extending the pretrained model or fine-tuning it? (Updated a month ago)
- Original index.js and index.js.hpp file source? (Closed a month ago, 2 comments)
- common/build-info.cpp not properly updated (Updated a month ago)
- (Server) Strange behavior with JSON Schema with Llama 3 (Closed a month ago, 4 comments)
- Could we get Aryanne/Calypso-3B-alpha-v2-gguf added to the demo? (Updated a month ago, 2 comments)
- Minor improvement in CMake script for MSVC/clang-cl (Updated a month ago)