ggerganov/llama.cpp
LLM inference in C/C++
Stargazers: 58441
Watchers: 505
Issues: 2993
Forks: 8286
ggerganov/llama.cpp Issues
EOT token incorrectly set for Mistral-v0.2 trained with added ChatML tokens (Updated 10 hours ago, 4 comments)
Improve and expand Wikipedia article about llama.cpp (Updated 10 hours ago, 2 comments)
Possible performance boost with 2-pass online softmax (Updated 10 hours ago, 1 comment)
ggml_validate_row_data finding nan value for IQ4_NL (Closed 10 hours ago, 2 comments)
Vulkan outputs gibberish using extended context with vram saturated (Updated 10 hours ago, 5 comments)
RPC issues and comments (Closed 10 hours ago, 8 comments)
[Android/Termux] Significantly higher RAM usage with Vulkan compared to CPU only (Updated 10 hours ago)
Description of "-t N" option for server is inaccurate (Updated 10 hours ago)
convert.py still fails on llama3 8B-Instruct downloaded directly from Meta (Huggingface works) (Updated 10 hours ago, 2 comments)
llama : save downloaded models to local cache (Updated 10 hours ago, 3 comments)
AMD ROCm: 8x22B Model Causes 100% GPU Utilization Stall (Updated 10 hours ago)
Flash attention implementations do not handle case where value vectors have different dimension from query vectors (Updated 10 hours ago, 1 comment)
Pretokenizer not supported by conversion script (Closed 10 hours ago, 2 comments)
Metal (iOS): Compute function exceeds available temporary registers (Updated a day ago, 2 comments)
Issues: Unable for multiuser prompt (Updated a day ago, 1 comment)
Segmentation Fault on GPU (Updated a day ago, 1 comment)
enable rpc for server (Closed a day ago)
Support Falcon2-11B (Closed a day ago, 2 comments)
Embedding server crashes when used with langchain openai embeddings (Updated a day ago, 3 comments)
llava surgery script for new llava-arch model from Intel (Updated a day ago)
Llama3-8b & Perplexity.exe Issue (Closed a day ago, 6 comments)
GGML_ASSERT(n_embd_gqa == n_embd_k_gqa) fails in models where key vector dimension is different from value vector dimension (Updated 2 days ago)
Add support for multilingual Viking models, please. (Updated 2 days ago, 1 comment)
support long context llama 3 models (Closed 2 days ago, 1 comment)
Support for IBM Granite models (Closed 2 days ago, 1 comment)
How to quantize fine-tune LLM into GGUF format (Updated 3 days ago, 2 comments)
relocation R_X86_64_32 against hidden symbol `__TMC_END__' can not be used when making a shared object (Updated 3 days ago)
Support request - Google MADLAD400-10B (Updated 3 days ago, 2 comments)
Llama-3 Instruct tokenizer_config.json changes in relation to the currently fetched llama-bpe configs. (Updated 3 days ago)
In my os, the @ symbol and spaces don’t play nicely in llama.cpp directory. (Closed 3 days ago, 4 comments)
Error when trying to convert a HF model which is a LORA PEFT fine tuned version of phi-128k (Closed 3 days ago, 2 comments)
Windows MSYS2 compilation error. [SOLVED] (Updated 3 days ago, 3 comments)
Does it make sense to optimize strlen in this function with for loops? (Updated 3 days ago, 4 comments)
Possible stopping issues and bad asterik tokenization? (GGUF related) (Closed 4 days ago, 3 comments)
Infinite update_slots issue on latest build (1265c67) (Updated 4 days ago)
/embeddings endpoint sometimes does not return embedding (Updated 4 days ago)
MPI issue on raspberry pi cluster (Updated 4 days ago, 3 comments)
ThunderKittens: a simple yet faster flashattention alternative (Updated 4 days ago, 1 comment)
repeatability problem with CUDA backend (Closed 4 days ago, 8 comments)
How to build the llamacpp's .so file separately and then pass it in the llama_cpp_python / wrapper libraries directly. (Updated 4 days ago, 1 comment)
Compilation error using HIP SDK on Windows (Updated 4 days ago, 1 comment)
Text Generation task (Closed 4 days ago, 2 comments)
Performance regression with CUDA after commit 9c67c277 (Closed 5 days ago, 8 comments)
Token generation speed reduces after GPU offloading (Updated 5 days ago)
while finetuning llama.cpp doesn't create .bin file... (Updated 5 days ago)
Server api not functioning with frontends (Closed 5 days ago, 3 comments)
CMakeLists bug in BLAS (Closed 6 days ago)
An error occurred while converting Sakura-14B-Qwen2beta-v0.10pre0 to gguf (Closed 6 days ago)
--cache-type-k q8_0 crashes server.exe after a while (Closed 6 days ago, 3 comments)
bf16 GGUF fails with GGML_ASSERT on CUDA (Closed 6 days ago, 2 comments)