EricLBuehler/mistral.rs
Blazingly fast LLM inference.
Stargazers: 1878 · Watchers: 21 · Issues: 124 · Forks: 154
EricLBuehler/mistral.rs Issues
| Issue | Status | Comments |
| --- | --- | --- |
| Support loading tokenizer from `sentencepiece` model | Updated 2 days ago | 5 |
| dolphin-2.9-mixtral-8x22b.Q8_0.gguf "Error: cannot find tensor info for blk.0.ffn_gate.0.weight"? | Updated 2 days ago | 8 |
| Quantized Phi3: Features to add | Updated 3 days ago | 3 |
| Feature Req: Add Importance Matrix / RAM avail calculations to ISQ | Updated 3 days ago | 3 |
| Support for T5 Architecture | Updated 4 days ago | 3 |
| Logit Bias Error | Updated 4 days ago | |
| Installation Error | Updated 4 days ago | 1 |
| Running model from a GGUF file, only | Updated 4 days ago | 44 |
| Crashing when trying to run with error "A command encoder is already encoding to this command buffer" on Metal | Closed 4 days ago | 3 |
| Cross GPU device mapping feature | Updated 5 days ago | |
| Prompt sequence length is greater than 4096 | Closed 5 days ago | 11 |
| Load chat template from GGUF file | Updated 5 days ago | |
| Support loading multiple GGUF files | Closed 5 days ago | |
| Garbled output on very long prompts | Updated 5 days ago | 4 |
| Enabling prefix cache for llama3 gguf | Updated 6 days ago | 11 |
| Fails on a read-only volume | Closed 7 days ago | |
| bug: If device layers requested exceed model layers, host layers overflow | Updated 7 days ago | 13 |
| Speed in --interactive mode | Updated 7 days ago | 1 |
| Benching local GGUF model layers allocated to vRAM but no GPU activity | Closed 10 days ago | 3 |
| Mistral rs python binding error | Updated 10 days ago | 6 |
| Insitu quantization OOM for large models | Updated 15 days ago | 1 |
| Python mistralrs-cuda not running on GPU | Closed 17 days ago | 3 |
| Is it possible to add support for Infini-attention? | Updated 18 days ago | 2 |
| Memory Optimization for low memory machines | Closed a month ago | 7 |
| Add C api and provide shared and static libraries. | Updated 20 days ago | 1 |
| Use PromptTemplate for custom HuggingFace model | Closed 21 days ago | 3 |
| mistral does not support NVIDIA V100 (compute_cap <= 800) | Updated 22 days ago | 1 |
| Add support for llm-chain | Closed 24 days ago | 2 |
| New `Unexpected rank, expected 3, got: 2` | Closed 24 days ago | 3 |
| Cancel AI Answer without program termination in --interactive mode | Closed 25 days ago | 7 |
| recompile with -fPIE : failed cargo build with cuda feature on Red Hat Linux distribution | Closed 25 days ago | 8 |
| pip install of mistralrs not working | Closed a month ago | 32 |
| How can you run inference with a local GGUF file? | Closed a month ago | 6 |
| LoRA swapping at runtime | Updated a month ago | 10 |
| Speed up speculative decoding implementation | Closed a month ago | 1 |
| mistralrs-server: prompt step - Model failed with error: DTypeMismatchBinaryOp { lhs: F16, rhs: F32, op: "where" } | Closed a month ago | 2 |
| New multiplexing scheduler | Closed a month ago | |
| Phi3 models broken with causal mask | Closed a month ago | |
| Mistralrs-bench: do warmup run | Closed a month ago | |
| Server crashes while processing 2 concurrent requests | Closed a month ago | 2 |
| interactive mode should accept EOF | Closed a month ago | 3 |
| llama.cpp does not segfault on Pi 5 running Mistral 7B Instruct v0.1 Q2_K | Closed a month ago | 1 |
| distribute with cargo dist? | Updated a month ago | 2 |
| Fix timings of completion, add timing of sampling back | Closed a month ago | 3 |
| TensorF16 not found | Closed a month ago | 3 |
| Problem running with Mac M1 | Closed a month ago | 2 |
| Axum server blocking - Add async channels | Closed a month ago | |
| Docker builds fail with "failed to read `/mistralrs/mistralrs-bench/Cargo.toml`" | Closed a month ago | 5 |
| Sliding window models do not properly slice KV cache | Closed a month ago | 2 |
| Accelerate topk, topp sampling with `argsort` | Closed a month ago | |