EricLBuehler/mistral.rs
Blazingly fast LLM inference.
Stargazers: 1878 · Watchers: 21 · Issues: 124 · Forks: 154
EricLBuehler/mistral.rs Issues
| Issue | Status | Comments |
| --- | --- | --- |
| Support loading tokenizer from `sentencepiece` model | Updated 2 days ago | 5 |
| dolphin-2.9-mixtral-8x22b.Q8_0.gguf "Error: cannot find tensor info for blk.0.ffn_gate.0.weight"? | Updated 2 days ago | 8 |
| Quantized Phi3: Features to add | Updated 3 days ago | 3 |
| Feature Req: Add Importance Matrix / RAM avail calculations to ISQ | Updated 3 days ago | 3 |
| Support for T5 Architecture | Updated 4 days ago | 3 |
| Logit Bias Error | Updated 4 days ago | |
| Installation Error | Updated 4 days ago | 1 |
| Running model from a GGUF file, only | Updated 4 days ago | 44 |
| Crashing when trying to run with error "A command encoder is already encoding to this command buffer" on Metal | Closed 4 days ago | 3 |
| Cross GPU device mapping feature | Updated 5 days ago | |
| Prompt sequence length is greater than 4096 | Closed 5 days ago | 11 |
| Load chat template from GGUF file | Updated 5 days ago | |
| Support loading multiple GGUF files | Closed 5 days ago | |
| Garbled output on very long prompts | Updated 5 days ago | 4 |
| Enabling prefix cache for llama3 gguf | Updated 6 days ago | 11 |
| Fails on a read-only volume | Closed 7 days ago | |
| bug: If device layers requested exceed model layers, host layers overflow | Updated 7 days ago | 13 |
| Speed in --interactive mode | Updated 7 days ago | 1 |
| Benching local GGUF model layers allocated to vRAM but no GPU activity | Closed 10 days ago | 3 |
| Mistral rs python binding error | Updated 10 days ago | 6 |
| Insitu quantization OOM for large models | Updated 15 days ago | 1 |
| Python mistralrs-cuda not running on GPU | Closed 17 days ago | 3 |
| Is it possible to add support for Infini-attention? | Updated 18 days ago | 2 |
| Memory Optimization for low memory machines | Closed a month ago | 7 |
| Add C api and provide shared and static libraries. | Updated 20 days ago | 1 |
| Use PromptTemplate for custom HuggingFace model | Closed 21 days ago | 3 |
| mistral does not support NVIDIA V100 (compute_cap <= 800) | Updated 22 days ago | 1 |
| Add support for llm-chain | Closed 24 days ago | 2 |
| New `Unexpected rank, expected 3, got: 2` | Closed 24 days ago | 3 |
| Cancel AI Answer without program termination in --interactive mode | Closed 25 days ago | 7 |
| recompile with -fPIE : failed cargo build with cuda feature on Red Hat Linux distribution | Closed 25 days ago | 8 |
| pip install of mistralrs not working | Closed a month ago | 32 |
| How can you run inference with a local GGUF file? | Closed a month ago | 6 |
| LoRA swapping at runtime | Updated a month ago | 10 |
| Speed up speculative decoding implementation | Closed a month ago | 1 |
| mistralrs-server: prompt step - Model failed with error: DTypeMismatchBinaryOp { lhs: F16, rhs: F32, op: "where" } | Closed a month ago | 2 |
| New multiplexing scheduler | Closed a month ago | |
| Phi3 models broken with causal mask | Closed a month ago | |
| Mistralrs-bench: do warmup run | Closed a month ago | |
| Server crashes while processing 2 concurrent requests | Closed a month ago | 2 |
| interactive mode should accept EOF | Closed a month ago | 3 |
| llama.cpp does not segfault on Pi 5 running Mistral 7B Instruct v0.1 Q2_K | Closed a month ago | 1 |
| distribute with cargo dist? | Updated a month ago | 2 |
| Fix timings of completion, add timing of sampling back | Closed a month ago | 3 |
| TensorF16 not found | Closed a month ago | 3 |
| Problem running with Mac M1 | Closed a month ago | 2 |
| Axum server blocking - Add async channels | Closed a month ago | |
| Docker builds fail with "failed to read `/mistralrs/mistralrs-bench/Cargo.toml`" | Closed a month ago | 5 |
| Sliding window models do not properly slice KV cache | Closed a month ago | 2 |
| Accelerate topk, topp sampling with `argsort` | Closed a month ago | |