yinqiwen / lmsf

Rust LLM Serving Framework

Features

  • Paged Attention
  • Continuous Batching
  • Quantization
    • awq
    • squeezellm
  • Models
    • llama
    • gemma
    • chatglm

Getting Started

Examples

$ cargo run --release --example llm_engine_example -- --model <llama model dir> --gpu-memory-utilization 0.95 --block-size 8 --max-model-len 1024
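
The --gpu-memory-utilization, --block-size and --max-model-len flags together bound how much paged KV cache the engine can allocate and how many blocks each sequence needs. The sketch below is purely illustrative back-of-the-envelope arithmetic and does not call into lmsf; the GPU size, weight footprint and model shape are assumed values for a LLaMA-7B-class model.

fn main() {
    // CLI flags from the example above.
    let utilization: f64 = 0.95;     // --gpu-memory-utilization
    let block_size: usize = 8;       // --block-size (tokens per KV-cache block)
    let max_model_len: usize = 1024; // --max-model-len

    // Assumed hardware/model figures (not taken from lmsf): 24 GB GPU,
    // ~13.5 GB of fp16 weights, LLaMA-7B-like shape.
    let gpu_mem_bytes: f64 = 24e9;
    let weight_bytes: f64 = 13.5e9;
    let (layers, kv_heads, head_dim, dtype_bytes): (usize, usize, usize, usize) = (32, 32, 128, 2);

    // One block holds K and V for `block_size` tokens across all layers.
    let block_bytes = 2 * block_size * layers * kv_heads * head_dim * dtype_bytes;

    let cache_budget = gpu_mem_bytes * utilization - weight_bytes;
    let num_blocks = (cache_budget / block_bytes as f64) as usize;
    let blocks_per_seq = (max_model_len + block_size - 1) / block_size;

    println!("KV-cache blocks available:       {num_blocks}");
    println!("blocks per full-length sequence: {blocks_per_seq}");
    println!("max concurrent full-length seqs: {}", num_blocks / blocks_per_seq);
}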

API Server

$ cargo build --release
$ ./target/release/entrypoints --model <llama model dir> --gpu-memory-utilization 0.95 --block-size 8 --max-model-len 1024 --host 0.0.0.0 --port 8000
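
Once the server is up, requests can be issued to it over HTTP. The route below (/v1/completions, OpenAI-style) and the request fields are assumptions, not confirmed by this README; check the entrypoints source for the routes it actually serves. A minimal Rust client sketch using reqwest:

// Hypothetical client sketch. The /v1/completions route and the JSON
// fields are assumptions; adjust to the routes served by `entrypoints`.
//
// Cargo.toml:
//   reqwest    = { version = "0.11", features = ["blocking", "json"] }
//   serde_json = "1"

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let body = serde_json::json!({
        "prompt": "The capital of France is",
        "max_tokens": 32,
        "temperature": 0.7
    });
    let resp: serde_json::Value = reqwest::blocking::Client::new()
        .post("http://127.0.0.1:8000/v1/completions") // assumed route
        .json(&body)
        .send()?
        .json()?;
    println!("{}", serde_json::to_string_pretty(&resp)?);
    Ok(())
}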

Languages

  • Cuda 39.3%
  • Rust 25.7%
  • C 19.2%
  • C++ 13.9%
  • CMake 1.7%
  • Python 0.1%