Inference Benchmark
Benchmark for machine learning model online serving.
Collected information
- throughput
- latency
  - P99
  - P90
  - P50
- CPU usage
- memory usage
- GPU utilization
- GPU memory usage
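
As a rough illustration of the throughput and latency metrics listed above, the sketch below derives them from a list of per-request latencies. It is not this project's implementation: the names `percentile` and `summarize` are hypothetical, and the nearest-rank percentile method is an assumption, since the method used by the benchmark is not stated here. CPU, memory, and GPU metrics need system-level tooling and are not covered.

```python
import math


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted, non-empty list.

    Assumption: nearest-rank is used for illustration only; other
    interpolation methods give slightly different values.
    """
    rank = math.ceil(pct * len(sorted_values) / 100)
    return sorted_values[max(rank, 1) - 1]


def summarize(latencies, wall_time_s):
    """Summarize one benchmark run (hypothetical helper).

    latencies:   per-request latencies, in any consistent unit
    wall_time_s: total wall-clock duration of the run, in seconds
    """
    ordered = sorted(latencies)
    return {
        "throughput": len(ordered) / wall_time_s,  # requests per second
        "p50": percentile(ordered, 50),
        "p90": percentile(ordered, 90),
        "p99": percentile(ordered, 99),
    }
```

Calling `summarize(latencies, wall_time_s)` with the latencies recorded by a load generator returns the throughput in requests per second and the P50, P90, and P99 latencies in the same unit as the input.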
Tasks
- NLP
- CV
- Speech Recognition
- Embedding
- Coding Assistant
WIP Benchmark for machine learning model online serving (LLM, embedding, Stable-Diffusion, Whisper)