octoml / mlc-llm

Enable everyone to develop, optimize and deploy AI models natively on everyone's devices.

Home Page: https://mlc.ai/mlc-llm


[Tracking] Sampler optimization

masahi opened this issue · comments

Let's collect the remaining issues we are aware of related to sampler performance:

  • Small regression (a drop of ~1 req/sec in benchmark_throughput.py) after #192 when only greedy sampling is used.
  • Logprobs and JSON mode are extremely slow.
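To illustrate the first item: a common reason a greedy-only workload regresses is that greedy requests get routed through the full sampling pipeline instead of a plain argmax. Below is a minimal, hypothetical sketch (not the mlc-llm implementation) of a batched sampler with a greedy fast path: rows with temperature 0 take argmax directly, and only the remaining rows pay for softmax and a multinomial draw.

```python
import numpy as np

def sample_batch(logits: np.ndarray, temperatures: np.ndarray,
                 rng: np.random.Generator) -> np.ndarray:
    """Sample one token per request from a batch of logits.

    Requests with temperature == 0 take a cheap argmax fast path;
    only the remaining rows pay for softmax + multinomial sampling.
    """
    tokens = np.empty(logits.shape[0], dtype=np.int64)
    greedy = temperatures == 0.0
    # Fast path: pure argmax, no softmax or random draw needed.
    tokens[greedy] = logits[greedy].argmax(axis=-1)
    # Slow path: temperature-scaled softmax, then an inverse-CDF draw.
    rest = ~greedy
    if rest.any():
        scaled = logits[rest] / temperatures[rest][:, None]
        scaled -= scaled.max(axis=-1, keepdims=True)  # numerical stability
        probs = np.exp(scaled)
        probs /= probs.sum(axis=-1, keepdims=True)
        cdf = probs.cumsum(axis=-1)
        draws = rng.random((cdf.shape[0], 1))
        # Clamp guards against float round-off in the last CDF entry.
        tokens[rest] = np.minimum((cdf < draws).sum(axis=-1),
                                  logits.shape[-1] - 1)
    return tokens
```

If every request in the batch is greedy, the whole slow path is skipped, so a regression in that case points at overhead added before this dispatch (e.g. building per-request sampling state unconditionally).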

The first issue seems to have been fixed by @vvchernov in #215.

Hello @masahi! No, my fix in #215 resolved a very strong (more than one order of magnitude) slowdown introduced after #214.
About task 1: we observed a ~25-30% reduction after #192. It has not been resolved yet; I'm investigating the issue.
About task 2: I remember about logprobs, but it looks like resolving task 1 requires a sampler refactor, and I want to do that first (or somebody else will).
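On the logprobs point: a naive per-token implementation sorts the whole vocabulary at every step to report the top-k log-probabilities, which is expensive at typical vocabulary sizes. A minimal sketch (my own illustration, not the mlc-llm code) of the cheaper approach, using a partial selection instead of a full sort:

```python
import numpy as np

def topk_logprobs(logits: np.ndarray, k: int):
    """Return the indices and log-probabilities of the k most likely tokens.

    np.argpartition selects the k largest entries in O(V) time, versus
    O(V log V) for fully sorting the vocabulary at every decode step.
    """
    # Numerically stable log-softmax over the vocabulary.
    shifted = logits - logits.max()
    logprobs = shifted - np.log(np.exp(shifted).sum())
    idx = np.argpartition(logprobs, -k)[-k:]    # k best entries, unordered
    idx = idx[np.argsort(logprobs[idx])[::-1]]  # order just those k
    return idx, logprobs[idx]
```

Only requests that actually asked for logprobs should pay this cost; batches without logprobs can skip the log-softmax entirely.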