Development Roadmap (Deprecated)

Question

Development Roadmap (Deprecated)

Ying1123 opened this issue 7 months ago · comments

Ying Sheng commented 7 months ago

Function Calling

Frontend
- Add tools argument in sgl.gen. See also guidance tools
Backend
- OpenAI: Translate to their function calling API (https://platform.openai.com/docs/guides/function-calling).
  - #573
- Local Models (SGLang)
  1. Use SGLang primitives (regex, select) and constrained decoding to implement a workflow
  2. Directly use models that support function calling (e.g., Gorilla OpenFunctions, https://huggingface.co/jondurbin/bagel-dpo-7b-v0.4#prompting-strategies)
- Local Models (OpenAI-compatible API)

High-level Pythonic Interface

#39

Inference Optimizations

Speculative decoding for local models
Speculative execution for OpenAI Chat API
- #48

Structured Decoding

Support parallel JSON decoding varunshenoy/super-json-mode#8
Support auto parallel decoding https://arxiv.org/abs/2401.06761

Compiler

Support tracing and compiling sgl.fork
Support sending a full serialized SGL program to the server

LoRA Support

Port multi-LoRA batching and unified memory from S-LoRA

Model Coverage

Vision Langauge Models. Support top-performing models from https://github.com/open-compass/VLMEvalKit
Language Models. Port the implementation of popular models from https://github.com/vllm-project/vllm/tree/main/vllm/model_executor/models. (help)

Device Coverage

AMD support. Investigate AMD support in Trion and FlashInfer.
CPU support. This is better done by adding a llama.cpp backend.

AriMKatz · Answer 1 · Thu Feb 08 2024 00:36:56 GMT+0800 (China Standard Time)

Are there still plans for a high level pythonic interface? #39 (comment)

Ying Sheng · Answer 2 · Thu Feb 08 2024 03:03:20 GMT+0800 (China Standard Time)

Are there still plans for a high level pythonic interface? #39 (comment)

Hi @AriMKatz, thanks for the reference. This is very important, I just added it.

Srinivas Billa · Answer 3 · Fri Feb 09 2024 07:03:05 GMT+0800 (China Standard Time)

For the vision models support, is it possible to align with the openai gpt4v API?
https://platform.openai.com/docs/guides/vision

Billy Cao · Answer 4 · Sat Feb 10 2024 14:27:24 GMT+0800 (China Standard Time)

Are there plans for loading models in 8bit or 4bit?

Ying Sheng · Answer 5 · Sat Feb 10 2024 19:58:59 GMT+0800 (China Standard Time)

For the vision models support, is it possible to align with the openai gpt4v API? https://platform.openai.com/docs/guides/vision

@nivibilla Yes, it is already aligned with the openai gpt4v API, see here.
You can also find a runnable example of serving it with Sky Serve here.

Ying Sheng · Answer 6 · Sat Feb 10 2024 20:03:02 GMT+0800 (China Standard Time)

Are there plans for loading models in 8bit or 4bit?

@aliencaocao Thanks for the question! The AWQ and GPTQ are already supported. But we do not support an automatic dtype translation yet. You are welcome to submit a PR for that.

Billy Cao · Answer 7 · Sat Feb 10 2024 20:06:04 GMT+0800 (China Standard Time)

Are there plans for loading models in 8bit or 4bit?

@aliencaocao Thanks for the question! The AWQ and GPTQ are already supported. But we do not support an automatic dtype translation yet. You are welcome to submit a PR for that.

I'm looking to load llava 1.6 in 8bit, but it does not seem that llava series has AWQ or GPTQ quants, or did I miss out anything here?

EDIT: I saw 1.5 has but not 1.6 yet. Perhaps its just too new and no one did a calibration yet.

Enrique Shockwave · Answer 8 · Tue Apr 02 2024 07:02:59 GMT+0800 (China Standard Time)

Hi all - is anyone working on the S-LoRA integration currently? I see the branch, but it looks a few months old.

Would love to see this, happy to pick up from existing work or start fresh.

Ying Sheng · Answer 9 · Tue Apr 02 2024 13:48:36 GMT+0800 (China Standard Time)

Hi all - is anyone working on the S-LoRA integration currently? I see the branch, but it looks a few months old.

Would love to see this, happy to pick up from existing work or start fresh.

Hi @qeternity, I was working on it but have been blocked by other affairs. You are welcome to contribute, either continue on the branch or start fresh! I'll be happy to review and collaborate.

Bit0r · Answer 10 · Tue Apr 02 2024 19:34:14 GMT+0800 (China Standard Time)

Tools support is very important, which is necessary for many use cases.

omri-sap · Answer 11 · Thu Apr 04 2024 17:58:38 GMT+0800 (China Standard Time)

Is TinyLlama supported? TinyLlama/TinyLlama-1.1B-Chat-v1.0
generation seems a bit slow...

WILLE · Answer 12 · Mon May 06 2024 17:31:53 GMT+0800 (China Standard Time)

I see llama.cpp integration is on the roadmap. When will this feature be delivered? It would be very nice to have it , since it will support running local LLMs, such as llama models, on Mac computers and experiment them with the powerful and expressive SGLang.

Gintas Z. · Answer 13 · Thu May 09 2024 00:10:42 GMT+0800 (China Standard Time)

I'd request to include support for Phi-3-mini

Yudi Xue · Answer 14 · Tue Jun 25 2024 02:31:54 GMT+0800 (China Standard Time)

Hi all - is anyone working on the S-LoRA integration currently? I see the branch, but it looks a few months old.
Would love to see this, happy to pick up from existing work or start fresh.

Hi @qeternity, I was working on it but have been blocked by other affairs. You are welcome to contribute, either continue on the branch or start fresh! I'll be happy to review and collaborate.

Hi which branch is it? looks like better start fresh

Yineng Zhang · Answer 15 · Tue Jul 16 2024 16:12:59 GMT+0800 (China Standard Time)

I can help by getting rid of the vLLM in the dependencies.

Ying Sheng · Answer 16 · Wed Jul 17 2024 10:23:05 GMT+0800 (China Standard Time)

Moved to #634