hao-ai-lab / LookaheadDecoding

Lookahead Decoding Development Roadmap

Viol2000 opened this issue · comments

Software Quality

  • Refactor Code #9
  • Simple way to add new model

Implementation

  • Support FlashAttention
  • Support Sampling
  • Support Batch>1
  • Lookahead window KV-Cache (May hurt accuracy)
  • Verification branch trie
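The "verification branch trie" item above suggests organizing candidate continuations so that branches sharing a prefix are verified only once. A minimal, hypothetical sketch of that data structure (the class and method names here are illustrative, not from the repository):

```python
class TrieNode:
    """One node per distinct token in a branch position."""
    def __init__(self):
        self.children = {}  # token id -> TrieNode


class CandidateTrie:
    """Stores candidate token n-grams; shared prefixes collapse into one path,
    so overlapping candidates don't cost duplicate verification work."""

    def __init__(self):
        self.root = TrieNode()

    def insert(self, ngram):
        node = self.root
        for tok in ngram:
            node = node.children.setdefault(tok, TrieNode())

    def node_count(self):
        """Number of distinct token positions a verifier would need to score."""
        def count(node):
            return sum(1 + count(child) for child in node.children.values())
        return count(self.root)


trie = CandidateTrie()
trie.insert([5, 7, 9])
trie.insert([5, 7, 2])  # shares the prefix [5, 7] with the first candidate
print(trie.node_count())  # 4 nodes instead of 6 flat tokens
```

With two 3-token candidates sharing a 2-token prefix, the trie holds 4 nodes rather than 6, which is the saving this roadmap item appears to target.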

New Models

Does this project support the vicuna model?

Vicuna is already supported because it is based on LlamaForCausalLM.
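A quick way to see why Vicuna works out of the box: a Hugging Face model's `config.json` declares its architecture class, and Vicuna's declares `LlamaForCausalLM`. A minimal sketch of such a check (the `SUPPORTED_ARCHITECTURES` set and `is_supported` helper are hypothetical, not part of this repository):

```python
# Hypothetical compatibility check: a model is usable if its config
# lists an architecture class the project already handles.
SUPPORTED_ARCHITECTURES = {"LlamaForCausalLM"}


def is_supported(config: dict) -> bool:
    """Return True if any declared architecture is a supported class."""
    return any(a in SUPPORTED_ARCHITECTURES for a in config.get("architectures", []))


# Mirrors the "architectures" field in lmsys/vicuna-7b-v1.3's config.json.
vicuna_config = {"architectures": ["LlamaForCausalLM"]}
print(is_supported(vicuna_config))  # True
```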

Thank you for your reply! Do you mean that I can use the following commands to observe the acceleration effect of the Vicuna model?
USE_LADE=1 python applications/chatbot.py --model_path meta-llama/vicuna-7b-v.13 --debug #no chat, with lookahead
USE_LADE=0 python applications/chatbot.py --model_path meta-llama/vicuna-7b-v.13 --debug #no chat, without lookahead

It should be lmsys/vicuna-7b-v1.3 and yes.
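Putting the correction together with the commands above, the comparison would look like this (paths and flags taken from the thread itself):

```shell
USE_LADE=1 python applications/chatbot.py --model_path lmsys/vicuna-7b-v1.3 --debug  # with lookahead
USE_LADE=0 python applications/chatbot.py --model_path lmsys/vicuna-7b-v1.3 --debug  # without lookahead
```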

Got it! Thank you again for your reply!

Hi, I'm really interested in this decoding work. Is there any progress on integrating the Qwen model? Thanks.