lucidrains / speculative-decoding

Explorations into some recent techniques surrounding speculative decoding


Draft & Verify

Ryu1845 opened this issue · comments

Does this repository implement Draft & Verify?

@Ryu1845 hey! thanks for sharing that paper!

that looks quite close to, if not better than, the naive early exit strategy (they predict which layers to skip through some heuristic) - but using the same model for speculating / drafting is definitely what i was going for.
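for concreteness, here's a minimal greedy sketch of that loop - draft with a truncated forward pass through the same network, then verify the whole draft with one full-depth pass and keep the longest agreeing prefix. the `exit_layer` kwarg is a hypothetical early-exit hook and greedy agreement stands in for the proper modified rejection sampling, so don't read this as the repo's actual code:

```python
import torch

@torch.no_grad()
def self_speculative_step(model, seq, exit_layer = 6, num_draft = 4):
    # draft: greedily decode a few tokens using only the first `exit_layer` layers
    draft = seq
    for _ in range(num_draft):
        logits = model(draft, exit_layer = exit_layer)  # hypothetical early-exit hook
        draft = torch.cat((draft, logits[:, -1].argmax(dim = -1, keepdim = True)), dim = -1)

    # verify: one full-depth forward pass over prompt + drafted tokens
    full_logits = model(draft)
    preds = full_logits[:, seq.shape[-1] - 1:-1].argmax(dim = -1)
    drafted = draft[:, seq.shape[-1]:]

    # accept the longest prefix on which the full model agrees with the draft
    # (taking the batch minimum keeps the output rectangular - a simplification)
    agree = (preds == drafted).long().cumprod(dim = -1)
    num_accepted = int(agree.sum(dim = -1).min())
    return draft[:, :seq.shape[-1] + num_accepted]
```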

i think my prophet transformer idea should be the best though (although i'm biased and still haven't run any head to head 😆)

@Ryu1845 really think we are going to see a resurgence in adaptive computation research over the next year, like actually made practical

I think so too, thanks again for your work.
it looks like the official code for the paper will be uploaded here, but I'll keep an eye on this repo too 😉

@Ryu1845 sounds good!

yea i think the main idea of the prophet approach is to take advantage of the cached last layer embedding from the large model, which should be superior to any early exit stuff. if you find me another paper that did that, i'd definitely read and implement it

i'm also using a transformer on top, borrowing working ideas from the hierarchical transformer line of research
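roughly what i mean, as an illustrative sketch (placeholder names and hyperparameters, not this repo's actual module) - a tiny causal transformer reading the large model's cached final-layer embeddings:

```python
import torch
from torch import nn

class ProphetHead(nn.Module):
    def __init__(self, dim, num_tokens, depth = 2, heads = 4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(dim, heads, dim_feedforward = dim * 4, batch_first = True)
        self.prophet = nn.TransformerEncoder(layer, depth)  # the small transformer on top
        self.to_logits = nn.Linear(dim, num_tokens)

    def forward(self, embeds):
        # embeds: (batch, seq, dim) - cached final-layer embeddings from the large model,
        # so the drafter conditions on the full-depth representation of the prefix,
        # unlike early exit which only sees shallow hidden states
        n = embeds.shape[1]
        causal_mask = torch.full((n, n), float('-inf'), device = embeds.device).triu(1)
        hidden = self.prophet(embeds, mask = causal_mask)
        return self.to_logits(hidden)  # logits for the speculated next tokens
```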

> yea i think the main idea of the prophet approach is to take advantage of the cached last layer embedding from the large model, which should be superior to any early exit stuff.

I don't know of any paper that does this, but the Medusa project aims to do just that, I think.
https://together.ai/blog/medusa
https://github.com/FasterDecoding/Medusa

@Ryu1845 ohh yes, they totally did. so the only difference is i use a small transformer as the medusa / prophet heads
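to make the contrast concrete, medusa's drafter is (roughly) a set of independent mlp heads, one per speculated position ahead, all reading the same last hidden state - the sketch below is from my reading of their repo, so treat the details as approximate:

```python
import torch
from torch import nn

class MedusaStyleHeads(nn.Module):
    def __init__(self, dim, num_tokens, num_heads = 4):
        super().__init__()
        # one feedforward head per future position, no attention between them -
        # versus a small transformer, which lets the speculated positions interact
        self.heads = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, dim), nn.SiLU(), nn.Linear(dim, num_tokens))
            for _ in range(num_heads)
        ])

    def forward(self, hidden):
        # hidden: (batch, dim) - last hidden state at the current position
        return torch.stack([head(hidden) for head in self.heads], dim = 1)  # (batch, num_heads, num_tokens)
```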

ok let me cite them as well

@Ryu1845 oh haha, they don't have a paper, just a github repo. may be the new trend

I'm guessing they'll release a paper once they've got a working prototype 😄
It looks like it's still a WIP FasterDecoding/Medusa#3
I actually don't know if it's running yet :/

@Ryu1845 ohh, so it isn't functional yet? maybe i'll send their group a message. solving batched spec decoding is a bit tricky with the kv cache, but i found a solution (not sure if it's optimal)
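to sketch the wrinkle: after verification, each sequence in the batch accepts a different number of drafted tokens, so the kv caches become ragged. one option, illustrated below (not necessarily what i ended up with here), is to keep the caches padded to a common length and mask out the rejected positions for subsequent attention:

```python
import torch

def kv_cache_validity_mask(cache_len, num_drafted, num_accepted):
    # cache_len: padded kv cache length shared by the whole batch
    # num_accepted: (batch,) tokens each sequence kept after verification
    # valid length per sequence once its rejected draft tokens are discarded
    lens = cache_len - (num_drafted - num_accepted)
    pos = torch.arange(cache_len, device = num_accepted.device)
    # (batch, cache_len) boolean mask - True where the cache entry is still valid;
    # feed this into attention so stale (rejected) keys / values are ignored
    return pos[None, :] < lens[:, None]
```

new tokens then have to be written at each sequence's own valid-length offset rather than at one shared index, which is where most of the bookkeeping lives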

> ~~I'm guessing they'll release a paper once they've got a working prototype 😄~~ It looks like it's still a WIP FasterDecoding/Medusa#3 I actually don't know if it's running yet :/

so it works or doesn't work?

it looks like it works, I'm sorry for the misunderstanding on my side

nice! that's amazing, i believe in that approach

@lucidrains
Amazing work!
Do you plan to release your results with early exit?
Thanks