microsoft / kernel-memory

RAG architecture: index and query any data using LLM and natural language, track sources, show citations, asynchronous memory patterns.

Home Page: https://microsoft.github.io/kernel-memory


Make ITextEmbeddingGenerator.CountTokens and ITextGenerator.CountTokens ValueTask<int>

JohnGalt1717 opened this issue · comments

Right now these are synchronous, but if you're using an online service to implement them (e.g. the llama.cpp server), they need to be able to return responses asynchronously. Having them return ValueTask<int> would be greatly helpful.

Conversely, GenerateEmbeddingAsync could return ValueTask as well.
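For clarity, a minimal sketch of what the requested change might look like. This is not the actual kernel-memory definition; the real interfaces have additional members, and the member name `CountTokensAsync` is illustrative:

```csharp
using System.Threading;
using System.Threading.Tasks;

// Sketch only: not the actual kernel-memory interface, just the
// proposed shape of the token-counting member.
public interface ITextGenerator
{
    // Current: int CountTokens(string text);
    // Proposed: ValueTask<int>, so service-backed tokenizers (e.g. a
    // llama.cpp server) can await an HTTP call, while local tokenizers
    // can still complete synchronously without allocating a Task.
    ValueTask<int> CountTokensAsync(
        string text,
        CancellationToken cancellationToken = default);
}
```

A local implementation would return `new ValueTask<int>(count)` and complete synchronously, so the common in-process path stays allocation-free; only service-backed implementations would pay the cost of a genuinely asynchronous call.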

We tried making it async during the initial implementation, but it would affect the speed and complexity of the text chunker, which would need quite a bit of rewriting, and it would raise questions about usage. Counting tokens is currently meant to be fast and free; e.g. we use CountTokens even for logging statements. If we change that assumption, we'll need to reassess each use case to avoid unnecessary calls and unforeseen expenses.

OK, well, what I'm doing is creating an implementation of the LLama API (native, not emulated). I can't find a good LLama token counter for C# that works without an API call.

Suggestions?

IIRC Llama uses SentencePiece. Is anything available in that direction?

Seems like you guys have one?

https://github.com/microsoft/BlingFire

Did you end up going with this? I'm facing this exact issue right now and haven't found a good solution. I'm still doing it async, but just using .GetAwaiter().GetResult(), which is terrible. BlingFire seems OK, but it also requires you to load and unload the model each time, which seems bad to me.
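For reference, the sync-over-async bridge described above looks roughly like this. The class name and the `/tokenize` endpoint are hypothetical, standing in for any service-backed tokenizer:

```csharp
using System.Net.Http;
using System.Threading.Tasks;

// Sketch only: "RemoteTokenCounter" and its endpoint are assumptions,
// not part of kernel-memory or llama.cpp.
public sealed class RemoteTokenCounter
{
    private readonly HttpClient _http = new();

    // Hypothetical remote call that returns a token count.
    private async Task<int> CountTokensRemoteAsync(string text)
    {
        var response = await _http.PostAsync(
            "http://localhost:8080/tokenize", new StringContent(text));
        var body = await response.Content.ReadAsStringAsync();
        return int.Parse(body); // assumes the endpoint returns a bare integer
    }

    // Sync-over-async bridge to satisfy the synchronous CountTokens
    // contract. It blocks the calling thread until the HTTP call
    // completes and can deadlock under a SynchronizationContext
    // (e.g. classic ASP.NET or UI threads), hence "terrible".
    public int CountTokens(string text)
        => CountTokensRemoteAsync(text).GetAwaiter().GetResult();
}
```

This is exactly the pattern an async-capable interface would make unnecessary: the blocking bridge exists only because the interface forces a synchronous return.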

I don't exactly see why this method can't be async.