Document different package options
transitive-bullshit opened this issue · comments
It seems there are multiple NPM packages associated with this tiktoken port, and I wasn't able to find the differences clearly documented anywhere (@dqbd/tiktoken, js-tiktoken, and tiktoken).
Langchainjs seems to be using js-tiktoken (reference and associated commit langchain-ai/langchainjs@d60eae5), so I'm going with that for now, but the readme on this project uses tiktoken instead of js-tiktoken, and @dqbd/tiktoken looks like it's still around.
@dqbd would love any clarity you can provide here, and thank you again for your amazing work on this project 🙏
Also, what does the js-tiktoken/lite version actually do differently than the other packages?
Hello!
I got a little swamped with (school) work recently, so my apologies for the lack of documentation and clarity. I will update the README.md soon, but here is the gist of the changes and the rationale:
This repository maintains two packages.
tiktoken (formerly hosted at @dqbd/tiktoken): WASM bindings for the original Python library, providing full 1-to-1 feature parity.
js-tiktoken: Pure JavaScript port of the original library with the core functionality, suitable for environments where WASM is not well supported or not desired (such as edge runtimes).
The reason for porting tiktoken to JS is mainly the constraints of edge environments (large WASM bundle, the setup needed to get WASM working, etc.) and toolchain-runtime combinations (#37). The issues are compounded when users are not using the package directly but rather as a dependency of another library such as LangchainJS (langchain-ai/langchainjs#1239).
The plan going forward is to converge the APIs of both libraries so they are interchangeable, allowing isomorphic behaviour (#43), and to add appropriate documentation soon (with an additional PR for benchmarking both packages). Will close the issue after that is done :)
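Until the APIs converge, calling code can stay package-agnostic behind a thin wrapper. The following is only a minimal sketch, not part of either package: `EncoderLike`, `withEncoder`, and `StubEncoder` are hypothetical names made up for illustration. It assumes the one difference that matters to most callers, namely that the WASM encoder holds memory that should be released with `free()`, while the pure JS encoder has nothing to release.

```typescript
// Hypothetical common shape (an assumption, not an API of either package):
// the WASM encoder returns a Uint32Array and exposes free(); the pure JS
// encoder returns number[] and has no free().
interface EncoderLike {
  encode(text: string): ArrayLike<number>;
  free?(): void;
}

// Create an encoder, run a callback with it, and always release it afterwards
// if the encoder exposes free().
function withEncoder<T>(
  create: () => EncoderLike,
  use: (enc: EncoderLike) => T
): T {
  const enc = create();
  try {
    return use(enc);
  } finally {
    enc.free?.();
  }
}

// Stand-in encoder so the sketch runs without either package installed.
// It splits on whitespace and is NOT a real BPE tokenizer.
class StubEncoder implements EncoderLike {
  freed = false;

  encode(text: string): number[] {
    return text
      .split(/\s+/)
      .filter(Boolean)
      .map((word) => word.length);
  }

  free(): void {
    this.freed = true;
  }
}

// With the real packages, the factory would be one of:
//   import { get_encoding } from "tiktoken";    // WASM bindings
//   withEncoder(() => get_encoding("cl100k_base"), (enc) => ...);
//   import { getEncoding } from "js-tiktoken";  // pure JS port
//   withEncoder(() => getEncoding("cl100k_base"), (enc) => ...);
const stub = new StubEncoder();
const tokenCount = withEncoder(
  () => stub,
  (enc) => enc.encode("hello big world").length
);
console.log(tokenCount, stub.freed);
```

The callback style keeps the `free()` call in one place, so switching packages only changes the factory passed in.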
Hope that clears things up!
First off, you rock @dqbd 🔥
This makes a ton of sense, and no worries about being swamped w/ school / work. Totally understand and it's all part of open source :)
Thanks for the thorough explanation – will update https://github.com/transitive-bullshit/compare-tokenizers and my other projects accordingly 🙏