Custom MAX_TOKENS
ngxson opened this issue
Hi, thanks for the great project.
I would like to know if you could implement this small change: MAX_TOKENS is hard-coded to 2000 in this file:
transmart/packages/core/src/split.ts (line 4 in fd8da7c)
It would be nice to let users customize this variable in transmart.config.js, in case they use models with much longer context windows (reminder: OpenAI has already released GPT-4 with a 128K-token context window).
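For example, the config could simply expose the limit. This is only a rough sketch; maxTokens is a hypothetical option name, not an existing transmart setting:

// transmart.config.js — hypothetical `maxTokens` option, shown only to illustrate the request
module.exports = {
  // ...existing transmart options (locales, API key, etc.)...
  maxTokens: 4000, // would replace the hard-coded 2000 in packages/core/src/split.ts
}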
Okay. Perhaps I will support the different models' context windows directly, but asking users to customize this themselves isn't very user-friendly. What are your thoughts?
Personally, I think having a MAX_TOKENS that can be configured from transmart.config.js is user-friendly enough.
Additionally, we could break it into two variables (see the sketch after this list):
modelContextLimit: the real maximum context window that the model supports (for example 16k, 32k, 128k, ...).
modelContextSplit: the ratio between the number of input and output tokens. For example, if the input language is English and the output is Spanish, you may expect 1 input token to produce 2 output tokens; in that case the value would be 1/2. By default, it can be set to 1/1.
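For illustration, here is a minimal sketch of how those two options could look in transmart.config.js and how a per-chunk input budget could be derived from them. The option names and the helper function are only a proposal for this issue, not existing transmart APIs:

// transmart.config.js — proposal sketch; neither option exists in transmart yet
module.exports = {
  // ...existing transmart options...
  modelContextLimit: 16000, // total context window of the model (16k, 32k, 128k, ...)
  modelContextSplit: 1 / 2, // expected ratio of input tokens to output tokens
}

// In the splitter, this budget could replace the hard-coded MAX_TOKENS.
// With ratio r = input / output and input + output <= limit, it follows that
// input <= limit * r / (1 + r).
function maxInputTokens(modelContextLimit, modelContextSplit) {
  return Math.floor((modelContextLimit * modelContextSplit) / (1 + modelContextSplit))
}

// Example: maxInputTokens(16000, 1 / 2) === 5333, leaving roughly 10666 tokens for the output.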
Moreover, today I had some time to test transmart with a few open-source models (using ollama):
mistral is pretty good (in both speed and quality), but it struggles to remember the correct translation keys. I suspect that reducing MAX_TOKENS would resolve that.
mixtral:8x7b is slow and produces invalid JSON. I didn't have enough time to find out why, but it seems to me that the context window plays a big role when running these open-source models.
Therefore, I see a real need for making the max token count configurable.
Sounds good. I'm working on it; give it a try when it's finished.
How do you test transmart with open-source models? By proxying mistral through an OpenAI-compatible API?
Yes, I use a proxy. Here is a docker-compose file for reference (note: you need to docker exec -it ollama bash and then ollama pull mistral to download the model):
version: '3'
services:
  ollama:
    container_name: ollama
    image: ollama/ollama
    ports:
      - "11434:11434"
    volumes:
      - /root:/root
  llm-api:
    image: ghcr.io/berriai/litellm:main-v1.10.3
    command: ["/bin/sh", "-c", "pip install async_generator && litellm --model ollama/mistral --api_base http://ollama:11434 --host 0.0.0.0 --port 3000"]
    entrypoint: []
    platform: linux/amd64
    ports:
      - "3000:3000"