Custom MAX_TOKENS
ngxson opened this issue
Hi, thanks for the great project.
I would like to know if you could implement this small change: MAX_TOKENS is hard-coded to 2000 in this file:
transmart/packages/core/src/split.ts (line 4 in fd8da7c)
It would be nice to let users customize this variable in transmart.config.js, in case they use models with much longer context windows (reminder: OpenAI has already released GPT-4 with a 128K-token context window).
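For example, the config could simply expose the limit. This is only a rough sketch; maxTokens is a hypothetical option name, not an existing transmart setting:

// transmart.config.js — hypothetical `maxTokens` option, shown only to illustrate the request
module.exports = {
  // ...existing transmart options (locales, API key, etc.)...
  maxTokens: 4000, // would replace the hard-coded 2000 in packages/core/src/split.ts
}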
Okay. Perhaps I will support the different models' context windows directly, but asking users to customize this themselves isn't very user-friendly. What are your thoughts?
Personally, I think having a MAX_TOKENS that can be configured from transmart.config.js is user-friendly enough.
Additionally, we could break it into two variables (see the sketch after this list):
modelContextLimit: the real maximum context window that the model supports (for example 16k, 32k, 128k, ...).
modelContextSplit: the ratio between the number of input and output tokens. For example, if the input language is English and the output is Spanish, you may expect 1 input token to produce 2 output tokens; in that case the value would be 1/2. By default, it can be set to 1/1.
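For illustration, here is a minimal sketch of how those two options could look in transmart.config.js and how a per-chunk input budget could be derived from them. The option names and the helper function are only a proposal for this issue, not existing transmart APIs:

// transmart.config.js — proposal sketch; neither option exists in transmart yet
module.exports = {
  // ...existing transmart options...
  modelContextLimit: 16000, // total context window of the model (16k, 32k, 128k, ...)
  modelContextSplit: 1 / 2, // expected ratio of input tokens to output tokens
}

// In the splitter, this budget could replace the hard-coded MAX_TOKENS.
// With ratio r = input / output and input + output <= limit, it follows that
// input <= limit * r / (1 + r).
function maxInputTokens(modelContextLimit, modelContextSplit) {
  return Math.floor((modelContextLimit * modelContextSplit) / (1 + modelContextSplit))
}

// Example: maxInputTokens(16000, 1 / 2) === 5333, leaving roughly 10666 tokens for the output.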
Moreover, today I had some time to test transmart with a few open-source models (using ollama):
mistral is pretty good (in both speed and quality), but it struggles to remember the correct translation keys. I suspect that reducing MAX_TOKENS would resolve that.
mixtral:8x7b is slow and produces invalid JSON. I didn't have enough time to find out why, but it seems to me that the context window plays a big role when running these open-source models.
Therefore, I see a real need for making the max token count configurable.
Sounds good. I'm working on it; give it a try when it's finished.
How do you test transmart with open-source models? By proxying mistral through an OpenAI-compatible API?
Yes, I use a proxy. Here is a docker-compose file for reference (note: you need to docker exec -it ollama bash and then ollama pull mistral to download the model):
version: '3'
services:
  ollama:
    container_name: ollama
    image: ollama/ollama
    ports:
      - "11434:11434"
    volumes:
      - /root:/root
  llm-api:
    image: ghcr.io/berriai/litellm:main-v1.10.3
    command: ["/bin/sh", "-c", "pip install async_generator && litellm --model ollama/mistral --api_base http://ollama:11434 --host 0.0.0.0 --port 3000"]
    entrypoint: []
    platform: linux/amd64
    ports:
      - "3000:3000"