fjrdomingues / autopilot

Code Autopilot, a tool that uses GPT to read a codebase, create context and solve tasks.

Better utilise token limits per each model

hatkyinc2 opened this issue · comments

Background story explaining the current situation:
gpt-3.5 has a 4k token limit. gpt-4 has an 8k token limit, plus other variants with higher limits, and we will support more models in the future.
The way we currently deal with token allocation is through hardcoded env variables.

Desired new situation:
We should have a variable with the max tokens for each model, possibly re-using the variable used for costs.
Each agent should decide on a percentage allocation of tokens for input/output; these could be hardcoded env variables.
That allocation should be passed to the agent call to specify how many tokens to return.
The agent would need to assess the current token usage of the input.
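For illustration, a minimal sketch of what that could look like; the env variable name and the modelTokenLimits map below are hypothetical, not existing code:

// Sketch only — names here are placeholders, not part of the codebase yet.
const modelTokenLimits = {
  'gpt-3.5-turbo': 4096,
  'gpt-4': 8192,
  // larger-context variants can be added here
};

// Each agent could read its own output share from an env variable, e.g.:
const outputRatio = parseFloat(process.env.CODER_AGENT_OUTPUT_RATIO || '0.5');
const maxOutputTokens = Math.floor(modelTokenLimits['gpt-4'] * outputRatio);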

Implementation details:

🤖 This is my suggestion:

File: agents/genericAgent.js
To update the agent call to pass the token allocation and assess the current usage of tokens in the input, follow these steps:

  1. Import the necessary helpers (both are defined in modules/tokenHelper.js below):
const { countTokens, getMaxTokensForModel } = require('../modules/tokenHelper');
  2. Update the callAgent function to calculate the token allocation and pass it to the callGPT function:
async function callAgent(template, values, model) {
    const promptTemplate = PromptTemplate.fromTemplate(template);
    const prompt = await promptTemplate.format(values);

    // Calculate the token allocation
    const inputTokens = countTokens(prompt);
    const maxTokens = getMaxTokensForModel(model);
    const outputTokens = Math.floor(maxTokens * 0.5); // Reserve half the window for the reply; adjust the percentage as needed
    const totalTokens = inputTokens + outputTokens;

    // Check if the total tokens are within the model's limit
    if (totalTokens > maxTokens) {
        throw new Error(`The total tokens (${totalTokens}) exceed the model's limit (${maxTokens})`);
    }

    const reply = await callGPT(prompt, model, outputTokens);

    // console.log(`Prompt: ${prompt}`);
    // console.log(`Reply:\n`);
    // console.dir(reply, { depth: null });

    return reply;
}
  3. Update the callGPT function in modules/gpt.js to accept the outputTokens parameter and pass it to the API call (a sketch of that change follows below).

With these changes, the agent will now calculate the token allocation based on the model's limit and pass it to the GPT call.
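modules/gpt.js is not shown in this issue, so the following is only a rough sketch of what step 3 could look like; the constructor options and the overall shape of the file are assumptions:

// modules/gpt.js — sketch only; the actual file may differ.
const { OpenAI } = require('langchain/llms');

async function callGPT(prompt, model, outputTokens) {
  // Use the caller-supplied output budget instead of a fixed env value.
  const llm = new OpenAI({
    modelName: model,
    maxTokens: outputTokens,
    temperature: parseFloat(process.env.MODEL_TEMPERATURE),
    openAIApiKey: process.env.OPENAI_API_KEY,
  });
  return llm.call(prompt);
}

module.exports = { callGPT };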

File: modules/tokenHelper.js
To update the token counting logic to consider the max tokens per model, you can add a new function getMaxTokensForModel that returns the max tokens for the given model. You can also update the countTokens function to accept an optional maxTokens parameter and cap the reported token count accordingly.

Here's the updated code for modules/tokenHelper.js:

const { get_encoding } = require('@dqbd/tiktoken');

/**
 * Returns the max tokens for the given model.
 * @param {string} modelName - The name of the model.
 * @returns {number} - The max tokens for the model.
 */
function getMaxTokensForModel(modelName) {
  const modelTokenLimits = {
    'gpt-3.5-turbo': 4096,
    'gpt-4': 8192,
    // Add other models and their token limits here
  };

  return modelTokenLimits[modelName] || 0;
}

/**
 * Counts the number of tokens in the input string, optionally capped at maxTokens.
 * @param {string} input - The input string to tokenize.
 * @param {number} [maxTokens] - Optional cap on the reported token count.
 * @returns {number} - The number of tokens in the input string, capped at maxTokens when provided.
 */
function countTokens(input, maxTokens) {
  const encoder = get_encoding("cl100k_base");
  const tokens = encoder.encode(input);
  encoder.free();
  // Only cap the count when a maxTokens limit is provided (callers such as
  // callAgent may call this with a single argument).
  return maxTokens ? Math.min(tokens.length, maxTokens) : tokens.length;
}

module.exports = { countTokens, getMaxTokensForModel };

Now, you can use the getMaxTokensForModel function to get the max tokens for a specific model and pass it to the countTokens function to limit the token count accordingly.
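As a quick illustration of how the two functions fit together (the prompt string here is just an example):

const { countTokens, getMaxTokensForModel } = require('./tokenHelper');

const prompt = 'Summarize the following file...';
const maxTokens = getMaxTokensForModel('gpt-4');      // 8192
const promptTokens = countTokens(prompt);             // full token count of the prompt
const cappedCount = countTokens(prompt, maxTokens);   // never reports more than 8192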

File: modules/model.js
To implement the desired new situation, you can follow these steps:

  1. Add a variable with the max tokens for each model.
  2. Allocate a percentage of tokens for input/output.

Here's the updated code for modules/model.js:

const { OpenAI } = require('langchain/llms');

// Add a variable with the max tokens per each model
const modelTokenLimits = {
  'gpt-3.5-turbo': 4096,
  'gpt-4': 8192,
};

// Allocate a percentage of tokens for input/output
const tokenAllocation = {
  input: 0.8,
  output: 0.2,
};

/**
 * Returns an instance of the specified language model.
 * @param {string} modelType - The type of language model to return.
 *   Currently supported: ['gpt-3.5-turbo', 'gpt-4'].
 * @returns {Object} - An instance of the specified language model.
 * @throws {Error} if the input model type is not supported
 */
function getModel(modelType) {
  let model;
  if (['gpt-3.5-turbo', 'gpt-4'].includes(modelType)) {
    const maxTokens = modelTokenLimits[modelType];
    // Split the context window between the prompt (input) and the completion (output)
    const inputTokens = Math.floor(maxTokens * tokenAllocation.input);   // budget available for the prompt
    const outputTokens = Math.floor(maxTokens * tokenAllocation.output); // budget reserved for the reply

    model = new OpenAI({
      modelName: modelType,
      maxTokens: outputTokens,
      temperature: parseFloat(process.env.MODEL_TEMPERATURE),
      presencePenalty: parseFloat(process.env.MODEL_PRESENCE_PENALTY),
      frequencyPenalty: parseFloat(process.env.MODEL_FREQUENCY_PENALTY),
      user: process.env.MODEL_USER,
      openAIApiKey: process.env.OPENAI_API_KEY,
    });
  } else {
    throw new Error(`Model type: ${modelType} not supported.`);
  }
  return model;
}

module.exports = { getModel };

In this update, we've added a modelTokenLimits object that contains the max tokens for each model. We've also added a tokenAllocation object that specifies the percentage of tokens to be allocated for input and output. Then, we calculate the number of tokens for input and output based on the model's token limit and pass the output tokens to the OpenAI constructor.
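A quick usage sketch with the numbers above (illustrative only):

const { getModel } = require('./model');

// 'gpt-4' gets an 8192-token window; 20% of it (1638 tokens) is reserved for the completion.
const model = getModel('gpt-4');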