promptfoo / promptfoo

Test your prompts, agents, and RAGs. Redteaming, pentesting, vulnerability scanning for LLMs. Improve your app's quality and catch problems. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command line and CI/CD integration.

Home Page: https://www.promptfoo.dev/


Feature Request: Loading data for evaluate() from data loaded into memory

anthonyivn2 opened this issue

Hi team,

I was reading through the docs, and it seems that promptfoo's evaluate() does not currently support data that is already loaded in memory (when running the Node library). My use case is to evaluate hundreds of new question-answer pairs on a daily basis through promptfoo, and I am not sure whether that is doable today.

Hi @anthonyivn2, can you clarify what you're looking for? It seems you could do something like this and pass in the data you've already loaded separately:

import promptfoo from "promptfoo";

const results = await promptfoo.evaluate(
  {
    prompts: ["First prompt with {{question}} and {{answer}}", "..."],
    providers: ["openai:gpt-3.5-turbo"],
    tests: [
      {
        vars: {
          question: "...",
          answer: "...",
        },
      },
      {
        vars: {
          question: "...",
          answer: "...",
        },
      },
    ],
    writeLatestResults: true, // write results to disk so they can be viewed in web viewer
  },
  {
    maxConcurrency: 2,
  },
);
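
For the daily-batch use case, the tests array does not need to be written out by hand. Here is a minimal sketch of building it from records already in memory. The `records` array and the helper names `buildTests` and `buildTestSuite` are hypothetical and not part of promptfoo; the object shape simply mirrors the example above.

```javascript
// Hypothetical in-memory records, e.g. loaded from a database earlier in the process.
const records = [
  { question: "What is the capital of France?", answer: "Paris" },
  { question: "What is 2 + 2?", answer: "4" },
];

// Convert each record into a test case whose `vars` fill the
// {{question}} and {{answer}} placeholders in the prompt.
function buildTests(rows) {
  return rows.map(({ question, answer }) => ({ vars: { question, answer } }));
}

// Assemble an object with the same shape as the one passed to evaluate() above.
function buildTestSuite(rows) {
  return {
    prompts: ["First prompt with {{question}} and {{answer}}"],
    providers: ["openai:gpt-3.5-turbo"],
    tests: buildTests(rows),
    writeLatestResults: true,
  };
}

const testSuite = buildTestSuite(records);
// Assumed usage, following the call shown above:
// const results = await promptfoo.evaluate(testSuite, { maxConcurrency: 2 });
```

Keeping the mapping in a plain function means the same code can be rerun each day against whatever records were loaded, with no intermediate file.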

@typpo OK, so does evaluate() accept input formats other than the TestSuiteConfiguration object specified in the Node Package Usage doc? The example you showed above would definitely work for my use case if I can do that with evaluate().

I would also like to confirm one more thing. Does writeLatestResults: true share the data (I don't want to share it), or does it only write the results to my local disk so that they are viewable in the web viewer?

OK, this should work for my use case. I will close this issue. Thank you for the response!