promptfoo: a prompt engineering tool
promptfoo
helps you tune LLM prompts systematically across many relevant test cases.
With promptfoo, you can:
- Test multiple prompts against predefined test cases
- Evaluate quality and catch regressions by comparing LLM outputs side-by-side
- Speed up evaluations by running tests concurrently
- Flag bad outputs automatically by setting "expectations"
- Use as a command line tool, or integrate into your workflow as a library
- Use OpenAI models, open-source models like Llama and Vicuna, or integrate custom API providers for any LLM API
» View full documentation «
promptfoo produces matrix views that allow you to quickly review prompt outputs across many inputs. The goal: tune prompts systematically across all relevant test cases, instead of testing prompts by trial and error.
Here's an example of a side-by-side comparison of multiple prompts and inputs:
It works on the command line too:
Usage (command line & web viewer)
To get started, run the following command:
npx promptfoo init
This will create some templates in your current directory: prompts.txt
, vars.csv
, and promptfooconfig.js
.
After editing the prompts and variables to your liking, run the eval command to kick off an evaluation:
npx promptfoo eval
If you're looking to customize your usage, you have the full set of parameters at your disposal:
npx promptfoo eval -p <prompt_paths...> -o <output_path> -r <providers> [-v <vars_path>] [-j <max_concurrency] [-c <config_path>] [--grader <grading_provider>]
<prompt_paths...>
: Paths to prompt file(s)<output_path>
: Path to output CSV, JSON, YAML, or HTML file. Defaults to terminal output<providers>
: One or more of:openai:<model_name>
, or filesystem path to custom API caller module<vars_path>
(optional): Path to CSV, JSON, or YAML file with prompt variables<max_concurrency>
(optional): Number of simultaneous API requests. Defaults to 4<config_path>
(optional): Path to configuration file<grading_provider>
: A provider that handles the grading process, if you are using LLM grading
After running an eval, you may optionally use the view
command to open the web viewer:
npx promptfoo view
Examples
Prompt quality
In this example, we evaluate whether adding adjectives to the personality of an assistant bot affects the responses:
npx promptfoo eval -p prompts.txt -v vars.csv -r openai:gpt-3.5-turbo
This command will evaluate the prompts in prompts.txt
, substituing the variable values from vars.csv
, and output results in your terminal.
Have a look at the setup and full output here.
You can also output a nice spreadsheet, JSON, YAML, or an HTML file:
Model quality
In this example, we evaluate the difference between GPT 3 and GPT 4 outputs for a given prompt:
npx promptfoo eval -p prompts.txt -r openai:gpt-3.5-turbo openai:gpt-4 -o output.html
Produces this HTML table:
Full setup and output here.
Usage (node package)
You can also use promptfoo
as a library in your project by importing the evaluate
function. The function takes the following parameters:
-
providers
: a list of provider strings orApiProvider
objects, or just a single string orApiProvider
. -
options
: the prompts and variables you want to test:{ prompts: string[]; vars?: Record<string, string>; }
Example
promptfoo
exports an evaluate
function that you can use to run prompt evaluations.
import promptfoo from 'promptfoo';
const options = {
prompts: ['Rephrase this in French: {{body}}', 'Rephrase this like a pirate: {{body}}'],
vars: [{ body: 'Hello world' }, { body: "I'm hungry" }],
};
(async () => {
const summary = await promptfoo.evaluate('openai:gpt-3.5-turbo', options);
console.log(summary);
})();
This code imports the promptfoo
library, defines the evaluation options, and then calls the evaluate
function with these options. The results are logged to the console:
{
"results": [
{
"prompt": {
"raw": "Rephrase this in French: Hello world",
"display": "Rephrase this in French: {{body}}"
},
"vars": {
"body": "Hello world"
},
"response": {
"output": "Bonjour le monde",
"tokenUsage": {
"total": 19,
"prompt": 16,
"completion": 3
}
}
},
// ...
],
"stats": {
"successes": 4,
"failures": 0,
"tokenUsage": {
"total": 120,
"prompt": 72,
"completion": 48
}
},
"table": [
// ...
]
}
Configuration
- Setting up an eval: Learn more about how to set up prompt files, vars file, output, etc.
- Configuring test cases: Learn more about how to configure expected outputs and test assertions.
Installation
API Providers
We support OpenAI's API as well as a number of open-source models. It's also to set up your own custom API provider. See Provider documentation for more details.
Development
Contributions are welcome! Please feel free to submit a pull request or open an issue.
promptfoo
includes several npm scripts to make development easier and more efficient. To use these scripts, run npm run <script_name>
in the project directory.
Here are some of the available scripts:
build
: Transpile TypeScript files to JavaScriptbuild:watch
: Continuously watch and transpile TypeScript files on changestest
: Run test suitetest:watch
: Continuously run test suite on changes