promptfoo / promptfoo

Test your prompts, models, and RAGs. Catch regressions and improve prompt quality. LLM evals for OpenAI, Azure, Anthropic, Gemini, Mistral, Llama, Bedrock, Ollama, and other local & private models with CI/CD integration.

https://www.promptfoo.dev/

promptfoo/promptfoo Issues

Request: `json-equals` assertion
Updated 5 days ago · 4
SQLITE_CONSTRAINT_NOT_NULL Error when running "promptfoo view"
Closed 5 days ago · 1
Use test vars in function schema
Closed 6 days ago · 3
Use Gcloud default authentication instead of an API key
Closed 6 days ago · 6
When writing JSON output of an eval, include the timestamp of the run
Closed 6 days ago · 2
Python scripts in prompts does not work with anthropic as provider
Closed 6 days ago · 3
Feature Request: Also Pass `logProbs` to Python Assertions
Closed 6 days ago · 2
In the html report (view/shared), instruction tokens and completion tokens should be shown separately.
Closed 6 days ago · 3
No Amazon Bedrock models can be used as embedding providers for similarity assertion
Closed 6 days ago · 2
How to test prompt with triple dashes inside without breaking into separate prompts?
Closed 6 days ago · 1
Standardize how LLM output is stored in a ProviderResponse output object
Updated 7 days ago · 2
How to get, Prompt, Output, and Assertion value into Python
Updated 7 days ago · 1
Prettify messages json strings for multi-prompt chat conversations
Updated 8 days ago
Documentation for data structures GradingResult and ProviderResponse
Updated 13 days ago · 1
Browser exception (Minified React error #31; visit) when clicking magnifying glass in output
Updated 13 days ago · 1
Description Field in Test.yaml Causes Assertions to Pass Incorrectly
Closed 14 days ago · 3
Request: support `--max-concurrency=auto`
Closed 14 days ago · 2
rouge-n-assertions may cause an js error for a test cases
Closed 14 days ago · 2
Feature Request: Assertion Sets
Closed 15 days ago · 3
Request: suppress warning `Treating it as a text prompt.` from each `eval`
Updated 16 days ago · 3
Feature Request: Have optional vars in assertions with ability to define fallbacks
Updated 17 days ago
Multiline vars replacement break indentation in config generation
Updated 18 days ago
Shouldn't "gemini-1.5-pro-latest" be named "gemini-1.5-pro-preview-0409"?
Closed 20 days ago · 1
not-equals is not working
Closed 20 days ago
CORS error when using self hosted server
Closed 21 days ago · 4
0.57.0 update appears to break python script providers
Closed 22 days ago · 3
add evalId in generate OutputFile
Updated 22 days ago · 1
Request: ability to exclude `Description` from `promptfoo view` and persist UI settings across refresh
Updated 22 days ago · 3
pass logger as callApi
Closed 22 days ago · 2
Request: CLI arg to suppress exit code 100 in `promptfoo eval`
Closed 22 days ago · 1
Error running database migrations: Cannot open database because the directory does not exist
Closed 22 days ago · 3
Random sampling of n data rows
Updated 22 days ago · 1
How to best express a QA (question-answer) dataset?
Closed 22 days ago · 3
Multiple prompts in a single prompt file when using multiple prompt files
Updated 23 days ago
Docs request: where does histogram come from?
Updated 23 days ago · 1
CLI --first-n only runs the first test case regardless of specified argument when using test CSV
Closed 24 days ago · 1
Bug (?): escaping newline in `prompts`?
Closed 24 days ago · 2
add header with provider information
Closed 25 days ago · 8
Caching concurrent calls in Python provider does not use new caching behaviour
Updated 25 days ago
Request: JSON schema for `promptfooconfig.yaml`
Updated 25 days ago
Request: better traceback on Python `SyntaxError`
Closed 25 days ago · 4
Feature Request: Extend Comment options in UI
Closed 25 days ago · 3
OpenAI base URL environment variable: OPENAI_API_BASE_URL vs OPENAI_BASE_URL
Closed 25 days ago · 1
allow provider to report updated prompt?
Updated 25 days ago · 1
support for different provider for llm-rubric and embeddings
Closed a month ago · 2
Feature Request: Version Release LLM-Judged Metrics To Enable Pinning
Updated a month ago
Trying to self host on ECS but receiving Internal Server Error
Updated a month ago · 2
Potential bug with `answer-relevance` assertion calculation
Closed a month ago · 1
mistakenly added
Closed a month ago
Cannot See Previous Evaluation History Intermittently
Updated a month ago