openai / simple-evals

openai/simple-evals Issues

Question: Why is Claude running with temperature 0 and GPT4 with temperature 0.5? wouldn't that be a handicap for Claude in Humaneval?
Updated a month ago
Has anyone run this code and cached the granular data?
Updated 2 months ago
Run benchmarks for old GPT-4 models (GPT-4-0314 and GPT-4-0613) and all GPT-3.5-turbo models
Updated 2 months ago
Add itemized scores?
Updated 2 months ago
Run benchmarks also for GPT-3.5 versions and Claude Sonnet and Haiku
Updated 3 months ago1
types is overriding the stdlib module "types"
Closed 3 months ago
Demo does not run - azure credentials
Closed 3 months ago1