symflower / eval-dev-quality

DevQualityEval: An evaluation benchmark 📈 and framework to compare and evolve the quality of code generation of LLMs.

Home Page: https://symflower.com/en/company/blog/2024/dev-quality-eval-v0.4.0-is-llama-3-better-than-gpt-4-for-generating-tests/

Multiple model parameters with the same value result in multiple evaluations

Munsio opened this issue

When the same model is specified multiple times for an evaluation, the evaluation currently runs once per occurrence of that model instead of once per distinct model.

Example:

eval-dev-quality evaluate --runtime docker --result-path ./docker-test --runs 5 --model symflower/symbolic-execution --model symflower/symbolic-execution --model symflower/symbolic-execution --repository golang/plain

This evaluates symflower/symbolic-execution 3 times with 5 runs each, because the model was specified 3 times as a parameter.

Question:
Do we want this behavior, or should we deduplicate the list of models after parsing?