symflower / eval-dev-quality

DevQualityEval: An evaluation benchmark 📈 and framework to compare and evolve the quality of code generation of LLMs.

Home Page: https://symflower.com/en/company/blog/2024/dev-quality-eval-v0.4.0-is-llama-3-better-than-gpt-4-for-generating-tests/



Automatic selection of repositories is broken

zimmski opened this issue

Running `make install && eval-dev-quality evaluate --model symflower/symbolic-execution` should run all tasks (currently there is only one task) on all languages and all repositories. This is no longer the case.
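As a rough illustration of the expected default, the following minimal Go sketch treats an empty language or repository filter as "select everything", so a plain `evaluate` call covers every repository of every language. It assumes repositories are grouped by language in a map; the function `selectRepositories` and the example repository names are hypothetical and are not taken from the eval-dev-quality code base.

```go
package main

import (
	"fmt"
	"sort"
)

// selectRepositories returns, per language, the repositories to evaluate.
// An empty filter means "no restriction", so calling it without filters
// selects all languages and all repositories.
// Hypothetical helper: names and data layout are illustrative only.
func selectRepositories(available map[string][]string, languages []string, repositories []string) map[string][]string {
	wantLanguage := toSet(languages)
	wantRepository := toSet(repositories)

	selected := map[string][]string{}
	for language, repos := range available {
		if len(wantLanguage) > 0 && !wantLanguage[language] {
			continue // Language explicitly filtered out.
		}
		for _, repo := range repos {
			if len(wantRepository) > 0 && !wantRepository[repo] {
				continue // Repository explicitly filtered out.
			}
			selected[language] = append(selected[language], repo)
		}
	}
	// Sort for deterministic output, which keeps tests simple.
	for language := range selected {
		sort.Strings(selected[language])
	}
	return selected
}

// toSet converts a slice into a lookup set.
func toSet(values []string) map[string]bool {
	set := make(map[string]bool, len(values))
	for _, v := range values {
		set[v] = true
	}
	return set
}

func main() {
	// Made-up example data: two languages with their repositories.
	available := map[string][]string{
		"golang": {"golang/plain", "golang/light"},
		"java":   {"java/plain"},
	}
	// No filters given: every language and every repository is selected.
	fmt.Println(selectRepositories(available, nil, nil))
}
```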

It is best to refactor the code so that the repository selection can be tested independently of the actual evaluation execution, because running the evaluation on all repositories in the test suite would be impractical.
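One possible shape for that refactoring is sketched below in Go, assuming each evaluation can be described as a (model, language, repository) triple. Planning the runs is a pure function, and execution goes through an interface, so a test can substitute a recorder instead of evaluating real repositories. The names `Run`, `Executor`, `Plan`, `Evaluate`, and `recordingExecutor` are hypothetical and not the project's actual API.

```go
package main

import (
	"fmt"
	"sort"
)

// Run is one unit of work: evaluate one model on one repository of one language.
// Hypothetical type for illustration.
type Run struct {
	Model      string
	Language   string
	Repository string
}

// Executor runs a single evaluation. Production code would call the real
// evaluation; tests can pass a fake that only records the calls.
type Executor interface {
	Execute(run Run) error
}

// Plan expands models and the language-to-repositories selection into runs.
// It performs no evaluation, so it can be unit-tested cheaply.
func Plan(models []string, repositoriesByLanguage map[string][]string) []Run {
	// Iterate languages in sorted order so the plan is deterministic.
	languages := make([]string, 0, len(repositoriesByLanguage))
	for language := range repositoriesByLanguage {
		languages = append(languages, language)
	}
	sort.Strings(languages)

	var runs []Run
	for _, model := range models {
		for _, language := range languages {
			for _, repository := range repositoriesByLanguage[language] {
				runs = append(runs, Run{Model: model, Language: language, Repository: repository})
			}
		}
	}
	return runs
}

// Evaluate executes every planned run with the given executor and stops at the first error.
func Evaluate(executor Executor, runs []Run) error {
	for _, run := range runs {
		if err := executor.Execute(run); err != nil {
			return fmt.Errorf("%s on %s: %w", run.Model, run.Repository, err)
		}
	}
	return nil
}

// recordingExecutor is a test double that records runs instead of evaluating them.
type recordingExecutor struct {
	runs []Run
}

func (r *recordingExecutor) Execute(run Run) error {
	r.runs = append(r.runs, run)
	return nil
}

func main() {
	runs := Plan(
		[]string{"symflower/symbolic-execution"},
		map[string][]string{"golang": {"golang/plain"}, "java": {"java/plain"}},
	)
	recorder := &recordingExecutor{}
	if err := Evaluate(recorder, runs); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(len(recorder.runs)) // 2
}
```

With this split, the regression described in this issue becomes a cheap unit test: assert that the planned runs cover every language and repository, without executing any model.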