Evaluation task: Code repair

ruiAzevedo19 opened this issue 3 months ago

Goal

Given source code with compilation errors, the model needs to repair the code so that it compiles. The response is validated by executing predefined tests, which make sure that the implementation itself is not altered.

PRs

Follow-up

#200

TODOs

Testdata
- Examples
  - function opening brackets are missing
  - type is missing
  - type is wrong
  - import is missing
  - variable is not declared
- For each case:
  - generate tests with symflower unit-tests
  - check that the tests pass
  - add a mistake to the implementation
  - commit
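
As an illustration of the "function opening brackets are missing" case, here is a minimal sketch of how a testdata pair could be sanity-checked. The package name plain, the function add, and the helper parses are hypothetical and not taken from the repository. The sketch only uses go/parser, so it catches syntax errors; the other cases (missing type, wrong type, missing import, undeclared variable) would need a type check or an actual compilation.

```go
package main

import (
	"fmt"
	"go/parser"
	"go/token"
)

// validSource is a hypothetical implementation before a mistake is added.
const validSource = `package plain

func add(a int, b int) int {
	return a + b
}
`

// brokenSource is the same implementation with the opening bracket of the function removed.
const brokenSource = `package plain

func add(a int, b int) int
	return a + b
}
`

// parses reports whether the given Go source is syntactically valid.
// Assumption: a syntax check is sufficient for this mistake category.
func parses(source string) bool {
	_, err := parser.ParseFile(token.NewFileSet(), "plain.go", source, parser.AllErrors)

	return err == nil
}

func main() {
	fmt.Println("valid source parses:", parses(validSource))
	fmt.Println("broken source parses:", parses(brokenSource))
}
```

A check like this could run before the commit step to confirm that the injected mistake really breaks the file.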
Implementation
- Define a new task identifier: code-repair
- For the symflower model, define this task as unsupported because we always generate deterministic tests
- For LLM models
  - Define the new task as supported
  - Create an interface for tasks
    - Interface: Task
    - Methods
      - Run(repository) (assessment, err): run the task for the given repository and return the assessments
      - Identifier: returns the task identifier
  - Define tasks
    - TaskWriteTests
      - The Run method is basically what we already have in evaluate/repository.go:Evaluate
      - Remove evaluate/repository.go:Evaluate since it is now part of the task
    - TaskCodeRepair
      - The Run method is responsible for running the task only for source code files (test files and other files are filtered out)
      - The method must range over the sub-directories in the mistakes testdata and run the code repair task for each sub-directory
      - Add two methods to the language interface
        - DefaultFileExtension returns the language file extension
        - DefaultTestFileSuffix returns the language test file suffix, e.g., _test.go for Go and Test.java for Java
        - Note: this will be used to easily filter out files
  - Calling the Run method
    - replace the call temporaryRepository.Evaluate(...) in evaluate/evaluate.go:Evaluate with the task Run method
      - We are ranging over temporaryRepository.Tasks, so we need a function TaskForIdentifier(taskIdentifier) that, given a task identifier, returns the task struct
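
The task interface and the lookup function described in the list above could look roughly like the following sketch. Only the names Task, Run, Identifier, TaskWriteTests, TaskCodeRepair, TaskForIdentifier and the identifier code-repair come from this issue. The Identifier and Assessments types, the write-tests identifier, the string repository path and the placeholder method bodies are assumptions made to keep the example self-contained.

```go
package main

import (
	"fmt"
)

// Identifier names a task (assumed type, not from the repository).
type Identifier string

const (
	// IdentifierWriteTests is an assumed identifier for the existing task.
	IdentifierWriteTests Identifier = "write-tests"
	// IdentifierCodeRepair is the new task identifier proposed in this issue.
	IdentifierCodeRepair Identifier = "code-repair"
)

// Assessments is a stand-in for the real assessment type.
type Assessments map[string]uint64

// Task is the interface every evaluation task implements.
type Task interface {
	// Identifier returns the task identifier.
	Identifier() Identifier
	// Run runs the task for the given repository and returns the assessments.
	Run(repositoryPath string) (Assessments, error)
}

// TaskWriteTests would hold what is currently in evaluate/repository.go:Evaluate.
type TaskWriteTests struct{}

func (t *TaskWriteTests) Identifier() Identifier { return IdentifierWriteTests }

func (t *TaskWriteTests) Run(repositoryPath string) (Assessments, error) {
	return Assessments{}, nil // Placeholder body.
}

// TaskCodeRepair would repair the source files of each mistake sub-directory.
type TaskCodeRepair struct{}

func (t *TaskCodeRepair) Identifier() Identifier { return IdentifierCodeRepair }

func (t *TaskCodeRepair) Run(repositoryPath string) (Assessments, error) {
	return Assessments{}, nil // Placeholder body.
}

// TaskForIdentifier returns the task for the given task identifier.
func TaskForIdentifier(taskIdentifier Identifier) (Task, error) {
	switch taskIdentifier {
	case IdentifierWriteTests:
		return &TaskWriteTests{}, nil
	case IdentifierCodeRepair:
		return &TaskCodeRepair{}, nil
	default:
		return nil, fmt.Errorf("unknown task identifier %q", taskIdentifier)
	}
}

func main() {
	// Mimics ranging over the task identifiers of a repository.
	for _, taskIdentifier := range []Identifier{IdentifierWriteTests, IdentifierCodeRepair} {
		task, err := TaskForIdentifier(taskIdentifier)
		if err != nil {
			fmt.Println(err)
			continue
		}
		if _, err := task.Run("path/to/repository"); err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Println("ran task", task.Identifier())
	}
}
```

Returning an error for unknown identifiers, rather than nil, is one possible design choice here; it lets the caller in evaluate/evaluate.go:Evaluate report a misconfigured repository instead of silently skipping it.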
Review and merge #197
Adapt the code repair logic to the changes made in #197
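
For the file filtering mentioned under TaskCodeRepair, a minimal sketch of how the two new language methods might be used is shown below. The reduced Language interface, the golang type and the sourceFiles helper are hypothetical, and it is an assumption that DefaultFileExtension includes the leading dot.

```go
package main

import (
	"fmt"
	"strings"
)

// Language is reduced to the two methods proposed in this issue.
type Language interface {
	// DefaultFileExtension returns the language file extension.
	DefaultFileExtension() string
	// DefaultTestFileSuffix returns the language test file suffix.
	DefaultTestFileSuffix() string
}

// golang is a hypothetical Go implementation of the reduced interface.
type golang struct{}

func (golang) DefaultFileExtension() string  { return ".go" } // Assumption: includes the dot.
func (golang) DefaultTestFileSuffix() string { return "_test.go" }

// sourceFiles keeps only the implementation files of the given language,
// dropping test files and files of other types.
func sourceFiles(language Language, filePaths []string) []string {
	var filtered []string
	for _, filePath := range filePaths {
		if !strings.HasSuffix(filePath, language.DefaultFileExtension()) {
			continue // Not a file of this language.
		}
		if strings.HasSuffix(filePath, language.DefaultTestFileSuffix()) {
			continue // A test file.
		}
		filtered = append(filtered, filePath)
	}

	return filtered
}

func main() {
	filePaths := []string{"plain.go", "plain_test.go", "go.mod", "README.md"}
	fmt.Println(sourceFiles(golang{}, filePaths)) // prints [plain.go]
}
```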