OpenEval

Another dataset for evaluating competition-level code generation.

Introduction

We present the OpenEval dataset, a collection of 178 function-level programming problems sourced from open-source programming competition websites. The dataset serves as a resource for evaluating the functional correctness of code generated by large language models (LLMs).

OpenEval can be regarded as a complement to the HumanEval dataset.

Construction

Each problem in OpenEval includes a function signature, a documentation string, and a code body, with five test cases per problem.

To ensure consistency between the documentation strings and the code bodies, we use OpenAI's GPT-3.5 API with few-shot prompting to generate the documentation strings. The generated strings are then manually reviewed to ensure their quality.
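
For illustration only, a few-shot docstring-generation call could look roughly like the sketch below. The system message, demonstration pairs, and model name are placeholder assumptions, not the exact pipeline used to build OpenEval.

# Hedged sketch: few-shot docstring generation with GPT-3.5.
# The prompt layout and demonstration pairs are illustrative assumptions.
import openai  # openai<1.0 style API

FEW_SHOT = [
    # (code, human-written docstring) demonstration pairs
    ("def add(a, b):\n    return a + b",
     "Write a function that returns the sum of two numbers."),
]

def generate_docstring(code: str) -> str:
    messages = [{"role": "system",
                 "content": "Given a Python function, write a concise docstring "
                            "describing exactly what it computes."}]
    for demo_code, demo_doc in FEW_SHOT:
        messages.append({"role": "user", "content": demo_code})
        messages.append({"role": "assistant", "content": demo_doc})
    messages.append({"role": "user", "content": code})
    resp = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=messages)
    return resp["choices"][0]["message"]["content"]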

Furthermore, to prevent test-case leakage from inflating model performance, test cases are excluded from the model's input prompt.

Use

OpenEval is used in exactly the same way as HumanEval: generate a completion for each prompt, then run the problem's test cases to measure functional correctness (see the sketch below).
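
A minimal evaluation loop in the style of the human-eval harness might look like the following sketch. The file names OpenEval.jsonl and samples.jsonl and the generate_completion stub are assumptions; write_jsonl and evaluate_functional_correctness come from the human-eval package.

# Hedged sketch of a HumanEval-style evaluation loop.
import json

from human_eval.data import write_jsonl
from human_eval.evaluation import evaluate_functional_correctness

def load_problems(path="OpenEval.jsonl"):  # assumed problem file name
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]

def generate_completion(prompt: str) -> str:
    # Replace with your model's generation call; it should return only
    # the function body that continues the prompt.
    raise NotImplementedError

problems = load_problems()
samples = [
    dict(task_id=p["task_id"], completion=generate_completion(p["prompt"]))
    for p in problems
]
write_jsonl("samples.jsonl", samples)

# Runs each problem's test cases against the generated completions.
results = evaluate_functional_correctness(
    "samples.jsonl", k=[1], problem_file="OpenEval.jsonl"
)
print(results)  # e.g. {'pass@1': ...}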

Evaluation Results

Model Name | Model Size | Pass@1 (%)
-----------|------------|-----------
CodeGen    | 350M       |  8.989
CodeGen    | 2B         | 14.607
CodeGen    | 6B         | 19.663
StarCoder  | 1B         |  8.989
StarCoder  | 3B         | 12.360
StarCoder  | 7B         | 21.348
CodeT5+    | 220M       |  7.303
CodeT5+    | 770M       | 10.674
CodeT5+    | 2B         | 16.292
CodeT5+    | 6B         | 21.910
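
Pass@1 is reported as a percentage. For reference, pass@k in general can be computed with the unbiased estimator from the HumanEval paper; a minimal sketch:

import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021).
    n = total samples generated for a problem,
    c = number of samples that pass all test cases,
    k = budget of attempts."""
    if n - c < k:
        return 1.0
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

# With a single greedy sample per problem (n=1, k=1), pass@1 is simply
# the fraction of problems whose sample passes all five test cases.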

Example

{
	"task_id": "Open/0", 
	"prompt": "def validPosition ( arr , N , K ) :
    \"\"\"Write a function that takes in an array, the length of the array, and a number K.
    The function calculates the sum of all the elements in the array.
    Then, it checks each element in the array and counts how many elements, when increased by K, would be greater than the sum of all the other elements in the array.
    Finally, the function returns the count.
    \"\"\"
    ", 
    "entry_point": "validPosition", 
    "canonical_solution": "    count = 0 ; sum = 0 ;\n    for i in range ( N ) :\n        sum += arr [ i ] ;\n    for i in range ( N ) :\n        if ( ( arr [ i ] + K ) > ( sum - arr [ i ] ) ) :\n            count += 1 ;\n    return count ;\n", 
    "test": "
    METADATA = {
        'author': 'yg',
        'dataset': 'test'
    }
    
    def check(candidate):
        # test case 1
        arr = [1, 2, 3, 4, 5]
        N = 5
        K = 2
        assert candidate(arr, N, K) == 0

        # test case 2
        arr = [1, 2, 3, 4, 5]
        N = 5
        K = 10
        assert candidate(arr, N, K) == 3

        # test case 3
        arr = [1, 1, 1, 1]
        N = 4
        K = -2
        assert candidate(arr, N, K) == 0

        # test case 4
        arr = [1, 1, 1, 1]
        N = 4
        K = 4
        assert candidate(arr, N, K) == 4

        # test case 5
        arr = []
        N = 0
        K = 4
        assert candidate(arr, N, K) == 0"
}
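
As a rough sketch, one such record can be assembled into an executable check as follows. Here record is assumed to be one parsed line of the dataset's JSONL file; in practice the sandboxed human-eval runner should be used instead of a bare exec.

# Hedged sketch: turning one OpenEval record into an executable check.
def run_record(record: dict, completion: str) -> bool:
    program = (
        record["prompt"]            # signature + docstring
        + completion                # model-generated (or canonical) body
        + "\n" + record["test"]     # METADATA and check(candidate)
        + f"\ncheck({record['entry_point']})\n"
    )
    try:
        exec(program, {})
        return True                 # all five asserts passed
    except Exception:
        return False

# Sanity check with the canonical solution:
# run_record(record, record["canonical_solution"]) should return True.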

Citation

Please cite using the following BibTeX entry:

@misc{openeval,
  author = {Guang Yang},
  title = {OpenEval: Another dataset for evaluating competition-level code generation},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/NTDXYG/open-eval}},
}
