JEEBench (EMNLP 2023)

Repository for the code and dataset of the paper "Have LLMs Advanced Enough? A Challenging Problem Solving Benchmark For Large Language Models", accepted at EMNLP 2023 as a main conference paper. https://arxiv.org/abs/2305.15074v2

(Figure: representative examples from the benchmark.)

Dataset

To access the dataset, unzip the dataset.zip file. The archive contains the dataset, few-shot examples, and responses collected from GPT models along with the extracted answers. The dataset contains questions from Physics, Chemistry, and Mathematics, collected from JEE Advanced 2016 to 2023. The breakdown by subject and response type is as follows:

(Figure: breakdown of questions by subject and response type.)
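
If you want to inspect the data programmatically, the sketch below tallies the same breakdown from the unzipped archive. The file name dataset.json and the field names "subject" and "type" are assumptions about the archive's layout, so adjust them to match the actual files.

```python
# Minimal sketch: load the unzipped dataset and tally questions by subject
# and response type. The file name ("dataset.json") and the field names
# ("subject", "type") are assumptions about the archive's layout.
import json
from collections import Counter

with open("dataset.json") as f:
    questions = json.load(f)  # assumed: a list of per-question dicts

by_subject = Counter(q["subject"] for q in questions)
by_type = Counter((q["subject"], q["type"]) for q in questions)

print("Total questions:", len(questions))
print("By subject:", dict(by_subject))
for (subject, qtype), count in sorted(by_type.items()):
    print(f"{subject:12s} {qtype:20s} {count}")
```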

Quick example

To run a baseline such as GPT-3.5 with zero-shot Chain of Thought prompting on the first 10 questions of the dataset using 2 parallel requests, run: python inference.py --model gpt-3.5-turbo --mode CoT --max_questions 10 --num_procs 2
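
Conceptually, a zero-shot CoT run amounts to sending each question to the chat API with a step-by-step prompt and collecting the model's answer. The sketch below illustrates one such request; the prompt wording, the OpenAI client usage, and the response handling are assumptions for illustration, not the exact implementation in inference.py.

```python
# Illustrative sketch of a zero-shot CoT request for one question.
# The prompt wording and answer handling are assumptions, not the
# repository's exact code in inference.py.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def ask_zero_shot_cot(question_text: str, model: str = "gpt-3.5-turbo") -> str:
    prompt = (
        "Answer the following JEE Advanced question. "
        "Think step by step, then state the final answer.\n\n"
        + question_text
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return response.choices[0].message.content


# Example: run on the first question loaded as in the dataset snippet above.
# print(ask_zero_shot_cot(questions[0]["question"]))
```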

To evaluate your results, use the code provided in compute_metrics.py: python compute_metrics.py
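
As a rough picture of what evaluation involves, the sketch below computes exact-match accuracy between extracted and gold answers. The responses file layout and the field names "gold" and "extract" are assumptions, and the actual scoring in compute_metrics.py (which may include partial credit for multiple-correct questions) can differ.

```python
# Simplified sketch of evaluation: exact-match accuracy between extracted
# answers and gold answers. Field names ("gold", "extract") and the file
# layout are assumptions; compute_metrics.py's actual scoring may differ.
import json


def exact_match_accuracy(responses_path: str) -> float:
    with open(responses_path) as f:
        records = json.load(f)  # assumed: list of dicts with gold/extract
    correct = sum(
        1 for r in records
        if str(r.get("extract", "")).strip() == str(r.get("gold", "")).strip()
    )
    return correct / len(records) if records else 0.0


# Example (hypothetical path to a collected-responses file):
# print(exact_match_accuracy("responses/gpt-3.5-turbo_CoT_responses.json"))
```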

Baselines

(Figure: baseline results reported in the paper.)

Citation

If you use our dataset in your research, please cite it using the following BibTeX entry:

@misc{arora2023llms,
      title={Have LLMs Advanced Enough? A Challenging Problem Solving Benchmark For Large Language Models}, 
      author={Daman Arora and Himanshu Gaurav Singh and Mausam},
      year={2023},
      eprint={2305.15074},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
