JEEBench (EMNLP 2023)

Repository for the code and dataset of the paper "Have LLMs Advanced Enough? A Challenging Problem Solving Benchmark For Large Language Models", accepted at EMNLP 2023 as a main conference paper. https://arxiv.org/abs/2305.15074v2

(Figure: representative examples from the benchmark.)

Dataset

To access the dataset, unzip the dataset.zip file. The archive contains the dataset, few-shot examples, and responses collected from GPT models along with the extracted answers. The dataset contains questions from Physics, Chemistry, and Mathematics, collected from JEE Advanced 2016 to 2023. The breakdown by subject and response type is as follows:

(Figure: breakdown of questions by subject and response type.)
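
If you want to inspect the data programmatically, the sketch below tallies the same breakdown from the unzipped archive. The file name dataset.json and the field names "subject" and "type" are assumptions about the archive's layout, so adjust them to match the actual files.

```python
# Minimal sketch: load the unzipped dataset and tally questions by subject
# and response type. The file name ("dataset.json") and the field names
# ("subject", "type") are assumptions about the archive's layout.
import json
from collections import Counter

with open("dataset.json") as f:
    questions = json.load(f)  # assumed: a list of per-question dicts

by_subject = Counter(q["subject"] for q in questions)
by_type = Counter((q["subject"], q["type"]) for q in questions)

print("Total questions:", len(questions))
print("By subject:", dict(by_subject))
for (subject, qtype), count in sorted(by_type.items()):
    print(f"{subject:12s} {qtype:20s} {count}")
```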

Quick example

To run a baseline such as GPT-3.5 with zero-shot Chain of Thought prompting on the first 10 questions of the dataset using 2 parallel requests, run: python inference.py --model gpt-3.5-turbo --mode CoT --max_questions 10 --num_procs 2
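
Conceptually, a zero-shot CoT run amounts to sending each question to the chat API with a step-by-step prompt and collecting the model's answer. The sketch below illustrates one such request; the prompt wording, the OpenAI client usage, and the response handling are assumptions for illustration, not the exact implementation in inference.py.

```python
# Illustrative sketch of a zero-shot CoT request for one question.
# The prompt wording and answer handling are assumptions, not the
# repository's exact code in inference.py.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def ask_zero_shot_cot(question_text: str, model: str = "gpt-3.5-turbo") -> str:
    prompt = (
        "Answer the following JEE Advanced question. "
        "Think step by step, then state the final answer.\n\n"
        + question_text
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return response.choices[0].message.content


# Example: run on the first question loaded as in the dataset snippet above.
# print(ask_zero_shot_cot(questions[0]["question"]))
```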

To evaluate your results, use the code provided in compute_metrics.py: python compute_metrics.py
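
As a rough picture of what evaluation involves, the sketch below computes exact-match accuracy between extracted and gold answers. The responses file layout and the field names "gold" and "extract" are assumptions, and the actual scoring in compute_metrics.py (which may include partial credit for multiple-correct questions) can differ.

```python
# Simplified sketch of evaluation: exact-match accuracy between extracted
# answers and gold answers. Field names ("gold", "extract") and the file
# layout are assumptions; compute_metrics.py's actual scoring may differ.
import json


def exact_match_accuracy(responses_path: str) -> float:
    with open(responses_path) as f:
        records = json.load(f)  # assumed: list of dicts with gold/extract
    correct = sum(
        1 for r in records
        if str(r.get("extract", "")).strip() == str(r.get("gold", "")).strip()
    )
    return correct / len(records) if records else 0.0


# Example (hypothetical path to a collected-responses file):
# print(exact_match_accuracy("responses/gpt-3.5-turbo_CoT_responses.json"))
```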

Baselines

(Figure: baseline results reported in the paper.)

Citation

If you use our dataset in your research, please cite it using the following BibTeX entry:

@misc{arora2023llms,
      title={Have LLMs Advanced Enough? A Challenging Problem Solving Benchmark For Large Language Models}, 
      author={Daman Arora and Himanshu Gaurav Singh and Mausam},
      year={2023},
      eprint={2305.15074},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
