SuperBruceJia / GSM8K-Consistency

GSM8K-Consistency is a benchmark database for analyzing the consistency of Arithmetic Reasoning on GSM8K.

Home Page: https://huggingface.co/datasets/shuyuej/GSM8K-Consistency

GSM8K-Consistency Benchmark

🚀 The dataset is available on 🤗 Hugging Face!

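This benchmark applies semantics-preserving perturbations to GSM8K math problems: each original question is paired with paraphrased versions that keep the same answer. It can be used to evaluate how consistent a model's arithmetic reasoning stays when the wording of a problem changes.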

💻 Dataset Usage

Run the following Python code to load the data:

from datasets import load_dataset

# Download the benchmark from the Hugging Face Hub
dataset = load_dataset("shuyuej/GSM8K-Consistency")
# Select the 'train' split
dataset = dataset["train"]
print(dataset)

Dataset Description:

Dataset({
    features: ['id', 'original_question', 'paraphrased_question', 'answer_detail', 'numerical_answer'],
    num_rows: 85225
})
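
As a rough illustration of how these fields might be used to measure consistency, the sketch below groups rows by `original_question` and checks how often a model's predictions match `numerical_answer` across the paraphrases of each problem. This is not the authors' evaluation script. It assumes that rows sharing the same `original_question` are paraphrases of one problem, and that answers can be compared as stripped strings. The helper `consistency_by_question` and the toy rows and predictions are hypothetical.

```python
from collections import defaultdict


def consistency_by_question(rows, predictions):
    """Hypothetical helper: score predictions per original question.

    rows: iterable of dicts with 'original_question' and 'numerical_answer'.
    predictions: one predicted answer per row, in the same order.
    Returns (per-question accuracy dict, share of questions answered
    correctly on every paraphrase).
    """
    groups = defaultdict(list)
    for row, pred in zip(rows, predictions):
        # Assumption: exact string match after stripping whitespace.
        correct = str(pred).strip() == str(row["numerical_answer"]).strip()
        groups[row["original_question"]].append(correct)

    if not groups:
        return {}, 0.0

    per_question = {q: sum(flags) / len(flags) for q, flags in groups.items()}
    fully_consistent = sum(all(flags) for flags in groups.values()) / len(groups)
    return per_question, fully_consistent


# Toy data standing in for dataset rows (not taken from the benchmark).
rows = [
    {"original_question": "Q1", "paraphrased_question": "Q1 a", "numerical_answer": "18"},
    {"original_question": "Q1", "paraphrased_question": "Q1 b", "numerical_answer": "18"},
    {"original_question": "Q2", "paraphrased_question": "Q2 a", "numerical_answer": "3"},
]
predictions = ["18", "17", "3"]  # hypothetical model outputs

per_question, fully_consistent = consistency_by_question(rows, predictions)
print(per_question)
print(fully_consistent)
```

With the real data, `rows` could be the loaded `dataset` itself, since iterating over it yields one dict per row, and `predictions` would come from the model under evaluation.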