sileod / tasksource

Dataset collection and preprocessing framework for extreme multi-task learning in NLP



Loading MMLU does not give correct answers

nickypro opened this issue

Current Behaviour

When loading the dataset as shown in the example, the labels do not appear to be correct. For example:

from tasksource import MultipleChoice

mmlu = MultipleChoice(
    'question',
    choices_list='choices',
    labels='answer',
    splits=['validation','dev','test'],
    dataset_name='tasksource/mmlu',
    config_name="high_school_computer_science",
)

dataset = mmlu.load()

for datum in dataset['test']:
    print(datum)
    break

The output is the following, which does not seem to contain the correct answer:

{'inputs': 'Let x = 1. What is x << 3 in Python 3?', 'labels': 0, 'choice0': '8', 'choice1': '1', 'choice2': '3', 'choice3': '16'}

Expected Behaviour

It should give the correct answer, i.e. the same answer as loading the dataset directly from Hugging Face:

from datasets import load_dataset

dataset = load_dataset("tasksource/mmlu", "high_school_computer_science")

for datum in dataset["test"]:
    print(datum)
    break

This gives the correct answer:

{'question': 'Let x = 1. What is x << 3 in Python 3?', 'choices': ['1', '3', '8', '16'], 'answer': 2}
commented

Hi,
Thank you for your issue.
Tasksource limits the number of options to 4 and reorders the options, but it adjusts the label accordingly.
The answer is 8 in both cases, isn't it?
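A minimal sketch of the kind of transformation described above: cap the number of options, reorder them, and remap the label so it still points at the same answer text. This is not tasksource's actual implementation; the function name `reorder_choices`, the `max_options` and `seed` parameters, and the choice of which distractors to keep are hypothetical assumptions for illustration. Only the output keys (`labels`, `choice0`, `choice1`, ...) mirror the record printed in the issue.

```python
import random
from typing import Dict, List, Optional


def reorder_choices(
    choices: List[str],
    label: int,
    max_options: int = 4,
    seed: Optional[int] = None,
) -> Dict[str, object]:
    """Hypothetical sketch (not tasksource's code): keep the gold answer,
    cap the number of options, shuffle them, and remap the label."""
    rng = random.Random(seed)
    # Pair each choice with its original index so duplicate texts stay distinguishable.
    indexed = list(enumerate(choices))
    gold = indexed[label]
    distractors = [pair for pair in indexed if pair[0] != label]
    # Keep at most max_options - 1 distractors so the gold answer is never dropped.
    kept = distractors[: max_options - 1] + [gold]
    rng.shuffle(kept)
    # The new label is wherever the gold answer ended up after shuffling.
    new_label = next(i for i, (orig, _) in enumerate(kept) if orig == label)
    record: Dict[str, object] = {"labels": new_label}
    for i, (_, text) in enumerate(kept):
        record[f"choice{i}"] = text
    return record


# Usage with the Hugging Face record from the issue: the label index changes,
# but it still points at the answer text "8".
out = reorder_choices(["1", "3", "8", "16"], 2, seed=0)
print(out, out[f"choice{out['labels']}"])
```

The point of the design is that the index is recomputed from the position of the gold answer after reordering, so the label index is allowed to differ between the two formats while the answer it refers to stays the same.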

My apologies, you are correct. I didn't notice that it reordered the choices so that the answer became choice0.
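Since the two formats can use different label indices for the same question, a safer check is to compare the answer *text* rather than the index. A small sketch using the two records printed in this issue; the helper names `answer_from_hf` and `answer_from_tasksource` are hypothetical, and the keys are taken from the printed outputs above.

```python
from typing import Any, Dict


def answer_from_hf(record: Dict[str, Any]) -> str:
    """Answer text from the raw Hugging Face format (choices list + answer index)."""
    return record["choices"][record["answer"]]


def answer_from_tasksource(record: Dict[str, Any]) -> str:
    """Answer text from the flattened tasksource format (choice0..choiceN + labels index)."""
    return record[f"choice{record['labels']}"]


# The two records printed earlier in this issue.
hf = {'question': 'Let x = 1. What is x << 3 in Python 3?',
      'choices': ['1', '3', '8', '16'], 'answer': 2}
ts = {'inputs': 'Let x = 1. What is x << 3 in Python 3?', 'labels': 0,
      'choice0': '8', 'choice1': '1', 'choice2': '3', 'choice3': '16'}

# Different indices (2 vs 0), same answer text.
print(answer_from_hf(hf), answer_from_tasksource(ts))  # prints "8 8"
# Sanity check on the question itself: shifting 1 left by three bits gives 8.
print(1 << 3)  # prints "8"
```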