Loading MMLU does not give correct answers
nickypro opened this issue · comments
Nicky Pochinkov commented
Current Behaviour
When loading answers as given in the example, the labels are not correctly given. For example:
from tasksource import MultipleChoice

mmlu = MultipleChoice(
    'question',
    choices_list='choices',
    labels='answer',
    splits=['validation', 'dev', 'test'],
    dataset_name='tasksource/mmlu',
    config_name="high_school_computer_science",
)
dataset = mmlu.load()

for datum in dataset['test']:
    print(datum)
    break
The output is then as follows, and does not appear to give the correct answer:
{'inputs': 'Let x = 1. What is x << 3 in Python 3?', 'labels': 0, 'choice0': '8', 'choice1': '1', 'choice2': '3', 'choice3': '16'}
Expected Behaviour
The loader should give the correct answer, i.e. the answer from Hugging Face:
from datasets import load_dataset

dataset = load_dataset("tasksource/mmlu", "high_school_computer_science")

for datum in dataset["test"]:
    print(datum)
    break
This gives the correct answer:
{'question': 'Let x = 1. What is x << 3 in Python 3?', 'choices': ['1', '3', '8', '16'], 'answer': 2}
sileod commented
Hi,
Thank you for your issue
Tasksource limits the number of options to 4 and reorders the options, but it adjusts the label accordingly.
The answer is 8 in both cases, isn't it?
Nicky Pochinkov commented
My apologies, you are correct. I didn't notice that it re-ordered all the labels to have the answer as choice0.
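For anyone who hits the same confusion, a quick way to compare the two formats is to resolve each label to its answer text instead of comparing the raw indices. The sketch below uses only the two records quoted in this issue. The helper names `tasksource_answer` and `hf_answer` are made up for illustration and are not part of tasksource or datasets.

```python
# Records copied verbatim from the outputs quoted in this issue.
tasksource_row = {
    'inputs': 'Let x = 1. What is x << 3 in Python 3?', 'labels': 0,
    'choice0': '8', 'choice1': '1', 'choice2': '3', 'choice3': '16',
}
hf_row = {
    'question': 'Let x = 1. What is x << 3 in Python 3?',
    'choices': ['1', '3', '8', '16'], 'answer': 2,
}


def tasksource_answer(row):
    """Hypothetical helper: text of the labelled choice in a choice0..choiceN row."""
    return row[f"choice{row['labels']}"]


def hf_answer(row):
    """Hypothetical helper: text of the labelled choice in a 'choices' list row."""
    return row['choices'][row['answer']]


ts_text = tasksource_answer(tasksource_row)
hf_text = hf_answer(hf_row)

# The indices differ (0 vs 2), but both point at the same answer text.
assert ts_text == hf_text == str(1 << 3)
print(ts_text, hf_text)
```

Comparing answer text this way stays valid however the loader orders the choices, whereas comparing the raw `labels` and `answer` indices does not.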