super_glue/multirc is bugged

Question

A1exRey opened this issue a year ago

Hi, thanks for the great collection of datasets.
However, it seems that not all of the datasets are preprocessed correctly. MultiRC requires the paragraph, the question, and each individual answer to be concatenated for classification. In your case, only the question is used as sentence1, without any of the other fields. In tasks.py:
super_glue___multirc = Classification(sentence1="question", labels="label")
And during load we get:

from tasksource import list_tasks, load_task
ddf = load_task('super_glue/multirc')

index	sentence1	labels
0	What did the high-level effort to persuade Pakistan include?	0
1	What did the high-level effort to persuade Pakistan include?	0
2	What did the high-level effort to persuade Pakistan include?	1
3	What did the high-level effort to persuade Pakistan include?	1
4	What did the high-level effort to persuade Pakistan include?	1

This data does not make sense: the same input appears with different labels, so a model cannot learn anything from it.
Maybe you could replace the code with something like the following, which puts all the fields together (following the WiC example).

super_glue___multirc = Classification(
    sentence1=cat(["paragraph", "question", "answer"], " : "),
    labels="label"
)
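
To illustrate why the concatenation matters, here is a minimal plain-Python sketch. It does not use tasksource: `concat_fields` and `conflicting_inputs` are hypothetical helpers written for this example, not the library's `cat`, and the rows are made-up placeholders with MultiRC's field names. It shows that question-only inputs collide across labels, while the concatenated inputs do not.

```python
from collections import defaultdict

# Made-up rows using MultiRC's field names (not real dataset content).
rows = [
    {"paragraph": "P1", "question": "Q1?", "answer": "A1", "label": 0},
    {"paragraph": "P1", "question": "Q1?", "answer": "A2", "label": 1},
    {"paragraph": "P1", "question": "Q1?", "answer": "A3", "label": 1},
]


def concat_fields(row, fields, sep=" : "):
    """Join the named fields of one row into a single input string."""
    return sep.join(str(row[f]) for f in fields)


def conflicting_inputs(rows, fields):
    """Return the inputs that occur with more than one distinct label."""
    labels_by_input = defaultdict(set)
    for row in rows:
        labels_by_input[concat_fields(row, fields)].add(row["label"])
    return {
        text: labels
        for text, labels in labels_by_input.items()
        if len(labels) > 1
    }


# Question only: one input string carries both labels.
question_only = conflicting_inputs(rows, ["question"])
# Paragraph + question + answer: every input is distinct.
full_input = conflicting_inputs(rows, ["paragraph", "question", "answer"])

print(question_only)  # {'Q1?': {0, 1}}
print(full_input)     # {}
```

With only the question as input, the classifier sees identical text labeled both 0 and 1; once the answer is part of the input, each row becomes a separate, learnable example.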

sileod · Answer 1 · Wed Apr 05 2023 21:32:34 GMT+0800 (China Standard Time)

I apologize for that mistake. I manually check the processed datasets (and I have also trained models on them), but there may be some errors I overlooked. The latest release fixes this one. Thanks a lot for your input.