hccngu / MLADA

Confusion on meta training/val/test split

hflserdaniel opened this issue · comments

Thanks for open-sourcing your project code!
To my understanding, this code base is largely based on DS-FSL. However, your meta training/val/test split on the experiment datasets differs from the implementation in DS-FSL.
For example, in the 20-News setting, your split is:

val_classes = list(range(5))
train_classes = list(range(5, 13))
test_classes = list(range(13, 20))

but DS-FSL splits by the first-level labels:

train_classes = []
for key in label_dict.keys():
    if key[:key.find('.')] in ['sci', 'rec']:
        train_classes.append(label_dict[key])

val_classes = []
for key in label_dict.keys():
    if key[:key.find('.')] in ['comp']:
        val_classes.append(label_dict[key])

test_classes = []
for key in label_dict.keys():
    if key[:key.find('.')] not in ['comp', 'sci', 'rec']:
        test_classes.append(label_dict[key])
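For concreteness, here is a small sketch of how the two schemes partition the 20-News classes. The label_dict below is hypothetical (built from the standard 20 Newsgroups label names in alphabetical order); the ids assigned in the released code may be ordered differently, but the grouping difference is the point.

# Hypothetical label_dict for illustration only; the released code may assign ids differently.
labels = [
    'alt.atheism', 'comp.graphics', 'comp.os.ms-windows.misc',
    'comp.sys.ibm.pc.hardware', 'comp.sys.mac.hardware', 'comp.windows.x',
    'misc.forsale', 'rec.autos', 'rec.motorcycles', 'rec.sport.baseball',
    'rec.sport.hockey', 'sci.crypt', 'sci.electronics', 'sci.med', 'sci.space',
    'soc.religion.christian', 'talk.politics.guns', 'talk.politics.mideast',
    'talk.politics.misc', 'talk.religion.misc',
]
label_dict = {name: i for i, name in enumerate(labels)}

# MLADA-style split: contiguous id ranges.
mlada = {
    'val': set(range(5)),
    'train': set(range(5, 13)),
    'test': set(range(13, 20)),
}

# DS-FSL-style split: group classes by the first-level label (prefix before the first '.').
dsfsl = {
    'train': {i for name, i in label_dict.items() if name.split('.')[0] in ('sci', 'rec')},
    'val': {i for name, i in label_dict.items() if name.split('.')[0] == 'comp'},
    'test': {i for name, i in label_dict.items() if name.split('.')[0] not in ('comp', 'sci', 'rec')},
}

# Show how little the two partitions agree on each meta-split.
for part in ('train', 'val', 'test'):
    print(part, 'shared classes:', sorted(mlada[part] & dsfsl[part]))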

I believe that different meta training/val/test splits will result in significantly different results. When your 20-News split is adopted in DS-FSL, its test accuracy is around 83.1, which is higher than your method's (77.8). Similarly, there are also meta-split differences on Amazon and HuffPost.

Could you please clarify this discrepancy? Thanks

Thanks for your review!
You are right: different meta training/val/test splits will result in significantly different results.
Cross-validation may be a more appropriate way to evaluate models.
We will update our code to strive for a fairer comparison.
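To illustrate what such a class-level cross-validation could look like, here is a minimal sketch. It is not part of the released code; the fold count, the shuffling seed, and the helper name rotated_meta_splits are all assumptions made for this example.

import random

def rotated_meta_splits(num_classes=20, num_folds=5, seed=0):
    """Yield (train, val, test) class-id splits, rotating which fold is held out.

    Sketch of class-level cross-validation: class ids are shuffled once,
    cut into num_folds folds, and each fold takes a turn as meta-test while
    the next fold serves as meta-val and the rest as meta-train.
    """
    rng = random.Random(seed)
    ids = list(range(num_classes))
    rng.shuffle(ids)
    folds = [ids[i::num_folds] for i in range(num_folds)]
    for k in range(num_folds):
        test = folds[k]
        val = folds[(k + 1) % num_folds]
        train = [c for i, f in enumerate(folds)
                 if i not in (k, (k + 1) % num_folds) for c in f]
        yield sorted(train), sorted(val), sorted(test)

# Example: run the meta-learner once per rotation and average the test accuracy.
for train_classes, val_classes, test_classes in rotated_meta_splits():
    print(train_classes, val_classes, test_classes)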