Confusion on meta training/val/test split
hflserdaniel opened this issue
Thanks for open-sourcing your project code!
To my understanding, this code base is largely based on DS-FSL. However, your meta training/val/test split on the experiment datasets differs from the one implemented in DS-FSL.
For example, in the 20-News setting, your split is:
    val_classes = list(range(5))
    train_classes = list(range(5, 13))
    test_classes = list(range(13, 20))
but DS-FSL splits by the first-level labels:
    train_classes = []
    for key in label_dict.keys():
        if key[:key.find('.')] in ['sci', 'rec']:
            train_classes.append(label_dict[key])

    val_classes = []
    for key in label_dict.keys():
        if key[:key.find('.')] in ['comp']:
            val_classes.append(label_dict[key])

    test_classes = []
    for key in label_dict.keys():
        if key[:key.find('.')] not in ['comp', 'sci', 'rec']:
            test_classes.append(label_dict[key])
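To make the difference concrete, the two splits quoted above can be compared side by side. This is only a sketch: the alphabetical `label_dict` built here is an assumption for illustration (the real `label_dict` in either code base may assign indices differently), and `changed_roles` is a hypothetical helper, not part of either repository.

```python
# Newsgroup names of the 20 Newsgroups dataset.
CATEGORIES = [
    "alt.atheism", "comp.graphics", "comp.os.ms-windows.misc",
    "comp.sys.ibm.pc.hardware", "comp.sys.mac.hardware", "comp.windows.x",
    "misc.forsale", "rec.autos", "rec.motorcycles", "rec.sport.baseball",
    "rec.sport.hockey", "sci.crypt", "sci.electronics", "sci.med",
    "sci.space", "soc.religion.christian", "talk.politics.guns",
    "talk.politics.mideast", "talk.politics.misc", "talk.religion.misc",
]

# ASSUMPTION: alphabetical index assignment, for illustration only.
# The actual label_dict in the repositories may use another order.
label_dict = {name: idx for idx, name in enumerate(sorted(CATEGORIES))}


def index_range_split():
    """Split by contiguous label-index ranges (as quoted from this repo)."""
    return {
        "train": list(range(5, 13)),
        "val": list(range(5)),
        "test": list(range(13, 20)),
    }


def top_level_split(label_dict):
    """Split by first-level label (as quoted from DS-FSL)."""
    split = {"train": [], "val": [], "test": []}
    for name, idx in label_dict.items():
        top = name[:name.find(".")]
        if top in ("sci", "rec"):
            split["train"].append(idx)
        elif top == "comp":
            split["val"].append(idx)
        else:
            split["test"].append(idx)
    return split


def changed_roles(split_a, split_b):
    """Hypothetical helper: class indices whose part differs between splits."""
    role_a = {idx: part for part, ids in split_a.items() for idx in ids}
    role_b = {idx: part for part, ids in split_b.items() for idx in ids}
    return sorted(idx for idx in role_a if role_a[idx] != role_b[idx])


by_index = index_range_split()
by_topic = top_level_split(label_dict)
names = {idx: name for name, idx in label_dict.items()}
for idx in changed_roles(by_index, by_topic):
    print(names[idx])  # classes that land in a different part
```

Both splits produce 8/5/7 classes, but the index-range split can place classes from the same top-level topic on both sides of the train/test boundary, which may make the test classes easier.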
I believe that different meta training/val/test splits lead to significantly different results. When your 20-News split is adopted in DS-FSL, the test accuracy is around 83.1, which is higher than that of your method (77.8). Similarly, the meta-splits for Amazon and HuffPost also differ.
Could you please clarify this? Thanks.
Thanks for your review!
You are right: different meta training/val/test splits will result in significantly different results.
Cross-validation may be a more appropriate way to evaluate models.
We will update our code to strive for a fairer comparison.
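One possible reading of the cross-validation suggestion above is to average results over several random class partitions instead of relying on a single fixed split. The sketch below uses repeated random partitioning rather than strict k-fold cross-validation. The function names, the default sizes (taken from the 8/5/7 split in this thread), and the `evaluate` callable are all assumptions for illustration, not code from either repository.

```python
import random


def random_class_splits(num_classes=20, n_train=8, n_val=5, n_splits=5, seed=0):
    """Generate several disjoint train/val/test partitions of the class indices."""
    rng = random.Random(seed)  # fixed seed so the splits are reproducible
    splits = []
    for _ in range(n_splits):
        classes = list(range(num_classes))
        rng.shuffle(classes)
        splits.append({
            "train": sorted(classes[:n_train]),
            "val": sorted(classes[n_train:n_train + n_val]),
            "test": sorted(classes[n_train + n_val:]),
        })
    return splits


def mean_accuracy(splits, evaluate):
    """Average a score over all splits.

    `evaluate` is a hypothetical callable that trains and tests a model
    on one split and returns its test accuracy.
    """
    scores = [evaluate(split) for split in splits]
    return sum(scores) / len(scores)
```

Reporting the mean (and ideally the spread) over such splits for every compared method would make the comparison less sensitive to one particular choice of classes.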