Dataset split in SSL and evaluation
caiocj1 opened this issue · comments
Hello, thanks for the great work!
Just wanted to check something:
- In the appendix it's said that "we incorporate the train and validation set of the original benchmarks [...] to enlarge the number of training samples".
- In the paper itself, Table 2 shows the size of the training set.
So when doing SSL and then evaluating just on Aircraft, for example (the 46.56% value), do you use all 10k samples in SSL, and does Table 2 report the accuracy on the training set after training the linear classifier? Thanks in advance!
Hi @caiocj1.
Thanks for your interest in our paper!
Only if a fine-grained dataset has three splits (train, val, test) do we combine train + val into the training set.
We run SSL on the training set (for SimCore: train_X + coreset), and then evaluate on the test set (or on the val set if the fine-grained dataset has only two splits: train, val).
For example, Aircraft has three splits, so we combine them (train + val: 6,667; test: 3,333).
We run SSL pretraining on the 6,667 samples and train a linear classifier on the same 6,667 samples.
The trained classifier is then linearly evaluated on the 3,333 test samples.
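To make the split handling concrete, here is a minimal sketch (a hypothetical helper, not the authors' actual code) of the rule described above: datasets with three splits merge train + val for SSL pretraining and linear-classifier training and evaluate on test, while two-split datasets pretrain on train and evaluate on val. The Aircraft split sizes (3,334 + 3,333 = 6,667 train+val, 3,333 test) match the numbers given in the reply.

```python
def resolve_splits(splits):
    """Return (pretrain_split_names, eval_split_name) for a dataset.

    splits: list of available split names, e.g. ["train", "val", "test"].
    """
    if set(splits) >= {"train", "val", "test"}:
        # Three splits: combine train + val, evaluate on test.
        return (["train", "val"], "test")
    # Two splits (train, val): pretrain on train, evaluate on val.
    return (["train"], "val")

# Aircraft has three splits: 3,334 train + 3,333 val = 6,667 samples
# for SSL pretraining and linear-classifier training; 3,333 for test.
aircraft_sizes = {"train": 3334, "val": 3333, "test": 3333}
pretrain, evaluate = resolve_splits(list(aircraft_sizes))
n_pretrain = sum(aircraft_sizes[s] for s in pretrain)
print(pretrain, evaluate, n_pretrain)  # ['train', 'val'] test 6667
```

The same function covers both cases in the reply: a dataset exposing only train/val falls through to the second branch and is evaluated on val.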
Feel free to ask if you have more questions on our work!
Thanks.
If you have any further questions, please reopen this issue or open a new one.
Thanks again for your interest!