Dataset split in SSL and evaluation
caiocj1 opened this issue · comments
Hello, thanks for the great work!
Just wanted to check something:
- In the appendix it's said that "we incorporate the train and validation set of the original benchmarks [...] to enlarge the number of training samples".
- In the paper itself, Table 2 shows the size of the training set.
So when doing SSL and then evaluating just on Aircraft, for example (the 46.56% value), do you use all 10k samples in SSL, and does Table 2 report the accuracy on the training set after training the linear classifier? Thanks in advance!
Hi @caiocj1.
Thanks for your interest in our paper!
Only if a fine-grained dataset has three splits (train, val, test) do we combine train + val into the training set.
We run SSL on the training set (for SimCore: train_X + coreset), and then evaluate on the test set (or on the val set if the fine-grained dataset has only two splits: train, val).
For example, Aircraft has three splits, so we combine them (train + val: 6,667; test: 3,333).
We run SSL pretraining on the 6,667 samples and train a linear classifier on the same 6,667 samples.
The trained classifier is then linearly evaluated on the 3,333 test samples.
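To make the split handling concrete, here is a minimal sketch (a hypothetical helper, not the authors' actual code) of the rule described above: datasets with three splits merge train + val for SSL pretraining and linear-classifier training and evaluate on test, while two-split datasets pretrain on train and evaluate on val. The Aircraft split sizes (3,334 + 3,333 = 6,667 train+val, 3,333 test) match the numbers given in the reply.

```python
def resolve_splits(splits):
    """Return (pretrain_split_names, eval_split_name) for a dataset.

    splits: list of available split names, e.g. ["train", "val", "test"].
    """
    if set(splits) >= {"train", "val", "test"}:
        # Three splits: combine train + val, evaluate on test.
        return (["train", "val"], "test")
    # Two splits (train, val): pretrain on train, evaluate on val.
    return (["train"], "val")

# Aircraft has three splits: 3,334 train + 3,333 val = 6,667 samples
# for SSL pretraining and linear-classifier training; 3,333 for test.
aircraft_sizes = {"train": 3334, "val": 3333, "test": 3333}
pretrain, evaluate = resolve_splits(list(aircraft_sizes))
n_pretrain = sum(aircraft_sizes[s] for s in pretrain)
print(pretrain, evaluate, n_pretrain)  # ['train', 'val'] test 6667
```

The same function covers both cases in the reply: a dataset exposing only train/val falls through to the second branch and is evaluated on val.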
Feel free to ask if you have more questions on our work!
Thanks.
If you have any further questions, please reopen this issue or open a new one.
Thanks again for your interest!