facebookresearch / CutLER

Code release for "Cut and Learn for Unsupervised Object Detection and Instance Segmentation" and "VideoCutLER: Surprisingly Simple Unsupervised Video Instance Segmentation"


Custom COCO dataset in self-training

tianyufang1958 opened this issue · comments

commented

Thanks for the nice work. I have a question regarding the custom COCO dataset used in self-training. For my COCO data, I have instances_train.json and instances_val.json, and I registered two datasets, one for train and one for val, but in the first step of self-training, --test-dataset only takes 'imagenet_train'.

Does this mean ImageNet uses only one JSON file for both training and validation? Or can the JSON file generation in self-training only be applied to the training data itself, not the validation data? I am confused about this.
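(Not part of the repo, just a stdlib sketch for sanity-checking the registered splits: the function below counts the top-level entries of a COCO-format annotation file. The filename in the usage comment is hypothetical.)

```python
import json

def summarize_coco(ann_path):
    """Return basic entry counts from a COCO-format annotation file."""
    with open(ann_path) as f:
        coco = json.load(f)
    # A COCO annotation file has three main top-level lists.
    return {
        "images": len(coco.get("images", [])),
        "annotations": len(coco.get("annotations", [])),
        "categories": len(coco.get("categories", [])),
    }

# Example (hypothetical path):
# print(summarize_coco("instances_train.json"))
```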


commented

Duplicate of #16. Please check #16 for more details on working with custom datasets.

About the self-training dataset, you can train CutLER on any dataset you specify. But you must let the model know which dataset/split to work on by changing the command accordingly.

@frank-xwang Sorry, maybe my question was not clear.
I have split the dataset 80%/20% in COCO format and registered the splits as training and validation datasets. The command below is only for the training dataset; should I also run it on the validation dataset to generate pseudo-labels? Just want to confirm this.

python maskcut.py \
  --vit-arch base --patch-size 8 \
  --tau 0.15 --fixed_size 480 --N 3 \
  --num-folder-per-job 1000 --job-index 0 \
  --dataset-path /path/to/dataset/traindir \
  --out-dir /path/to/save/annotations
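(For reference, the 80%/20% split mentioned above can be sketched with the standard library alone; this is not from the repo, and all paths/names are hypothetical. It splits by image and keeps each image's annotations in the same split.)

```python
import json
import random

def split_coco(ann_path, train_path, val_path, train_frac=0.8, seed=0):
    """Split a COCO-style annotation file into train/val JSONs by image."""
    with open(ann_path) as f:
        coco = json.load(f)
    images = list(coco["images"])
    random.Random(seed).shuffle(images)  # deterministic shuffle
    n_train = int(len(images) * train_frac)
    splits = {train_path: images[:n_train], val_path: images[n_train:]}
    for path, split_images in splits.items():
        ids = {img["id"] for img in split_images}
        out = {
            "images": split_images,
            # keep only annotations whose image belongs to this split
            "annotations": [a for a in coco["annotations"] if a["image_id"] in ids],
            "categories": coco["categories"],
        }
        with open(path, "w") as f:
            json.dump(out, f)
```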


commented

If you plan to use pseudo-masks for your validation dataset, then it is necessary to provide the path to the dataset that contains the validation split using the "--dataset-path" argument.

@frank-xwang My understanding is that we first use the whole image dataset to generate the pseudo-masks. After that, the dataset can be split into training and validation sets (e.g., 80%/20%) as the input for the second-phase training. Could you please confirm whether this is correct?

No, for self-training we still utilize 100% of the data. Our experimental setup is: use all ImageNet data as the training set and evaluate the model's performance on 11 different detection datasets to demonstrate zero-shot unsupervised learning.

Closing it now, please feel free to reopen it if you have further questions.