Hyperparameter setting for training from scratch on CIFAR-10

Question

Yuancheng-Xu opened this issue 2 years ago

Hi,

I am trying to train a ConvNeXt on CIFAR-10 from scratch for a research project that does not allow batch normalization (BN). I use the following configuration:

python -m torch.distributed.launch --nproc_per_node=4 main.py \
  --data_set image_folder --data_path ./CIFAR-10-images/train --eval_data_path ./CIFAR-10-images/test \
  --nb_classes 10 --num_workers 8 --warmup_epochs 0 \
  --save_ckpt false \
  --cutmix 0 --mixup 0 \
  --model_ema_eval true \
  --model convnext_tiny \
  --epochs 100 --lr 4e-4 --weight_decay 5e-2 --opt 'sgd' --input_size 32 \
  --output_dir results/100epochs_lr_4e-4_wd_5e-2_sgd_inputsize_32

The accuracy is only 75% (a standard ResNet18 reaches about 93%). If I change the optimizer from AdamW to SGD, the best accuracy drops to below 50%. If I use the default input size of 224, the accuracy is 84%, which is still significantly lower.

Can ConvNeXt work on CIFAR-10 without fine-tuning from a pretrained model? Could you provide a recommended set of hyperparameters for CIFAR-10 (ideally robust to the choice of optimizer and not relying on mixup or CutMix)?

Also, I have another question about fine-tuning on CIFAR-10: in the Colab notebook, input_size seems to be the default 224, but CIFAR-10 images are 32x32. Does this mean the images are padded to 224x224 during data preparation?

Thank you!

Sam Lerman · Answer 1 · Mon Dec 05 2022 23:04:54 GMT+0800 (China Standard Time)

I was also wondering about this. It seems the 32x32 size of CIFAR-10 is incompatible with this model due to the down-sampling layers.

Shamik Bose · Answer 2 · Tue May 02 2023 02:52:00 GMT+0800 (China Standard Time)

@Yuancheng-Xu It seems like it can. The downsampling layers should use a smaller kernel size and stride (2 and 2, respectively). Without this, a 32x32 input shrinks so quickly that the output of the downsampling layers is effectively the same size as the kernel.
In addition, you might want to choose a smaller kernel and padding size for the convolutional layers in Block.
Here's a notebook showing the training progress: https://juliusruseckas.github.io/ml/convnext-cifar10.html
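
To make the downsampling argument concrete, the feature-map sizes can be worked out with the usual convolution output formula. This is only a sketch: it assumes the reference ConvNeXt layout of a 4x4, stride-4 stem followed by three 2x2, stride-2 downsampling layers with no padding, and `conv_out` and `stage_sizes` are illustrative helper names, not functions from the repository.

```python
def conv_out(size: int, kernel: int, stride: int, padding: int = 0) -> int:
    """Spatial output size of a convolution (floor division, as in PyTorch)."""
    return (size + 2 * padding - kernel) // stride + 1


def stage_sizes(input_size: int, stem: tuple, n_downsamples: int = 3) -> list:
    """Feature-map side length after the stem and after each 2x2, stride-2 downsampling layer."""
    kernel, stride = stem
    sizes = [conv_out(input_size, kernel, stride)]
    for _ in range(n_downsamples):
        sizes.append(conv_out(sizes[-1], 2, 2))
    return sizes


print(stage_sizes(224, stem=(4, 4)))  # [56, 28, 14, 7]
print(stage_sizes(32, stem=(4, 4)))   # [8, 4, 2, 1]
print(stage_sizes(32, stem=(2, 2)))   # [16, 8, 4, 2]
```

Under these assumptions, a 32x32 input with the default stem is already 8x8 after the first layer and collapses to 1x1 in the last stage, whereas a 2x2, stride-2 stem keeps every stage at least 2x2.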

Shamik Bose · Answer 3 · Thu May 04 2023 23:30:00 GMT+0800 (China Standard Time)

@Yuancheng-Xu I managed to get accuracy to 87% by making a few changes to the code in the link above. Basic changes are mentioned in this repository: https://github.com/shamikbose/Fujitsu_Assessment
Main changes were as follows:

1. The downsampling convolutional layers were modified (4x4 -> 2x2) for the smaller image size in the dataset. This improved accuracy from 70% to 80%.

2. Keeping CIFAR-10 training recipes in mind, the architecture was modified to be a 3-block architecture instead of a 4-block one. This improved accuracy from 80% to 85%.

3. Kernel size was changed (7 -> 3). This improved accuracy from 85% to 87%.
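
The three changes can be put together roughly as follows. This is a minimal PyTorch sketch, not the code from the notebook or the repository linked above: the class names, `depths`, and `dims` are hypothetical, every downsampling step is simplified to "conv then LayerNorm", and layer scale and stochastic depth are left out. It does keep the model free of batch normalization.

```python
import torch
import torch.nn as nn


class LayerNorm2d(nn.LayerNorm):
    """LayerNorm over the channel dimension of an NCHW tensor."""

    def forward(self, x):
        return super().forward(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)


class Block(nn.Module):
    """ConvNeXt-style block: depthwise conv -> LayerNorm -> MLP, plus a residual."""

    def __init__(self, dim: int, kernel_size: int = 3):
        super().__init__()
        # padding = kernel_size // 2 keeps the spatial size unchanged (3 -> padding 1)
        self.dwconv = nn.Conv2d(dim, dim, kernel_size, padding=kernel_size // 2, groups=dim)
        self.norm = nn.LayerNorm(dim)
        self.pwconv1 = nn.Linear(dim, 4 * dim)
        self.act = nn.GELU()
        self.pwconv2 = nn.Linear(4 * dim, dim)

    def forward(self, x):
        shortcut = x
        x = self.dwconv(x).permute(0, 2, 3, 1)       # NCHW -> NHWC
        x = self.pwconv2(self.act(self.pwconv1(self.norm(x))))
        return shortcut + x.permute(0, 3, 1, 2)      # back to NCHW


class SmallConvNeXt(nn.Module):
    """Three stages, 2x2 stride-2 downsampling, 3x3 depthwise kernels (all hypothetical sizes)."""

    def __init__(self, num_classes=10, depths=(2, 2, 2), dims=(64, 128, 256), kernel_size=3):
        super().__init__()
        layers, in_ch = [], 3
        for depth, dim in zip(depths, dims):
            # 2x2, stride-2 conv for the stem and for each later downsampling step
            layers += [nn.Conv2d(in_ch, dim, kernel_size=2, stride=2), LayerNorm2d(dim)]
            layers += [Block(dim, kernel_size) for _ in range(depth)]
            in_ch = dim
        self.features = nn.Sequential(*layers)       # 32x32 -> 16x16 -> 8x8 -> 4x4
        self.norm = nn.LayerNorm(dims[-1])
        self.head = nn.Linear(dims[-1], num_classes)

    def forward(self, x):
        x = self.features(x).mean(dim=(2, 3))        # global average pooling
        return self.head(self.norm(x))


model = SmallConvNeXt()
logits = model(torch.randn(2, 3, 32, 32))
print(logits.shape)  # torch.Size([2, 10])
```

The accuracy figures quoted above come from the original experiments, not from this sketch, which has not been trained or tuned.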

Yuancheng Xu · Answer 4 · Sat May 06 2023 00:51:59 GMT+0800 (China Standard Time)

Thanks a lot!

Shashank Priyadarshi · Answer 5 · Tue Jun 06 2023 05:46:26 GMT+0800 (China Standard Time)

Hey @shamikbose, I tried training on the ImageNet100 dataset with a custom input_size of 32, but the accuracy I am getting is too low. What could I change in the architecture (I tried making the kernel and stride smaller)? Is there any other approach that might help me get good accuracy?

Shamik Bose · Answer 6 · Tue Jun 06 2023 05:48:52 GMT+0800 (China Standard Time)

@iamsh4shank The parameters used for ImageNet100 are mentioned in the paper. You should be able to reproduce it using those values.

Shashank Priyadarshi · Answer 7 · Tue Jun 06 2023 05:57:32 GMT+0800 (China Standard Time)

Actually, I guess those were for input_size 224, but on changing it to 32 I get really low accuracy.

Shamik Bose · Answer 8 · Tue Jun 06 2023 06:11:48 GMT+0800 (China Standard Time)

With image size 32, try the parameters mentioned here #134 (comment)

Shashank Priyadarshi · Answer 9 · Tue Jun 06 2023 06:22:05 GMT+0800 (China Standard Time)

I did try changing the Conv layer (https://github.com/facebookresearch/ConvNeXt/blob/main/models/convnext.py#L28) to kernel size 3 and padding 1. I also changed the downsampling layer (https://github.com/facebookresearch/ConvNeXt/blob/main/models/convnext.py#L74) to kernel size 2 and stride 2. It did not change the accuracy much. I am getting a test accuracy of about 4-5 percent.