facebookresearch / ConvNeXt

Code release for ConvNeXt model

Hyperparameter setting for training from scratch on CIFAR-10

Yuancheng-Xu opened this issue

Hi,

I am trying to train a ConvNeXt on CIFAR-10 for a research project that doesn't allow using batch normalization (BN). I use the following configuration:

python -m torch.distributed.launch --nproc_per_node=4 main.py \
  --data_set image_folder --data_path ./CIFAR-10-images/train --eval_data_path ./CIFAR-10-images/test \
  --nb_classes 10 --num_workers 8 --warmup_epochs 0 \
  --save_ckpt false \
  --cutmix 0 --mixup 0 \
  --model_ema_eval true \
  --model convnext_tiny \
  --epochs 100 --lr 4e-4 --weight_decay 5e-2 --opt 'sgd' --input_size 32 \
  --output_dir results/100epochs_lr_4e-4_wd_5e-2_sgd_inputsize_32

The accuracy is only 75% (a standard ResNet18 reaches about 93%). If I change the optimizer from AdamW to SGD, the best accuracy actually drops below 50%. If I use the default input size of 224, the accuracy is 84%, which is still significantly lower.

Can ConvNeXt work on CIFAR-10 without fine-tuning from a pretrained model? Could you provide a recommended set of hyperparameters for CIFAR-10 (one that is robust to different optimizers and does not rely on mixup or cutmix)?

I also have another question about fine-tuning on CIFAR-10: in the Colab file, input_size seems to be the default 224, but CIFAR-10 images are 32x32. Does this mean that the images are padded to 224x224 in the data preparation stage?

Thank you!

I was also wondering about this. It seems the 32x32 size of CIFAR-10 is incompatible with this model due to the down-sampling layers.

@Yuancheng-Xu It seems like it can. The downsampling layers should be set to a smaller kernel size and stride (2 and 2 respectively). Without this, the output of the downsampling layers is effectively the same size as the kernel.
In addition, you might want to choose a smaller kernel size and padding for the convolutional layers in Block.
Here's a notebook showing the training progress: https://juliusruseckas.github.io/ml/convnext-cifar10.html
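
To make the size argument above concrete, here is a minimal sketch that traces the feature-map side length through a ConvNeXt-style stem and its three downsampling layers, using the usual unpadded convolution formula. The helper names `conv_out` and `trace` are made up for this illustration and are not part of the repository. The layout assumed is a 4x4 stride-4 stem followed by three 2x2 stride-2 downsampling layers.

```python
def conv_out(size, kernel, stride, padding=0):
    """Output side length of a convolution over a square input."""
    return (size + 2 * padding - kernel) // stride + 1


def trace(input_size, stem, downsamples):
    """Return the side length after the stem and after each downsampling layer.

    `stem` and each entry of `downsamples` are (kernel, stride) pairs.
    """
    sizes = [conv_out(input_size, *stem)]
    for kernel, stride in downsamples:
        sizes.append(conv_out(sizes[-1], kernel, stride))
    return sizes


# Assumed layout: 4x4 stride-4 stem, then three 2x2 stride-2 downsampling layers.
print(trace(224, (4, 4), [(2, 2)] * 3))  # [56, 28, 14, 7]
print(trace(32, (4, 4), [(2, 2)] * 3))   # [8, 4, 2, 1]

# Same layout with the stem reduced to 2x2 stride 2, as suggested above.
print(trace(32, (2, 2), [(2, 2)] * 3))   # [16, 8, 4, 2]
```

With a 32x32 input and the 4x4 stem, the last stage works on a 1x1 map, so a 7x7 depthwise kernel there sees almost nothing but padding. Shrinking the stem keeps more spatial resolution at every stage.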

@Yuancheng-Xu I managed to get the accuracy up to 87% by making a few changes to the code in the link above. The basic changes are described in this repository: https://github.com/shamikbose/Fujitsu_Assessment
The main changes were as follows:

  1. The downsampling convolutional layers were modified (4x4 -> 2x2) for the smaller image size in the dataset
     • This improved accuracy from 70% to 80%
  2. Keeping CIFAR-10 training recipes in mind, the architecture was modified to be a 3-block architecture instead of a 4-block one
     • This improved accuracy from 80% to 85%
  3. Kernel size was changed (7 -> 3)
     • This improved accuracy from 85% to 87%
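
The three changes listed above can be sketched as a small PyTorch model. This is an illustration under stated assumptions, not the code from the linked repository or notebook: the class names, the `depths` and `dims` values, and the reading of "3-block" as "three stages" are all guesses made for this example, and layer scale and stochastic depth are left out for brevity. It uses LayerNorm throughout, so no BN is involved.

```python
import torch
import torch.nn as nn


class LayerNorm2d(nn.Module):
    """LayerNorm applied over the channel dimension of an NCHW tensor."""

    def __init__(self, dim):
        super().__init__()
        self.norm = nn.LayerNorm(dim, eps=1e-6)

    def forward(self, x):
        # NCHW -> NHWC for LayerNorm, then back to NCHW
        return self.norm(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)


class Block(nn.Module):
    """ConvNeXt-style block; depthwise kernel is 3x3 here instead of 7x7."""

    def __init__(self, dim, kernel_size=3):
        super().__init__()
        self.dwconv = nn.Conv2d(dim, dim, kernel_size,
                                padding=kernel_size // 2, groups=dim)
        self.norm = LayerNorm2d(dim)
        self.pwconv1 = nn.Conv2d(dim, 4 * dim, 1)  # pointwise expansion
        self.act = nn.GELU()
        self.pwconv2 = nn.Conv2d(4 * dim, dim, 1)  # pointwise projection

    def forward(self, x):
        y = self.pwconv2(self.act(self.pwconv1(self.norm(self.dwconv(x)))))
        return x + y  # residual connection


class SmallConvNeXt(nn.Module):
    """Three-stage variant; depths and dims are hypothetical values."""

    def __init__(self, num_classes=10, depths=(2, 2, 2),
                 dims=(64, 128, 256), patch_size=2, kernel_size=3):
        super().__init__()
        # Stem: 2x2 stride-2 conv instead of 4x4 stride-4
        layers = [nn.Conv2d(3, dims[0], patch_size, stride=patch_size),
                  LayerNorm2d(dims[0])]
        for i, (depth, dim) in enumerate(zip(depths, dims)):
            if i > 0:  # 2x2 stride-2 downsampling between stages
                layers += [LayerNorm2d(dims[i - 1]),
                           nn.Conv2d(dims[i - 1], dim, 2, stride=2)]
            layers += [Block(dim, kernel_size) for _ in range(depth)]
        self.features = nn.Sequential(*layers)
        self.norm = nn.LayerNorm(dims[-1], eps=1e-6)
        self.head = nn.Linear(dims[-1], num_classes)

    def forward(self, x):
        x = self.features(x).mean(dim=(2, 3))  # global average pooling
        return self.head(self.norm(x))


model = SmallConvNeXt()
logits = model(torch.randn(4, 3, 32, 32))
print(logits.shape)  # torch.Size([4, 10])
```

With these settings a 32x32 input is reduced to 16x16, 8x8, and 4x4 across the three stages, so the final stage still has a spatial extent larger than its 3x3 kernel.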

Thanks a lot!

Hey @shamikbose, I tried training on the ImageNet100 dataset with a custom input_size = 32, but the accuracy I am getting is too low. What could I change in the architecture (I already tried making the kernel and stride smaller)? Is there any other approach that might help me get good accuracy?

@iamsh4shank The parameters used for ImageNet100 are mentioned in the paper. You should be able to reproduce it using those values.

Actually, I guess those were for input_size 224, but when I change it to 32 the accuracy is really low.

With image size 32, try the parameters mentioned here #134 (comment)

I did try changing the Conv layer (https://github.com/facebookresearch/ConvNeXt/blob/main/models/convnext.py#L28) to use kernel size 3 and padding 1. I also changed the downsampling layer (https://github.com/facebookresearch/ConvNeXt/blob/main/models/convnext.py#L74) to use kernel size 2 and stride 2. It did not change the accuracy much; I am getting a test accuracy of around 4-5 percent.