IamShubhamGupto / ConvMixerXL

ECE-GY 7123 mini project at NYU

Home Page: https://iamshubhamgupto.github.io/ConvMixerXL/assets/pdf/ConvMixerXL.pdf


Activations And Augmentations: Pushing The Performance Of Isotropic ConvNets

Abstract

Isotropic architectures have recently gained attention in computer vision for their ability to better capture spatial information. In this work, we experiment with training a ConvMixer model, an isotropic convolutional neural network architecture, on the CIFAR-10 dataset. We propose a new architecture, ConvMixer-XL, consisting of 66 layers and just under 5M parameters. To maximize its performance, we ran an ablation study over various configurations of the architecture, augmentations, and activations to further fine-tune the model. Our experiments show that applying augmentations and using the Swish (SiLU) activation function in deeper models gives the best results, with a top-1 accuracy of 94.52%. Our code can be found at https://github.com/datacrisis/ConvMixerXL.
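
For context, the sketch below shows the standard ConvMixer block that ConvMixer-XL stacks, following the original ConvMixer design: a depthwise convolution with a residual connection for spatial mixing, then a pointwise (1x1) convolution for channel mixing, each followed by an activation and BatchNorm. The activation is left configurable so GELU, ReLU, or SiLU can be swapped in as in our ablations. This is an illustrative sketch, not the exact code from `train.py`; names such as `ConvMixerBlock` are our own.

```python
import torch.nn as nn

class ConvMixerBlock(nn.Module):
    """One isotropic ConvMixer block: depthwise spatial mixing with a
    residual connection, followed by pointwise (1x1) channel mixing."""

    def __init__(self, dim: int, kernel_size: int = 9, act=nn.GELU):
        super().__init__()
        self.spatial = nn.Sequential(
            nn.Conv2d(dim, dim, kernel_size, groups=dim, padding="same"),
            act(),
            nn.BatchNorm2d(dim),
        )
        self.channel = nn.Sequential(
            nn.Conv2d(dim, dim, kernel_size=1),
            act(),
            nn.BatchNorm2d(dim),
        )

    def forward(self, x):
        x = x + self.spatial(x)  # residual over the depthwise convolution
        return self.channel(x)
```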

Results

| Name | Activation | Depth | Inter-Block Skip | Augmentations | #Params (M) | Top-1 Acc. |
|------|------------|-------|------------------|---------------|-------------|------------|
| CM-Vanilla-NoAug | GELU | 8 | No | No | 0.59 | 0.8854 |
| CM-Vanilla | GELU | 8 | No | Yes | 0.59 | 0.9378 |
| CM-Vanilla-ReLU | ReLU | 8 | No | Yes | 0.59 | 0.9384 |
| CM-Vanilla-SiLU | SiLU | 8 | No | Yes | 0.59 | 0.9372 |
| CM-XL-NoSkip | GELU | 66 | No | Yes | 4.9 | 0.4868 |
| CM-XL-Skip | GELU | 66 | Yes | Yes | 4.9 | 0.9422 |
| CM-XL | SiLU | 66 | Yes | Yes | 4.9 | 0.9452 |
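
The CM-XL-NoSkip row shows that naively stacking 66 blocks collapses training (48.68% accuracy), while adding inter-block skip connections recovers it. The sketch below illustrates one plausible way to wire such skips, adding an identity shortcut around each group of blocks; the group size `skip_every` and the class name are assumptions for illustration, not the exact scheme used in ConvMixer-XL (see the report for details).

```python
import torch.nn as nn

class InterBlockSkip(nn.Module):
    """Wrap a stack of blocks with identity shortcuts that each span a
    whole group of blocks, so gradients can bypass several layers at once.
    NOTE: the group size (skip_every) is illustrative, not necessarily
    the exact scheme used in ConvMixer-XL."""

    def __init__(self, blocks, skip_every: int = 4):
        super().__init__()
        self.groups = nn.ModuleList(
            nn.Sequential(*blocks[i : i + skip_every])
            for i in range(0, len(blocks), skip_every)
        )

    def forward(self, x):
        for group in self.groups:
            x = x + group(x)  # inter-block residual shortcut
        return x
```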

Weights and logs

All of our experiment logs and generated model weights have been uploaded here.

Run

To reproduce the best-performing configuration of ConvMixer-XL, run the following command:

```bash
python3 train.py --lr-max=0.005 \
                 --depth=66 \
                 --model='CM-XL' \
                 --activation='SiLU' \
                 --name='final_CMXL_SiLU' \
                 --save_dir='output/agg' \
                 --batch-size=128
```
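
The exact augmentation settings live in `train.py`; as a hedged illustration, a typical CIFAR-10 training pipeline in the spirit of the ConvMixer CIFAR-10 baseline might look like the following. The specific transforms and magnitudes here are assumptions, not necessarily the ones behind the run above.

```python
import torchvision.transforms as T

# Illustrative CIFAR-10 training augmentations (assumed, not the exact
# set used in this repo): pad-and-crop, horizontal flip, RandAugment,
# then normalization with the usual CIFAR-10 statistics.
train_transform = T.Compose([
    T.RandomCrop(32, padding=4),
    T.RandomHorizontalFlip(),
    T.RandAugment(num_ops=2, magnitude=9),
    T.ToTensor(),
    T.Normalize(mean=(0.4914, 0.4822, 0.4465),
                std=(0.2470, 0.2435, 0.2616)),
])
```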

References

This project builds on ConvMixer CIFAR-10 and ConvMixer.