xbassi / game_of_deep_learning

5th place code submission for the Game of Deep Learning competition hosted by Analytics Vidhya

Game of Deep Learning - Approach Document

By Vedant Bassi

This competition is an image classification task on the ship images dataset provided by Analytics Vidhya.

There are 5 classes overall.

'Cargo': 1, 'Military': 2, 'Carrier': 3, 'Cruise': 4, 'Tankers': 5

Cargo: 2120 images
Military: 1167 images
Carrier: 916 images
Cruise: 832 images
Tankers: 1217 images

Total: 6252 images

Data Split: 88% in train, 12% in validation (5500 images in train, 752 images in validation), as done in the makeFolders.py file.
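For illustration, a minimal sketch of such a split (the actual logic lives in makeFolders.py; labels_df and its contents are assumed names):

from sklearn.model_selection import train_test_split

# Hypothetical sketch only: the real 88/12 split is done in makeFolders.py.
# labels_df is assumed to hold one row per image with its class label.
train_df, val_df = train_test_split(labels_df, test_size=0.12, random_state=42)
print(len(train_df), len(val_df))  # roughly 5500 / 752 out of 6252 images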

My Approach:

  1. Use state-of-the-art models pre-trained on ImageNet (Transfer Learning)
  2. Use ensemble methods to aggregate predictions from different models to boost performance
  3. Use Curriculum Learning with a cyclical learning rate to fine-tune the models and increase accuracy

List of Models used:

ResNet 152 (https://arxiv.org/abs/1512.03385)
PNAS Net 5 Large (https://arxiv.org/abs/1712.00559)
ResNeXt-101 (https://arxiv.org/abs/1611.05431)
EfficientNet B3 (https://arxiv.org/abs/1905.11946)

All of the above models were pre-trained on ImageNet. The final model was an ensemble of these 4 models.

Training Environment

Used PyTorch to train all the models and ran the code on a Deep Learning server on Google Cloud. All training runs were done on a single NVIDIA K80 GPU, with batch sizes varying from 32 to 64 depending on the size of the model.

Validation metrics

Used the f1_score() and accuracy_score() functions from the sklearn.metrics module to get the weighted F1 score and accuracy on the validation set.
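For example (val_labels and val_preds are placeholder arrays of true and predicted class ids):

from sklearn.metrics import f1_score, accuracy_score

# val_labels: ground-truth class ids, val_preds: predicted class ids for the validation set
weighted_f1 = f1_score(val_labels, val_preds, average='weighted')
val_accuracy = accuracy_score(val_labels, val_preds)
print(f"Weighted F1: {weighted_f1:.4f}  Accuracy: {val_accuracy:.4f}")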

Cyclical Learning Rate Schedule

The learning rate schedule was based on Super Convergence, a concept taken from this paper: https://arxiv.org/abs/1708.07120

With this approach, the learning rate is slowly increased until it reaches a peak, and then decreased down to a much smaller value.
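As an illustration, a minimal sketch of such a one-cycle schedule using torch.optim.lr_scheduler.OneCycleLR (model and train_loader are placeholders, the competition code may implement the cycle manually, and the learning rates follow the values listed under "Optimizer and Loss Function" below):

import torch

# Sketch of a super-convergence style schedule: the LR ramps up to max_lr, then anneals down.
optimizer = torch.optim.Adam(model.parameters(), lr=0.00001, weight_decay=0.0001)
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer,
    max_lr=0.00012,                  # peak learning rate; OneCycleLR manages the actual LR values
    epochs=80,
    steps_per_epoch=len(train_loader),
)

for epoch in range(80):
    for images, labels in train_loader:
        # ... forward pass, loss, backward pass ...
        optimizer.step()
        scheduler.step()             # LR rises to the peak and then anneals down over training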

Pretrained Models & Where to find them

The PNASNet-5 model was downloaded from the Cadene repository (https://github.com/Cadene/pretrained-models.pytorch#pnasnet)

The ResNeXt and ResNet 152 models were downloaded from the official Torchvision models repository (https://github.com/pytorch/vision)

The EfficientNet B3 model was downloaded from the Luke Melas repository (https://github.com/lukemelas/EfficientNet-PyTorch)
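For reference, the backbones can be loaded roughly as follows (a sketch; the exact ResNeXt variant name, resnext101_32x8d, is an assumption):

import torchvision.models as models
import pretrainedmodels                         # Cadene repo
from efficientnet_pytorch import EfficientNet   # Luke Melas repo

resnet152 = models.resnet152(pretrained=True)
resnext101 = models.resnext101_32x8d(pretrained=True)
pnasnet = pretrainedmodels.__dict__['pnasnet5large'](num_classes=1000, pretrained='imagenet')
effnet_b3 = EfficientNet.from_pretrained('efficientnet-b3')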

Custom MLP Model Architecture

All pretrained models had a final output layer of size 1000, and the pretrained weights were not frozen. This 1000-dimensional output was fed into a custom 5-layer MLP designed by me.

Each layer was followed by a Batch Normalization layer. Parametric ReLU was applied after each layer except the last two.

Architecture: Input ~> 1024 ~> 512 ~> 256 ~> 128 ~> 5 ~> Softmax
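A minimal sketch of this head in PyTorch (the exact placement of BatchNorm and PReLU around the last two layers is my reading of the description above):

import torch.nn as nn

class MLPHead(nn.Module):
    """5-layer MLP head on top of the 1000-d backbone output."""
    def __init__(self, in_features=1000, num_classes=5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_features, 1024), nn.BatchNorm1d(1024), nn.PReLU(),
            nn.Linear(1024, 512), nn.BatchNorm1d(512), nn.PReLU(),
            nn.Linear(512, 256), nn.BatchNorm1d(256), nn.PReLU(),
            nn.Linear(256, 128), nn.BatchNorm1d(128),    # no PReLU on the last two layers
            nn.Linear(128, num_classes),                 # logits; softmax applied at prediction time
        )

    def forward(self, x):
        return self.net(x)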

Optimizer and Loss Function

Used the Adam optimizer with a starting learning rate of 0.00001, doubled every 15 epochs up to a peak of 0.00012, and then reduced down to 0.000001 over several epochs.

Weight decay was set to 0.0001

Used the nn.CrossEntropyLoss() function.

Trained each model for up to 80 epochs and used early stopping to avoid overfitting.

Finally, ran training on the entire train set (all 6252 images) for 3-4 epochs to boost model performance.

The optimizer runs the backward pass through both the pre-trained model and the MLP head at every step.
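Putting it together, one training step looks roughly like this (a sketch; backbone, mlp_head, images, and labels are placeholders):

import torch.nn as nn
import torch.optim as optim

criterion = nn.CrossEntropyLoss()
# Both the pretrained backbone and the MLP head are updated; nothing is frozen.
optimizer = optim.Adam(
    list(backbone.parameters()) + list(mlp_head.parameters()),
    lr=0.00001, weight_decay=0.0001)

logits = mlp_head(backbone(images))   # backbone emits 1000-d features
loss = criterion(logits, labels)
optimizer.zero_grad()
loss.backward()                       # gradients flow through both networks
optimizer.step()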

Data Augmentation and Preprocessing

Used the built-in PyTorch torchvision.transforms functions for data augmentation.

Here they are:

from torchvision import transforms

validation_transforms = transforms.Compose([
	transforms.Resize((331,331)),
	transforms.RandomHorizontalFlip(),
	transforms.ToTensor(),
	transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])
		])


Training_transforms = transforms.Compose([
	transforms.Resize((331,331)),
	transforms.RandomHorizontalFlip(),
	transforms.ColorJitter(brightness=0.1,contrast=0.1,saturation=0.1, hue=0.1),
	transforms.RandomAffine(degrees=15, translate=(0.3,0.3), scale=(0.5,1.5), shear=None, resample=False, fillcolor=0),
	transforms.ToTensor(),
	transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])	
		])

Normalized all inputs with mean = [0.5, 0.5, 0.5] and standard deviation = [0.5, 0.5, 0.5]. This helped in better model training, probably because the pretrained models were also trained with normalized inputs.

Ensembling Predictions

The outputs of the final layer from each model were summed. An argmax was then applied to find the index of the maximum value, and that index was taken as the final predicted class.
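In code, the ensembling step looks roughly like this (a sketch; ensemble_models and batch are placeholders for the four trained networks and a batch of test images):

import torch

with torch.no_grad():
    # Element-wise sum of the 5-class outputs from all four models
    summed = sum(model(batch) for model in ensemble_models)
    # Index of the largest summed score is the predicted class
    predicted = summed.argmax(dim=1) + 1   # +1 assuming class ids run from 1 to 5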


