Meta-Heuristics vs Backpropagation: A Fresh Look at CNN Model Parameter Optimization for Image Classification

Meta-Heuristic CNNs

(Utkarsh Mathur)

Team Name - Pixel Swarm
Course - CSE 573: Computer Vision and Image Processing (University at Buffalo, State University of New York)
Instructor - Dr. Sreyasee Das Bhattacharjee (TA - Bhushan Mahajan)
Project Title - Meta-Heuristics vs Backpropagation: A Fresh Look at CNN Model Parameter Optimization for Image Classification.

Project Members

  1. Utkarsh Mathur (datamathur, umathur@buffalo.edu)
  2. Mahammad Iqbal Shaik (iqbal-sk, mahammad@buffalo.edu)

1. Introduction

1.1. Abstract

Most neural network models are trained with gradient-based backpropagation to optimize model parameters, a technique that is prone to getting stuck at locally optimal values rather than reaching globally optimal parameters. Various techniques improve on plain Stochastic Gradient Descent (SGD), such as learning rate scheduling and momentum, but they do not resolve this limitation. In this project, we compare backpropagation with meta-heuristic optimization algorithms by analyzing their performance in training CNN models for image classification tasks. Population-based meta-heuristics such as Particle Swarm Optimization (PSO) and Grey Wolf Optimization (GWO) can, in principle, reach globally optimal parameters, so we aim to assess the feasibility of such optimization techniques for CNN model architectures.

1.2. Problem Statement

The aim of this project is to analyze the performance of meta-heuristic optimization algorithms in contrast to gradient-based backpropagation for training Convolutional Neural Network models.

1.3. Method Details

  1. Meta-Heuristic algorithms - Genetic Algorithm (GA), Particle Swarm Optimization (PSO), Grey Wolf Optimization (GWO), and Ant Colony Optimization (ACO).
  2. CNN model architectures - LeNet, AlexNet, VGG-16, and ResNet-50. Since AlexNet, VGG-16, and ResNet-50 were originally trained on the image classification task of the ImageNet challenge (ILSVRC), we will use the classification task of ILSVRC 2017 as the dataset for training and inference of those models; a training-loop sketch follows this list.
  3. Datasets - MNIST, CIFAR-10, ILSVRC 2017 (image classification)
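
To keep the comparison controlled, the training loop can stay identical across optimizers, with only the optimizer object swapped. Below is a minimal PyTorch sketch of that idea; `SwarmOptimizer` is a hypothetical stand-in for a population-based optimizer implementing the standard `torch.optim.Optimizer` closure interface, not a specific library API.

```python
import torch
import torch.nn as nn

def train_one_epoch(model, loader, optimizer, loss_fn=nn.CrossEntropyLoss()):
    """Optimizer-agnostic training loop: works for SGD and for population-based
    optimizers that re-evaluate the loss through the closure."""
    model.train()
    for images, labels in loader:
        def closure():
            optimizer.zero_grad()
            loss = loss_fn(model(images), labels)
            # Gradient-based optimizers need backward(); population-based
            # ones only need the loss value.
            if torch.is_grad_enabled():
                loss.backward()
            return loss
        optimizer.step(closure)

# Backpropagation baseline:
#   optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
# Meta-heuristic alternative (hypothetical class, illustrative only):
#   optimizer = SwarmOptimizer(model.parameters(), num_particles=20)
```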

1.4. Project Analysis

Since we are changing the model optimization technique, we intend to perform our analysis on the following two grounds:

  1. Quality of model training – By running inference with models trained under each optimization technique, we will compare the quality of the resulting trained models.
  2. Computational Cost – Since most deep CNN models are computationally expensive to train, we will analyze the computational resources required by the meta-heuristic optimization techniques in contrast to gradient-based backpropagation, as sketched after this list.
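
One concrete way to ground both comparisons is to record test accuracy together with wall-clock time and loss evaluations per epoch: an SGD step costs one forward plus one backward pass, whereas a population-based step costs roughly one forward pass per population member. A minimal timing helper (the function name is ours, purely illustrative):

```python
import time

def mean_epoch_time(train_fn, num_epochs=5):
    """Run train_fn once per epoch and return the mean wall-clock seconds.
    train_fn is any zero-argument callable performing one training epoch."""
    times = []
    for _ in range(num_epochs):
        start = time.perf_counter()
        train_fn()
        times.append(time.perf_counter() - start)
    return sum(times) / len(times)

# Rough per-step cost model:
#   SGD step      ~ 1 forward + 1 backward pass
#   PSO/GWO step  ~ P forward passes (P = population size), no backward pass
```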

2. Background

N/A

2.1. Meta-Heuristic Optimization

TO BE FILLED

  • General paradigm (minimal skeleton sketched below)
  • Use cases
  • Advantages and Disadvantages
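
As a concrete anchor for the general paradigm: most population-based meta-heuristics follow an initialize-evaluate-update loop over a population of candidate solutions. A minimal skeleton, with a deliberately naive placeholder update rule (the concrete algorithms below differ precisely in this step):

```python
import numpy as np

def metaheuristic_minimize(objective, dim, pop_size=30, iters=100, seed=0):
    """Generic population-based search: sample, evaluate, keep the best.
    GA, PSO, ACO, and GWO differ in how the population is updated."""
    rng = np.random.default_rng(seed)
    population = rng.uniform(-1.0, 1.0, size=(pop_size, dim))
    best_x, best_f = None, np.inf
    for _ in range(iters):
        fitness = np.array([objective(x) for x in population])
        i = fitness.argmin()
        if fitness[i] < best_f:
            best_x, best_f = population[i].copy(), fitness[i]
        # Placeholder update: resample around the incumbent best solution.
        population = best_x + rng.normal(0.0, 0.1, size=(pop_size, dim))
    return best_x, best_f
```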

2.2. Genetic Algorithm

TO BE FILLED

  • Intuition
  • Algorithm (minimal sketch below)
  • Reason for not using for Neural Network Optimization
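
For intuition, a minimal real-valued GA sketch with tournament selection, uniform crossover, and Gaussian mutation (all hyperparameters illustrative):

```python
import numpy as np

def ga_minimize(objective, dim, pop_size=40, gens=100, sigma=0.1, seed=0):
    """Real-valued GA: tournament selection, uniform crossover, Gaussian mutation."""
    rng = np.random.default_rng(seed)
    pop = rng.uniform(-1.0, 1.0, size=(pop_size, dim))
    for _ in range(gens):
        fit = np.array([objective(x) for x in pop])

        def tournament():
            # Pick the fitter of two random individuals.
            a, b = rng.integers(pop_size, size=2)
            return pop[a] if fit[a] < fit[b] else pop[b]

        children = []
        for _ in range(pop_size):
            p1, p2 = tournament(), tournament()
            mask = rng.random(dim) < 0.5              # uniform crossover
            child = np.where(mask, p1, p2) + rng.normal(0.0, sigma, dim)
            children.append(child)                    # Gaussian mutation above
        pop = np.array(children)
    fit = np.array([objective(x) for x in pop])
    return pop[fit.argmin()], fit.min()
```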

2.3. Particle Swarm Optimization

TO BE FILLED

  • Intuition
  • Algorithm (minimal sketch below)
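
A minimal sketch of the canonical PSO update [3]: each particle keeps a velocity that is pulled toward its personal best and the swarm's global best; `w`, `c1`, and `c2` are the usual inertia and acceleration coefficients.

```python
import numpy as np

def pso_minimize(objective, dim, pop_size=30, iters=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Canonical (global-best) particle swarm optimization."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, size=(pop_size, dim))    # particle positions
    v = np.zeros_like(x)                                # particle velocities
    pbest, pbest_f = x.copy(), np.array([objective(p) for p in x])
    gbest = pbest[pbest_f.argmin()].copy()              # global best position
    for _ in range(iters):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        # Velocity update: inertia + cognitive pull + social pull.
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = x + v
        f = np.array([objective(p) for p in x])
        improved = f < pbest_f
        pbest[improved], pbest_f[improved] = x[improved], f[improved]
        gbest = pbest[pbest_f.argmin()].copy()
    return gbest, pbest_f.min()

# Example: pso_minimize(lambda p: ((p - 1.0) ** 2).sum(), dim=5)
```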

2.4. Ant Colony Optimization

TO BE FILLED

  • Intuition
  • Algorithm (minimal sketch below)
  • Reason for not using for Neural Network Optimization
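
A minimal Ant System sketch on a small traveling-salesman instance [6]; ACO builds discrete solutions from pheromone trails, which is one reason it transfers poorly to continuous CNN weights. `alpha`, `beta`, and `rho` are the usual pheromone, heuristic, and evaporation parameters.

```python
import numpy as np

def aco_tsp(dist, n_ants=20, iters=100, alpha=1.0, beta=2.0, rho=0.5, seed=0):
    """Ant System for the TSP. dist: (n, n) array of pairwise distances.
    Ants build tours guided by pheromone (tau) and inverse distance (eta);
    pheromone evaporates, then is re-deposited in proportion to tour quality."""
    rng = np.random.default_rng(seed)
    n = len(dist)
    tau = np.ones((n, n))                        # pheromone matrix
    eta = 1.0 / (dist + np.eye(n))               # heuristic; eye avoids /0
    best_tour, best_len = None, np.inf
    for _ in range(iters):
        tours = []
        for _ in range(n_ants):
            tour = [int(rng.integers(n))]
            while len(tour) < n:
                i = tour[-1]
                w = (tau[i] ** alpha) * (eta[i] ** beta)
                w[tour] = 0.0                    # mask already-visited cities
                tour.append(int(rng.choice(n, p=w / w.sum())))
            tours.append(tour)
        tau *= 1.0 - rho                         # evaporation
        for tour in tours:
            length = sum(dist[tour[k], tour[(k + 1) % n]] for k in range(n))
            if length < best_len:
                best_tour, best_len = tour, length
            for k in range(n):                   # deposit, scaled by quality
                tau[tour[k], tour[(k + 1) % n]] += 1.0 / length
    return best_tour, best_len
```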

2.5. Grey Wolf Optimization

TO BE FILLED

  • Intuition
  • Algorithm (minimal sketch below)
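
A minimal sketch of the GWO update [7]: the three fittest wolves (alpha, beta, delta) guide the pack, and the coefficient `a` decays linearly from 2 to 0 to shift the search from exploration to exploitation.

```python
import numpy as np

def gwo_minimize(objective, dim, pop_size=30, iters=100, seed=0):
    """Grey Wolf Optimizer: each position is the averaged pull toward 3 leaders."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(-1.0, 1.0, size=(pop_size, dim))
    for t in range(iters):
        f = np.array([objective(x) for x in X])
        alpha, beta, delta = X[np.argsort(f)[:3]]    # three best wolves
        a = 2.0 - 2.0 * t / iters                    # decays linearly 2 -> 0
        X_next = np.empty_like(X)
        for i, x in enumerate(X):
            pulls = []
            for leader in (alpha, beta, delta):
                r1, r2 = rng.random(dim), rng.random(dim)
                A = 2.0 * a * r1 - a                 # |A|>1 explores, |A|<1 exploits
                C = 2.0 * r2
                D = np.abs(C * leader - x)           # distance to the leader
                pulls.append(leader - A * D)
            X_next[i] = np.mean(pulls, axis=0)
        X = X_next
    f = np.array([objective(x) for x in X])
    return X[f.argmin()], f.min()
```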

2.6. Backpropagation & Gradient Descent

TO BE FILLED

  • Intuition
  • Algorithm (minimal sketch below)
  • Advantages and Limitations
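
A minimal sketch of gradient descent with momentum on a toy quadratic, with PyTorch autograd standing in for backpropagation; the local-optimum caveat does not bite here only because the quadratic has a single (global) minimum.

```python
import torch

def sgd_momentum_demo(steps=200, lr=0.1, momentum=0.9):
    """Minimize f(x) = ||x - 3||^2: autograd backpropagates df/dx,
    and the momentum buffer smooths successive gradient steps."""
    x = torch.zeros(2, requires_grad=True)
    buf = torch.zeros_like(x)
    for _ in range(steps):
        loss = ((x - 3.0) ** 2).sum()
        loss.backward()                      # backpropagation: fills x.grad
        with torch.no_grad():
            buf = momentum * buf + x.grad    # momentum accumulation
            x -= lr * buf                    # gradient step
            x.grad.zero_()
    return x.detach()

print(sgd_momentum_demo())  # ~tensor([3., 3.]), the unique minimum
```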

2.7. Convolutional Neural Networks

TO BE FILLED

  • Concept (minimal LeNet sketch below)
  • Advantages and Limitations
  • Progress since 1998 (LeNet)
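
As a concrete reference point, a LeNet-5-style CNN in PyTorch [11], modernized with ReLU activations and max pooling as is now common:

```python
import torch.nn as nn

class LeNet(nn.Module):
    """LeNet-5-style CNN for 32x32 single-channel inputs (e.g., padded MNIST)."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5), nn.ReLU(),   # 32x32 -> 28x28
            nn.MaxPool2d(2),                             # -> 14x14
            nn.Conv2d(6, 16, kernel_size=5), nn.ReLU(),  # -> 10x10
            nn.MaxPool2d(2),                             # -> 5x5
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 5 * 5, 120), nn.ReLU(),
            nn.Linear(120, 84), nn.ReLU(),
            nn.Linear(84, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```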

2.8. Image Classification

TO BE FILLED

  • Explain problem statement.
  • Standard datasets (loading sketch below)
  • Mention benchmarks
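
The standard datasets listed in Section 1.3 are available through torchvision; a minimal loading sketch (transforms and paths are illustrative):

```python
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Pad MNIST's 28x28 digits to the 32x32 input size LeNet expects.
mnist_tf = transforms.Compose([transforms.Pad(2), transforms.ToTensor()])
mnist = datasets.MNIST(root="data", train=True, download=True, transform=mnist_tf)
cifar = datasets.CIFAR10(root="data", train=True, download=True,
                         transform=transforms.ToTensor())

train_loader = DataLoader(mnist, batch_size=64, shuffle=True)
```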

3. Methodology

Most evolutionary optimization algorithms operate on populations of vectors or tensors, whereas the parameters of deep learning models usually consist of collections of 3D or 4D tensors of varying dimensionality. To address this, we treat each parameter collection atomically (i.e., we optimize one layer's parameters at a time), as sketched below.
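
A minimal sketch of this per-layer treatment: each layer's parameter tensor is flattened into a 1-D vector the population-based optimizer can handle, then reshaped and written back after optimization. The helper names (and the `loss_of` function in the usage comment) are ours, purely illustrative.

```python
import torch
import torch.nn as nn

def layer_vectors(model: nn.Module):
    """Yield (name, flat_vector, shape) per parameter tensor, so a
    population-based optimizer sees one layer at a time as a 1-D vector."""
    for name, p in model.named_parameters():
        yield name, p.detach().flatten().clone(), p.shape

def load_layer_vector(model: nn.Module, name: str, flat: torch.Tensor):
    """Write an optimized flat vector back into the layer's original shape."""
    with torch.no_grad():
        param = dict(model.named_parameters())[name]
        param.copy_(flat.view(param.shape))

# Usage sketch: optimize one layer at a time, holding the others fixed.
# for name, vec, shape in layer_vectors(model):
#     best, _ = pso_minimize(lambda v: loss_of(model, name, v), dim=vec.numel())
#     load_layer_vector(model, name, torch.as_tensor(best, dtype=torch.float32))
```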

3.1. Image Classification CNNs

N/A

3.2. Model Training

N/A

3.3. Model Inference

N/A

4. Experiments

N/A

4.1. MNIST LeNet

N/A

4.2. CIFAR-10 LeNet

N/A

4.3. ImageNet AlexNet

N/A

4.4. ImageNet VGG-16

N/A

4.5. ImageNet ResNet-50

N/A

5. Results & Analysis

N/A

5.1. Model Performances

N/A

5.2. Computation Analysis

N/A

6. References

  1. Z. Michalewicz, Genetic Algorithms + Data Structures = Evolution Programs. Berlin, Heidelberg: Springer Berlin Heidelberg, 1996. doi: https://doi.org/10.1007/978-3-662-03315-9.
  2. S. Katoch, S. S. Chauhan, and V. Kumar, “A review on genetic algorithm: past, present, and future,” Multimedia Tools and Applications, vol. 80, no. 5, Oct. 2020, doi: https://doi.org/10.1007/s11042-020-10139-6.
  3. J. Kennedy and R. Eberhart, “Particle swarm optimization,” Proceedings of ICNN’95 - International Conference on Neural Networks, vol. 4, pp. 1942–1948, 1995, doi: https://doi.org/10.1109/icnn.1995.488968.
  4. T. Alam, S. Qamar, A. Dixit, and M. Benaida, “Genetic Algorithm: Reviews, Implementations, and Applications,” Jun. 2020, doi: https://doi.org/10.48550/arxiv.2007.12673.
  5. A. G. Gad, “Particle Swarm Optimization Algorithm and Its Applications: A Systematic Review,” Archives of Computational Methods in Engineering, vol. 29, no. 5, pp. 2531–2561, Apr. 2022, doi: https://doi.org/10.1007/s11831-021-09694-4.
  6. M. Dorigo, M. Birattari, and T. Stutzle, “Ant colony optimization,” IEEE Computational Intelligence Magazine, vol. 1, no. 4, pp. 28–39, Nov. 2006, doi: https://doi.org/10.1109/MCI.2006.329691.
  7. S. Mirjalili, S. M. Mirjalili, and A. Lewis, “Grey Wolf Optimizer,” Advances in Engineering Software, vol. 69, pp. 46–61, Mar. 2014, doi: https://doi.org/10.1016/j.advengsoft.2013.12.007.
  8. J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “ImageNet: A large-scale hierarchical image database,” 2009 IEEE Conference on Computer Vision and Pattern Recognition, Jun. 2009, doi: https://doi.org/10.1109/cvpr.2009.5206848.
  9. L. Deng, “The MNIST Database of Handwritten Digit Images for Machine Learning Research [Best of the Web],” IEEE Signal Processing Magazine, vol. 29, no. 6, pp. 141–142, Nov. 2012, doi: https://doi.org/10.1109/msp.2012.2211477.
  10. A. Krizhevsky, “Learning Multiple Layers of Features from Tiny Images,” Technical Report, University of Toronto, 2009. Accessed: May 11, 2022. [Online]. Available: https://www.semanticscholar.org/paper/Learning-Multiple-Layers-of-Features-from-Tiny-Krizhevsky/5d90f06bb70a0a3dced62413346235c02b1aa086
  11. Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998, doi: https://doi.org/10.1109/5.726791.
  12. A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet Classification with Deep Convolutional Neural Networks,” Communications of the ACM, vol. 60, no. 6, pp. 84–90, May 2012, doi: https://doi.org/10.1145/3065386.
  13. K. Simonyan and A. Zisserman, “Very Deep Convolutional Networks for Large-Scale Image Recognition,” arXiv.org, Apr. 10, 2015. https://arxiv.org/abs/1409.1556
  14. K. He, X. Zhang, S. Ren, and J. Sun, “Deep Residual Learning for Image Recognition,” arXiv.org, Dec. 10, 2015. https://arxiv.org/abs/1512.03385
  15. R. Mohapatra, “rohanmohapatra/torchswarm,” GitHub, Apr. 20, 2024. https://github.com/rohanmohapatra/torchswarm (accessed Apr. 20, 2024).
  16. A. P. Sansom, “Torch PSO,” GitHub, Aug. 01, 2022. https://github.com/qthequartermasterman/torch_pso (accessed Apr. 20, 2024).
  17. H. Faris, “7ossam81/EvoloPy,” GitHub, Apr. 20, 2024. https://github.com/7ossam81/EvoloPy/tree/master (accessed Apr. 20, 2024).
