rasmusbergpalm / DeepLearnToolbox

Matlab/Octave toolbox for deep learning. Includes Deep Belief Nets, Stacked Autoencoders, Convolutional Neural Nets, Convolutional Autoencoders and vanilla Neural Nets. Each method has examples to get you started.

How do CNN parameters depend on input image size?

mrgloom opened this issue · comments

I'm trying to modify the example test_example_CNN.m to work with my images.
I have a pedestrian detection dataset with two classes: positive (pedestrians) and negative (background). The images are 128x64. When I run the code without changes the error increases(!), but when I resize the images to 28x28 it works.

So my question is: how do the CNN parameters depend on image size?

commented

Same here. Is there any documentation for configuring the CNN?

Try a smaller learning rate. Usually you try learning rates in powers of 10, i.e. 0.1, 0.01, 0.001 and so on, and pick the first one that makes your loss decrease. Choosing good hyperparameters for deep networks is still an art; you can find a few rules of thumb in these articles:
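For DeepLearnToolbox specifically, a minimal sketch of such a learning-rate sweep could look like the code below. It assumes train_x/train_y are prepared as in test_example_CNN.m and that a variable named layers holds the cnn.layers cell array from that example; cnn.rL is the running loss that the example plots.

% Sketch only: try learning rates in decreasing powers of 10 and keep the
% largest one whose loss actually goes down.
rates = [0.1 0.01 0.001];
for i = 1 : numel(rates)
    cnn = struct();
    cnn.layers = layers;                  % fresh, untrained network each time
    cnn = cnnsetup(cnn, train_x, train_y);

    opts.alpha     = rates(i);            % learning rate under test
    opts.batchsize = 50;
    opts.numepochs = 1;

    cnn = cnntrain(cnn, train_x, train_y, opts);

    % cnn.rL holds the smoothed mini-batch MSE that test_example_CNN.m plots
    fprintf('alpha = %g, final loss = %g\n', rates(i), cnn.rL(end));
end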

commented

Thanks for the information. However, I was interested in how to set up the structure of the CNN here: https://github.com/rasmusbergpalm/DeepLearnToolbox/blob/master/tests/test_example_CNN.m#L15-L21

I would start with some well-known architecture. The CIFAR-10 examples are a good start if your images are not too big. Otherwise AlexNet, but AlexNet is way too big for DeepLearnToolbox to handle.

For example, the CIFAR-10 network in the Caffe examples has worked well for me:
https://github.com/BVLC/caffe/blob/master/examples/cifar10/cifar10_quick_train_test.prototxt
Hopefully you can figure out the layer parameters from all this prototxt cruft.
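If it helps, in DeepLearnToolbox the structure at test_example_CNN.m#L15-L21 is just a cell array of layer structs. A rough adaptation to 128x64 single-channel inputs might look like the sketch below; the outputmaps and kernelsize values are only placeholders copied from the 28x28 example, not a recommendation, and (as far as I recall) cnnsetup asserts that every subsampled map size comes out as an integer.

% Sketch only, not a tested configuration: test_example_CNN.m-style
% layer definition for 128x64 single-channel inputs.
cnn.layers = {
    struct('type', 'i')                                      % input: 128x64
    struct('type', 'c', 'outputmaps', 6,  'kernelsize', 5)   % -> 124x60, 6 maps
    struct('type', 's', 'scale', 2)                          % -> 62x30
    struct('type', 'c', 'outputmaps', 12, 'kernelsize', 5)   % -> 58x26, 12 maps
    struct('type', 's', 'scale', 2)                          % -> 29x13
};

Each 'c' layer shrinks the maps by kernelsize - 1 and each 's' layer divides them by scale, which is exactly where the dependence on input image size comes from.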

I found this formula in Andrej Karpathy's CNN course and it worked for me (it's really simple once you think about it for a while).

It assumes square images, a square kernel_size, and equal vertical and horizontal strides!

in_channels = 3   # for the first layer, because a color image has 3 channels (3 matrices -> red, green, blue); use 1 for grayscale
out_width   = (image_width - kernel_size + 2*padding) / stride + 1   # spatial size of the layer's output, must be a positive integer

# if you don't know what these variables mean, google them -> these are the basics of CNNs

Note that the formula gives the spatial size of a layer's output, not its number of channels. in_channels and out_channels are separate parameters of each convolution layer (out_channels is simply the number of filters you choose), and each following layer's in_channels equals the out_channels of the previous one.
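Applied to the 128x64 images from the original question, a quick Octave/MATLAB walk-through of the formula (my own calculation, assuming a test_example_CNN.m-style stack of 5x5 convolutions with no padding and stride 1, interleaved with 2x2 subsampling) gives:

% Sketch only: every intermediate size must come out as an integer.
sz = [128 64]                % input height x width
sz = (sz - 5 + 2*0) / 1 + 1  % conv, kernel 5   -> 124 60
sz = sz / 2                  % subsample by 2   ->  62 30
sz = (sz - 5 + 2*0) / 1 + 1  % conv, kernel 5   ->  58 26
sz = sz / 2                  % subsample by 2   ->  29 13

So 128x64 images do produce valid integer sizes for this particular stack, which suggests the divergence seen in the original question is more a learning-rate issue than a size issue, as noted above.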