Paper key points
the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting
using an architecture with very small (3 × 3) convolution filters, with stride 1
max-pooling is performed over a 2 × 2 pixel window, with stride 2
conv layers followed by 3 fully-connected layers (number of FC neurons: 4096 → 4096 → n_classes)
learning rate decay, parameter initialization from pre-trained models, etc.

Training:
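Before loading any weights, the layer shapes implied by the key points above can be checked by hand. The sketch below assumes the VGG16 layout from the paper (five blocks of 3 × 3 convolutions, each followed by 2 × 2 max-pooling) and 1-pixel padding on every convolution, which the list above does not state. The function names and the two input sizes are illustrative only and are not taken from this repository.

```python
# Shape arithmetic for a VGG16-style stack (illustrative sketch).
# Assumption: every 3x3 convolution uses stride 1 and 1-pixel padding,
# so it preserves the spatial size; only the pooling layers shrink it.

# (number of conv layers, output channels) per block, VGG16 layout.
VGG16_BLOCKS = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]


def conv_out(size, kernel=3, stride=1, pad=1):
    """Spatial size after one convolution."""
    return (size + 2 * pad - kernel) // stride + 1


def pool_out(size, window=2, stride=2):
    """Spatial size after one max-pooling layer (no padding)."""
    return (size - window) // stride + 1


def feature_map_shape(input_size):
    """Return (height, width, channels) after the last pooling layer."""
    size, channels = input_size, 3
    for n_convs, out_channels in VGG16_BLOCKS:
        for _ in range(n_convs):
            size = conv_out(size)
        channels = out_channels
        size = pool_out(size)
    return size, size, channels


def fc_input_units(input_size):
    """Number of inputs seen by the first fully-connected layer."""
    h, w, c = feature_map_shape(input_size)
    return h * w * c


print(feature_map_shape(224))  # (7, 7, 512)
print(fc_input_units(224))     # 25088
print(feature_map_shape(32))   # (1, 1, 512)
```

Each of the five pooling layers halves the spatial size, so a 224 × 224 input ends at 7 × 7 × 512, while a 32 × 32 input ends at 1 × 1 × 512.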
Load the pre-trained parameters (trained on the ImageNet dataset, 1000 classes). You can download the parameter file (vgg16.npy, about 500 MB) here:!YU1FWJrA!O1ywiCS2IiOlUCtCpI6HTJOMrneN-Qdv3ywQP5poecM
For users in China, I also put the pre-trained parameter file (about 500 MB) on Baidu:
Remove the final layer and add a new layer with 10 nodes to test on the CIFAR10 dataset (binary version).
Training took me around one hour for 15000 steps at a learning rate of 0.01. The accuracy on the CIFAR10 test set is about 85.69%.
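The fine-tuning setup described above (load the pre-trained parameters, then swap the final layer for a 10-node one) could be sketched roughly as follows. This is not the repository's code. It assumes that vgg16.npy holds a pickled dict mapping each layer name to a `[weights, biases]` pair, and that the final layer is called `fc8`. Both are assumptions about the file, and `load_param_file` and `init_with_skip` are hypothetical helper names. A tiny stand-in dict replaces the real 500 MB file so the example runs on its own.

```python
import numpy as np


def load_param_file(path):
    """Read a parameter file such as vgg16.npy.

    Assumption: the file stores a pickled dict
    {layer_name: [weights, biases]}, hence allow_pickle=True and .item().
    """
    return np.load(path, encoding="latin1", allow_pickle=True).item()


def init_with_skip(pretrained, skip_layers, n_classes, rng):
    """Copy pre-trained parameters, re-initializing the skipped layers.

    Skipped layers are assumed to be fully-connected, with weights of
    shape (fan_in, fan_out); they get a fresh (fan_in, n_classes) matrix.
    """
    params = {}
    for name, (weights, biases) in pretrained.items():
        if name in skip_layers:
            fan_in = weights.shape[0]
            new_w = rng.normal(0.0, 0.01, size=(fan_in, n_classes))
            params[name] = [new_w.astype(np.float32),
                            np.zeros(n_classes, dtype=np.float32)]
        else:
            params[name] = [np.asarray(weights), np.asarray(biases)]
    return params


# Stand-in for the real file: two small fully-connected layers.
rng = np.random.default_rng(0)
toy_pretrained = {
    "fc7": [np.ones((8, 8), dtype=np.float32), np.zeros(8, dtype=np.float32)],
    "fc8": [np.ones((8, 1000), dtype=np.float32),
            np.zeros(1000, dtype=np.float32)],
}

params = init_with_skip(toy_pretrained, skip_layers={"fc8"},
                        n_classes=10, rng=rng)
print(params["fc7"][0].shape)  # (8, 8), copied unchanged
print(params["fc8"][0].shape)  # (8, 10), new 10-node layer
```

With the real file, `toy_pretrained` would be replaced by `load_param_file("vgg16.npy")`, provided the file layout matches the assumption above.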