weiaicunzai / pytorch-cifar100

Practice on CIFAR-100 (ResNet, DenseNet, VGG, GoogLeNet, InceptionV3, InceptionV4, Inception-ResNetV2, Xception, ResNet in ResNet, ResNeXt, ShuffleNet, ShuffleNetV2, MobileNet, MobileNetV2, SqueezeNet, NasNet, Residual Attention Network, SENet, WideResNet)

Is the learning rate increasing?

wuzuowuyou opened this issue · comments

Training Epoch: 1 [3968/8000] Loss: 1.3380 LR: 0.004960
Training Epoch: 1 [3976/8000] Loss: 0.3088 LR: 0.004970
Training Epoch: 1 [3984/8000] Loss: 0.6474 LR: 0.004980
Training Epoch: 1 [3992/8000] Loss: 0.4500 LR: 0.004990
Training Epoch: 1 [4000/8000] Loss: 0.6452 LR: 0.005000
Training Epoch: 1 [4008/8000] Loss: 0.9984 LR: 0.005010
Training Epoch: 1 [4016/8000] Loss: 0.7139 LR: 0.005020
Training Epoch: 1 [4024/8000] Loss: 0.6220 LR: 0.005030
Training Epoch: 1 [4032/8000] Loss: 0.4329 LR: 0.005040
Training Epoch: 1 [4040/8000] Loss: 0.4127 LR: 0.005050
Training Epoch: 1 [4048/8000] Loss: 0.4696 LR: 0.005060
Training Epoch: 1 [4056/8000] Loss: 0.5181 LR: 0.005070
Training Epoch: 1 [4064/8000] Loss: 0.4105 LR: 0.005080
Training Epoch: 1 [4072/8000] Loss: 0.7041 LR: 0.005090
Training Epoch: 1 [4080/8000] Loss: 0.3864 LR: 0.005100
Training Epoch: 1 [4088/8000] Loss: 0.6991 LR: 0.005110
Training Epoch: 1 [4096/8000] Loss: 0.3007 LR: 0.005120
Training Epoch: 1 [4104/8000] Loss: 0.3111 LR: 0.005130
Training Epoch: 1 [4112/8000] Loss: 0.3763 LR: 0.005140
Training Epoch: 1 [4120/8000] Loss: 0.5825 LR: 0.005150
Training Epoch: 1 [4128/8000] Loss: 0.5528 LR: 0.005160
Training Epoch: 1 [4136/8000] Loss: 0.3553 LR: 0.005170
Training Epoch: 1 [4144/8000] Loss: 0.2654 LR: 0.005180
Training Epoch: 1 [4152/8000] Loss: 0.3935 LR: 0.005190
Training Epoch: 1 [4160/8000] Loss: 0.2935 LR: 0.005200
Training Epoch: 1 [4168/8000] Loss: 0.2382 LR: 0.005210
Training Epoch: 1 [4176/8000] Loss: 0.2893 LR: 0.005220

I think it is because of warm-up: the learning rate is ramped up linearly at the start of training.
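
Your log is consistent with a linear ramp: the LR rises by 0.00001 per batch of 8 images and reaches 0.005 at [4000/8000], i.e. halfway toward a base LR of 0.01, which fits a warm-up spanning one epoch. Here is a minimal sketch of such a linear warm-up scheduler (class name and demo numbers are illustrative, not necessarily this repo's exact code):

```python
import torch
from torch.optim.lr_scheduler import _LRScheduler

class LinearWarmUpLR(_LRScheduler):
    """Ramp the LR linearly from ~0 up to base_lr over total_iters steps."""
    def __init__(self, optimizer, total_iters, last_epoch=-1):
        self.total_iters = total_iters
        super().__init__(optimizer, last_epoch)

    def get_lr(self):
        # self.last_epoch counts scheduler.step() calls (one per batch here);
        # the 1e-8 guards against division by zero when total_iters == 0.
        return [base_lr * self.last_epoch / (self.total_iters + 1e-8)
                for base_lr in self.base_lrs]

# Tiny demo: base LR 0.01 warmed up over 1000 iterations,
# matching the log above (LR = 0.005 at the halfway point).
net = torch.nn.Linear(10, 10)
optimizer = torch.optim.SGD(net.parameters(), lr=0.01)
warmup = LinearWarmUpLR(optimizer, total_iters=1000)
for batch_idx in range(5):
    optimizer.step()   # forward/backward would go here
    warmup.step()
    print(optimizer.param_groups[0]['lr'])  # ~1e-5, 2e-5, 3e-5, ...
```

Once the warm-up iterations are used up, the usual decay schedule (e.g. torch.optim.lr_scheduler.MultiStepLR) would take over, so the LR stops increasing after the warm-up phase.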

According to the paper Deep Residual Learning for Image Recognition:

So we use 0.01 to warm up the training until the training error is below 80% (about 400 iterations), and then go back to 0.1 and continue training.
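
For reference, a minimal runnable sketch of that two-stage trick; `set_lr` and `training_error` are placeholder helpers for this illustration, not code from this repo or the paper:

```python
import random
import torch

def set_lr(optimizer, lr):
    """Overwrite the learning rate of every param group."""
    for group in optimizer.param_groups:
        group['lr'] = lr

net = torch.nn.Linear(10, 10)
optimizer = torch.optim.SGD(net.parameters(), lr=0.01)  # start small, per the paper

def training_error():
    # Placeholder: in a real loop this would be the running error on the train set.
    return random.uniform(0.5, 1.0)

warmed_up = False
for it in range(400):
    optimizer.step()  # forward/backward omitted for brevity
    if not warmed_up and training_error() < 0.80:
        set_lr(optimizer, 0.1)  # go back to the full learning rate
        warmed_up = True
```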

YOLO also uses this trick to train its network.