huggingface / pytorch-image-models

The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (ViT), MobileNet-V3/V2, RegNet, DPN, CSPNet, Swin Transformer, MaxViT, CoAtNet, ConvNeXt, and more

Home Page: https://huggingface.co/docs/timm

[BUG] significant performance discrepancy

yuxiangwei0808 opened this issue · comments

I am using the Hugging Face model zoo (timm) to reproduce a standard image classification task on CIFAR-10. However, I found that the Hugging Face implementations, although much faster, perform much worse than some common implementations.
I use timm.create_model("MODEL_NAME") to get the model without any pretraining and then train it from scratch. Below are my training script and the reference implementation of ResNet18 that I compare against. A similar phenomenon can be observed with other models.

To Reproduce
Training script:

import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
import torch.backends.cudnn as cudnn

import torchvision
import torchvision.transforms as transforms

import os
import argparse
import timm

from models import *
from utils import progress_bar


parser = argparse.ArgumentParser(description='PyTorch CIFAR10 Training')
parser.add_argument('--lr', default=0.1, type=float, help='learning rate')
parser.add_argument('--resume', '-r', action='store_true',
                    help='resume from checkpoint')
args = parser.parse_args()

device = 'cuda' if torch.cuda.is_available() else 'cpu'
best_acc = 0  # best test accuracy
start_epoch = 0  # start from epoch 0 or last checkpoint epoch

# Data
print('==> Preparing data..')
transform_train = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
])

transform_test = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
])

trainset = torchvision.datasets.CIFAR10(
    root='./data', train=True, download=True, transform=transform_train)
trainloader = torch.utils.data.DataLoader(
    trainset, batch_size=128, shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(
    root='./data', train=False, download=True, transform=transform_test)
testloader = torch.utils.data.DataLoader(
    testset, batch_size=100, shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat', 'deer',
           'dog', 'frog', 'horse', 'ship', 'truck')

# Model
print('==> Building model..')
# net = ResNet18()
net = timm.create_model('resnet18', in_chans=3, num_classes=10)
net = net.to(device)
if device == 'cuda':
    cudnn.benchmark = True

criterion = nn.CrossEntropyLoss()
optimizer = optim.AdamW(net.parameters(), lr=0.001, weight_decay=0.05)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=5, eta_min=1e-5)


# Training
def train(epoch):
    print('\nEpoch: %d' % epoch)
    net.train()
    train_loss = 0
    correct = 0
    total = 0
    for batch_idx, (inputs, targets) in enumerate(trainloader):
        inputs, targets = inputs.to(device), targets.to(device)
        optimizer.zero_grad()
        outputs = net(inputs)
        loss = criterion(outputs, targets)
        loss.backward()
        optimizer.step()

        train_loss += loss.item()
        _, predicted = outputs.max(1)
        total += targets.size(0)
        correct += predicted.eq(targets).sum().item()

        progress_bar(batch_idx, len(trainloader), 'Loss: %.3f | Acc: %.3f%% (%d/%d)'
                     % (train_loss/(batch_idx+1), 100.*correct/total, correct, total))


def test(epoch):
    global best_acc
    net.eval()
    test_loss = 0
    correct = 0
    total = 0
    with torch.no_grad():
        for batch_idx, (inputs, targets) in enumerate(testloader):
            inputs, targets = inputs.to(device), targets.to(device)
            outputs = net(inputs)
            loss = criterion(outputs, targets)

            test_loss += loss.item()
            _, predicted = outputs.max(1)
            total += targets.size(0)
            correct += predicted.eq(targets).sum().item()

            progress_bar(batch_idx, len(testloader), 'Loss: %.3f | Acc: %.3f%% (%d/%d)'
                         % (test_loss/(batch_idx+1), 100.*correct/total, correct, total))

    # Track the best test accuracy (no checkpoint is actually saved).
    acc = 100.*correct/total
    if acc > best_acc:
        print('best acc', acc)
        best_acc = acc


for epoch in range(start_epoch, start_epoch+200):
    train(epoch)
    test(epoch)
    scheduler.step()
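
A side note on the schedule used in the script above: `T_max=5` is much shorter than the 200-epoch loop, so the learning rate does not decay once but keeps cycling. The sketch below evaluates the closed-form cosine annealing expression from the PyTorch documentation with the script's values (`lr=0.001`, `eta_min=1e-5`, `T_max=5`). It assumes the scheduler follows that closed form when stepped once per epoch; `cosine_lr` is a hypothetical helper for illustration, not part of PyTorch or timm.

```python
import math


def cosine_lr(epoch, base_lr=0.001, eta_min=1e-5, t_max=5):
    """Closed-form cosine annealing value for a given epoch (assumed form)."""
    cosine = math.cos(math.pi * epoch / t_max)
    return eta_min + (base_lr - eta_min) * (1 + cosine) / 2


# Print the learning rate for the first 21 epochs to expose the cycle.
for epoch in range(21):
    print(f"epoch {epoch:2d}: lr = {cosine_lr(epoch):.6f}")
```

Under that assumption the rate falls to `eta_min` at epoch 5, climbs back to `0.001` at epoch 10, and repeats every 10 epochs. This may be related to the roughly 10-epoch rise and fall of the accuracy in the logs further down.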

ResNet18 implementation:

'''ResNet in PyTorch.

For Pre-activation ResNet, see 'preact_resnet.py'.

Reference:
[1] Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
    Deep Residual Learning for Image Recognition. arXiv:1512.03385
'''
import torch
import torch.nn as nn
import torch.nn.functional as F


class BasicBlock(nn.Module):
    expansion = 1

    def __init__(self, in_planes, planes, stride=1):
        super(BasicBlock, self).__init__()
        self.conv1 = nn.Conv2d(
            in_planes, planes, kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3,
                               stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)

        self.shortcut = nn.Sequential()
        if stride != 1 or in_planes != self.expansion*planes:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_planes, self.expansion*planes,
                          kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(self.expansion*planes)
            )

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out += self.shortcut(x)
        out = F.relu(out)
        return out


class Bottleneck(nn.Module):
    expansion = 4

    def __init__(self, in_planes, planes, stride=1):
        super(Bottleneck, self).__init__()
        self.conv1 = nn.Conv2d(in_planes, planes, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3,
                               stride=stride, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)
        self.conv3 = nn.Conv2d(planes, self.expansion *
                               planes, kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(self.expansion*planes)

        self.shortcut = nn.Sequential()
        if stride != 1 or in_planes != self.expansion*planes:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_planes, self.expansion*planes,
                          kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(self.expansion*planes)
            )

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = F.relu(self.bn2(self.conv2(out)))
        out = self.bn3(self.conv3(out))
        out += self.shortcut(x)
        out = F.relu(out)
        return out


class ResNet(nn.Module):
    def __init__(self, block, num_blocks, num_classes=10):
        super(ResNet, self).__init__()
        self.in_planes = 64

        self.conv1 = nn.Conv2d(3, 64, kernel_size=3,
                               stride=1, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.layer1 = self._make_layer(block, 64, num_blocks[0], stride=1)
        self.layer2 = self._make_layer(block, 128, num_blocks[1], stride=2)
        self.layer3 = self._make_layer(block, 256, num_blocks[2], stride=2)
        self.layer4 = self._make_layer(block, 512, num_blocks[3], stride=2)
        self.linear = nn.Linear(512*block.expansion, num_classes)

    def _make_layer(self, block, planes, num_blocks, stride):
        strides = [stride] + [1]*(num_blocks-1)
        layers = []
        for stride in strides:
            layers.append(block(self.in_planes, planes, stride))
            self.in_planes = planes * block.expansion
        return nn.Sequential(*layers)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.layer1(out)
        out = self.layer2(out)
        out = self.layer3(out)
        out = self.layer4(out)
        out = F.avg_pool2d(out, 4)
        out = out.view(out.size(0), -1)
        out = self.linear(out)
        return out


def ResNet18():
    return ResNet(BasicBlock, [2, 2, 2, 2])


def ResNet34():
    return ResNet(BasicBlock, [3, 4, 6, 3])


def ResNet50():
    return ResNet(Bottleneck, [3, 4, 6, 3])


def ResNet101():
    return ResNet(Bottleneck, [3, 4, 23, 3])


def ResNet152():
    return ResNet(Bottleneck, [3, 8, 36, 3])
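
One architectural difference that may be worth checking: the `ResNet` class above uses a 3x3, stride-1 stem with no max-pool, which is a common CIFAR adaptation. The sketch below is plain arithmetic on feature-map sizes. It assumes that timm's `resnet18` keeps the standard ImageNet stem (a 7x7 convolution with stride 2 followed by a 3x3 max-pool with stride 2); that assumption should be confirmed with `print(net)`. `conv_out` and `stage_sizes` are hypothetical helpers, not timm or PyTorch functions.

```python
def conv_out(size, kernel, stride, padding):
    """Spatial output size of a conv or pooling layer (floor convention)."""
    return (size + 2 * padding - kernel) // stride + 1


def stage_sizes(stem, input_size=32, stage_strides=(1, 2, 2, 2)):
    """Feature-map side length after the stem and after each residual stage."""
    size = input_size
    for kernel, stride, padding in stem:
        size = conv_out(size, kernel, stride, padding)
    sizes = [size]
    for stride in stage_strides:
        # Only the first 3x3 conv of each stage changes the resolution.
        size = conv_out(size, 3, stride, 1)
        sizes.append(size)
    return sizes


# Stem of the ResNet class in this issue: 3x3 conv, stride 1, padding 1.
cifar_stem = [(3, 1, 1)]
# Assumed ImageNet-style stem: 7x7 conv (stride 2, padding 3) + 3x3 max-pool (stride 2, padding 1).
imagenet_stem = [(7, 2, 3), (3, 2, 1)]

print(stage_sizes(cifar_stem))     # [32, 32, 16, 8, 4]
print(stage_sizes(imagenet_stem))  # [8, 8, 4, 2, 1]
```

If the assumption holds, a 32x32 CIFAR image is already reduced to 8x8 before the first residual stage and to 1x1 by the last one, compared with 32x32 and 4x4 in the CIFAR-style network. Working on much smaller feature maps could account for both the higher speed and some of the lower accuracy.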

Desktop:

  • OS: Ubuntu 18.04
  • timm.version='0.9.8'
  • PyTorch 2.1.0 w/ CUDA/cuDNN

Below are the training logs after setting all seeds to 0.

Using huggingface's implementation:

Epoch: 0
 [=========================== 391/391 ============================>]  Step: 196ms | Tot: 3s656ms | Loss: 1.609 | Acc: 40.126% (20063/50000)         
 [=========================== 100/100 ============================>]  Step: 2ms | Tot: 416ms | Loss: 1.301 | Acc: 52.950% (5295/10000)              
best acc 52.95

Epoch: 1
 [=========================== 391/391 ============================>]  Step: 6ms | Tot: 3s647ms | Loss: 1.209 | Acc: 56.290% (28145/50000)           
 [=========================== 100/100 ============================>]  Step: 2ms | Tot: 421ms | Loss: 1.060 | Acc: 62.120% (6212/10000)              
best acc 62.12

Epoch: 2
 [=========================== 391/391 ============================>]  Step: 5ms | Tot: 3s547ms | Loss: 1.019 | Acc: 63.532% (31766/50000)           
 [=========================== 100/100 ============================>]  Step: 6ms | Tot: 425ms | Loss: 1.003 | Acc: 64.660% (6466/10000)              
best acc 64.66

Epoch: 3
 [=========================== 391/391 ============================>]  Step: 5ms | Tot: 3s346ms | Loss: 0.896 | Acc: 68.072% (34036/50000)           
 [=========================== 100/100 ============================>]  Step: 3ms | Tot: 421ms | Loss: 0.840 | Acc: 70.210% (7021/10000)              
best acc 70.21

Epoch: 4
 [=========================== 391/391 ============================>]  Step: 5ms | Tot: 3s361ms | Loss: 0.804 | Acc: 71.540% (35770/50000)           
 [=========================== 100/100 ============================>]  Step: 5ms | Tot: 434ms | Loss: 0.776 | Acc: 72.380% (7238/10000)              
best acc 72.38

Epoch: 5
 [=========================== 391/391 ============================>]  Step: 5ms | Tot: 3s360ms | Loss: 0.778 | Acc: 72.482% (36241/50000)           
 [=========================== 100/100 ============================>]  Step: 1ms | Tot: 517ms | Loss: 0.763 | Acc: 72.780% (7278/10000)              
best acc 72.78

Epoch: 6
 [=========================== 391/391 ============================>]  Step: 6ms | Tot: 4s90ms | Loss: 0.778 | Acc: 72.338% (36169/50000)            
 [=========================== 100/100 ============================>]  Step: 6ms | Tot: 507ms | Loss: 0.760 | Acc: 72.750% (7275/10000)              

Epoch: 7
 [=========================== 391/391 ============================>]  Step: 6ms | Tot: 3s361ms | Loss: 0.805 | Acc: 71.250% (35625/50000)           
 [=========================== 100/100 ============================>]  Step: 1ms | Tot: 571ms | Loss: 0.770 | Acc: 72.880% (7288/10000)              
best acc 72.88

Epoch: 8
 [=========================== 391/391 ============================>]  Step: 6ms | Tot: 3s988ms | Loss: 0.826 | Acc: 70.750% (35375/50000)           
 [=========================== 100/100 ============================>]  Step: 3ms | Tot: 427ms | Loss: 0.778 | Acc: 72.310% (7231/10000)              

Epoch: 9
 [=========================== 391/391 ============================>]  Step: 5ms | Tot: 3s382ms | Loss: 0.818 | Acc: 70.888% (35444/50000)           
 [=========================== 100/100 ============================>]  Step: 4ms | Tot: 421ms | Loss: 0.825 | Acc: 70.500% (7050/10000)              

Epoch: 10
 [=========================== 391/391 ============================>]  Step: 6ms | Tot: 3s373ms | Loss: 0.794 | Acc: 71.966% (35983/50000)           
 [=========================== 100/100 ============================>]  Step: 4ms | Tot: 423ms | Loss: 0.808 | Acc: 72.420% (7242/10000)              

Epoch: 11
 [=========================== 391/391 ============================>]  Step: 6ms | Tot: 3s383ms | Loss: 0.733 | Acc: 74.008% (37004/50000)           
 [=========================== 100/100 ============================>]  Step: 5ms | Tot: 425ms | Loss: 0.780 | Acc: 73.470% (7347/10000)              
best acc 73.47

Epoch: 12
 [=========================== 391/391 ============================>]  Step: 6ms | Tot: 3s996ms | Loss: 0.669 | Acc: 76.394% (38197/50000)           
 [=========================== 100/100 ============================>]  Step: 2ms | Tot: 421ms | Loss: 0.698 | Acc: 76.030% (7603/10000)              
best acc 76.03

Epoch: 13
 [=========================== 391/391 ============================>]  Step: 6ms | Tot: 3s561ms | Loss: 0.588 | Acc: 79.280% (39640/50000)           
 [=========================== 100/100 ============================>]  Step: 5ms | Tot: 424ms | Loss: 0.637 | Acc: 77.910% (7791/10000)              
best acc 77.91

Epoch: 14
 [=========================== 391/391 ============================>]  Step: 6ms | Tot: 3s499ms | Loss: 0.531 | Acc: 81.202% (40601/50000)           
 [=========================== 100/100 ============================>]  Step: 6ms | Tot: 432ms | Loss: 0.581 | Acc: 80.180% (8018/10000)              
best acc 80.18

Epoch: 15
 [=========================== 391/391 ============================>]  Step: 6ms | Tot: 3s990ms | Loss: 0.504 | Acc: 82.344% (41172/50000)           
 [=========================== 100/100 ============================>]  Step: 6ms | Tot: 431ms | Loss: 0.571 | Acc: 80.600% (8060/10000)              
best acc 80.6

Epoch: 16
 [=========================== 391/391 ============================>]  Step: 5ms | Tot: 3s641ms | Loss: 0.513 | Acc: 81.876% (40938/50000)           
 [=========================== 100/100 ============================>]  Step: 6ms | Tot: 433ms | Loss: 0.570 | Acc: 80.660% (8066/10000)              
best acc 80.66

Epoch: 17
 [=========================== 391/391 ============================>]  Step: 6ms | Tot: 3s354ms | Loss: 0.543 | Acc: 80.994% (40497/50000)           
 [=========================== 100/100 ============================>]  Step: 5ms | Tot: 423ms | Loss: 0.605 | Acc: 78.940% (7894/10000)              

Epoch: 18
 [=========================== 391/391 ============================>]  Step: 5ms | Tot: 3s356ms | Loss: 0.586 | Acc: 79.434% (39717/50000)           
 [=========================== 100/100 ============================>]  Step: 2ms | Tot: 429ms | Loss: 0.664 | Acc: 77.480% (7748/10000)              

Epoch: 19
 [=========================== 391/391 ============================>]  Step: 5ms | Tot: 3s371ms | Loss: 0.615 | Acc: 78.328% (39164/50000)           
 [=========================== 100/100 ============================>]  Step: 1ms | Tot: 421ms | Loss: 0.633 | Acc: 77.880% (7788/10000)              

Epoch: 20
 [=========================== 391/391 ============================>]  Step: 6ms | Tot: 3s364ms | Loss: 0.610 | Acc: 78.492% (39246/50000)           
 [=========================== 100/100 ============================>]  Step: 1ms | Tot: 424ms | Loss: 0.648 | Acc: 77.700% (7770/10000)              

Epoch: 21
 [=========================== 391/391 ============================>]  Step: 6ms | Tot: 3s364ms | Loss: 0.569 | Acc: 79.850% (39925/50000)           
 [=========================== 100/100 ============================>]  Step: 2ms | Tot: 421ms | Loss: 0.624 | Acc: 79.040% (7904/10000)              

Epoch: 22
 [=========================== 391/391 ============================>]  Step: 5ms | Tot: 3s387ms | Loss: 0.517 | Acc: 81.786% (40893/50000)           
 [=========================== 100/100 ============================>]  Step: 3ms | Tot: 432ms | Loss: 0.579 | Acc: 80.410% (8041/10000)              

Epoch: 23
 [=========================== 391/391 ============================>]  Step: 5ms | Tot: 3s452ms | Loss: 0.444 | Acc: 84.396% (42198/50000)           
 [=========================== 100/100 ============================>]  Step: 1ms | Tot: 429ms | Loss: 0.553 | Acc: 81.310% (8131/10000)              
best acc 81.31

Epoch: 24
 [=========================== 391/391 ============================>]  Step: 6ms | Tot: 3s423ms | Loss: 0.393 | Acc: 86.196% (43098/50000)           
 [=========================== 100/100 ============================>]  Step: 5ms | Tot: 422ms | Loss: 0.525 | Acc: 82.470% (8247/10000)              
best acc 82.47

Epoch: 25
 [=========================== 391/391 ============================>]  Step: 6ms | Tot: 3s608ms | Loss: 0.368 | Acc: 86.992% (43496/50000)           
 [=========================== 100/100 ============================>]  Step: 6ms | Tot: 532ms | Loss: 0.518 | Acc: 82.780% (8278/10000)              
best acc 82.78

Epoch: 26
 [=========================== 391/391 ============================>]  Step: 5ms | Tot: 3s501ms | Loss: 0.378 | Acc: 86.708% (43354/50000)           
 [=========================== 100/100 ============================>]  Step: 5ms | Tot: 425ms | Loss: 0.523 | Acc: 82.670% (8267/10000)              

Epoch: 27
 [=========================== 391/391 ============================>]  Step: 6ms | Tot: 3s374ms | Loss: 0.406 | Acc: 85.704% (42852/50000)           
 [=========================== 100/100 ============================>]  Step: 2ms | Tot: 422ms | Loss: 0.564 | Acc: 80.960% (8096/10000)              

Epoch: 28
 [=========================== 391/391 ============================>]  Step: 5ms | Tot: 3s356ms | Loss: 0.457 | Acc: 83.838% (41919/50000)           
 [=========================== 100/100 ============================>]  Step: 1ms | Tot: 590ms | Loss: 0.608 | Acc: 79.880% (7988/10000)              

Epoch: 29
 [=========================== 391/391 ============================>]  Step: 6ms | Tot: 3s375ms | Loss: 0.498 | Acc: 82.496% (41248/50000)           
 [=========================== 100/100 ============================>]  Step: 5ms | Tot: 428ms | Loss: 0.644 | Acc: 78.100% (7810/10000)              

Epoch: 30
 [=========================== 391/391 ============================>]  Step: 6ms | Tot: 3s985ms | Loss: 0.505 | Acc: 82.094% (41047/50000)           
 [=========================== 100/100 ============================>]  Step: 6ms | Tot: 529ms | Loss: 0.648 | Acc: 78.800% (7880/10000)              

Epoch: 31
 [=========================== 391/391 ============================>]  Step: 6ms | Tot: 3s386ms | Loss: 0.477 | Acc: 83.124% (41562/50000)           
 [=========================== 100/100 ============================>]  Step: 6ms | Tot: 555ms | Loss: 0.596 | Acc: 80.540% (8054/10000)              

Epoch: 32
 [=========================== 391/391 ============================>]  Step: 6ms | Tot: 3s351ms | Loss: 0.418 | Acc: 85.270% (42635/50000)           
 [=========================== 100/100 ============================>]  Step: 1ms | Tot: 423ms | Loss: 0.561 | Acc: 81.630% (8163/10000)              

Epoch: 33
 [=========================== 391/391 ============================>]  Step: 6ms | Tot: 3s365ms | Loss: 0.355 | Acc: 87.528% (43764/50000)           
 [=========================== 100/100 ============================>]  Step: 2ms | Tot: 432ms | Loss: 0.540 | Acc: 82.490% (8249/10000)              

Epoch: 34
 [=========================== 391/391 ============================>]  Step: 5ms | Tot: 3s454ms | Loss: 0.310 | Acc: 89.046% (44523/50000)           
 [=========================== 100/100 ============================>]  Step: 4ms | Tot: 422ms | Loss: 0.519 | Acc: 83.260% (8326/10000)              
best acc 83.26

Epoch: 35
 [=========================== 391/391 ============================>]  Step: 5ms | Tot: 3s358ms | Loss: 0.284 | Acc: 90.014% (45007/50000)           
 [=========================== 100/100 ============================>]  Step: 2ms | Tot: 423ms | Loss: 0.518 | Acc: 83.250% (8325/10000)              

Epoch: 36
 [=========================== 391/391 ============================>]  Step: 6ms | Tot: 3s364ms | Loss: 0.291 | Acc: 89.930% (44965/50000)           
 [=========================== 100/100 ============================>]  Step: 2ms | Tot: 422ms | Loss: 0.521 | Acc: 83.380% (8338/10000)              
best acc 83.38

Epoch: 37
 [=========================== 391/391 ============================>]  Step: 6ms | Tot: 3s371ms | Loss: 0.321 | Acc: 88.696% (44348/50000)           
 [=========================== 100/100 ============================>]  Step: 1ms | Tot: 503ms | Loss: 0.575 | Acc: 81.820% (8182/10000)              

Epoch: 38
 [=========================== 391/391 ============================>]  Step: 5ms | Tot: 3s397ms | Loss: 0.375 | Acc: 86.810% (43405/50000)           
 [=========================== 100/100 ============================>]  Step: 4ms | Tot: 540ms | Loss: 0.588 | Acc: 81.140% (8114/10000)              

Epoch: 39
 [=========================== 391/391 ============================>]  Step: 5ms | Tot: 3s629ms | Loss: 0.422 | Acc: 84.936% (42468/50000)           
 [=========================== 100/100 ============================>]  Step: 5ms | Tot: 426ms | Loss: 0.599 | Acc: 80.490% (8049/10000)              

Epoch: 40
 [=========================== 391/391 ============================>]  Step: 5ms | Tot: 3s417ms | Loss: 0.424 | Acc: 84.918% (42459/50000)           
 [=========================== 100/100 ============================>]  Step: 3ms | Tot: 422ms | Loss: 0.599 | Acc: 80.840% (8084/10000)              

Epoch: 41
 [=========================== 391/391 ============================>]  Step: 6ms | Tot: 3s398ms | Loss: 0.402 | Acc: 85.926% (42963/50000)           
 [=========================== 100/100 ============================>]  Step: 1ms | Tot: 422ms | Loss: 0.612 | Acc: 80.580% (8058/10000)              

Epoch: 42
 [=========================== 391/391 ============================>]  Step: 5ms | Tot: 3s387ms | Loss: 0.352 | Acc: 87.420% (43710/50000)           
 [=========================== 100/100 ============================>]  Step: 2ms | Tot: 421ms | Loss: 0.562 | Acc: 82.440% (8244/10000)              

Epoch: 43
 [=========================== 391/391 ============================>]  Step: 5ms | Tot: 3s468ms | Loss: 0.291 | Acc: 89.628% (44814/50000)           
 [=========================== 100/100 ============================>]  Step: 5ms | Tot: 425ms | Loss: 0.544 | Acc: 83.270% (8327/10000)              

Epoch: 44
 [=========================== 391/391 ============================>]  Step: 6ms | Tot: 3s359ms | Loss: 0.241 | Acc: 91.486% (45743/50000)           
 [=========================== 100/100 ============================>]  Step: 6ms | Tot: 432ms | Loss: 0.513 | Acc: 84.270% (8427/10000)              
best acc 84.27

Epoch: 45
 [=========================== 391/391 ============================>]  Step: 5ms | Tot: 3s357ms | Loss: 0.226 | Acc: 92.136% (46068/50000)           
 [=========================== 100/100 ============================>]  Step: 5ms | Tot: 425ms | Loss: 0.512 | Acc: 84.300% (8430/10000)              
best acc 84.3

Epoch: 46
 [=========================== 391/391 ============================>]  Step: 6ms | Tot: 3s363ms | Loss: 0.221 | Acc: 92.350% (46175/50000)           
 [=========================== 100/100 ============================>]  Step: 2ms | Tot: 419ms | Loss: 0.531 | Acc: 84.130% (8413/10000)              

Epoch: 47
 [=========================== 391/391 ============================>]  Step: 6ms | Tot: 3s378ms | Loss: 0.256 | Acc: 91.080% (45540/50000)           
 [=========================== 100/100 ============================>]  Step: 2ms | Tot: 422ms | Loss: 0.556 | Acc: 83.480% (8348/10000)              

Epoch: 48
 [=========================== 391/391 ============================>]  Step: 6ms | Tot: 3s348ms | Loss: 0.312 | Acc: 88.984% (44492/50000)           
 [=========================== 100/100 ============================>]  Step: 6ms | Tot: 429ms | Loss: 0.607 | Acc: 81.360% (8136/10000)              

Epoch: 49
 [=========================== 391/391 ============================>]  Step: 5ms | Tot: 3s366ms | Loss: 0.361 | Acc: 87.304% (43652/50000)           
 [=========================== 100/100 ============================>]  Step: 5ms | Tot: 426ms | Loss: 0.617 | Acc: 80.660% (8066/10000)              

Epoch: 50
 [=========================== 391/391 ============================>]  Step: 5ms | Tot: 3s355ms | Loss: 0.376 | Acc: 86.652% (43326/50000)           
 [=========================== 100/100 ============================>]  Step: 6ms | Tot: 432ms | Loss: 0.589 | Acc: 81.230% (8123/10000)              

Epoch: 51
 [=========================== 391/391 ============================>]  Step: 6ms | Tot: 3s984ms | Loss: 0.349 | Acc: 87.562% (43781/50000)           
 [=========================== 100/100 ============================>]  Step: 6ms | Tot: 430ms | Loss: 0.580 | Acc: 81.700% (8170/10000)              

Epoch: 52
 [=========================== 391/391 ============================>]  Step: 6ms | Tot: 3s598ms | Loss: 0.301 | Acc: 89.334% (44667/50000)           
 [=========================== 100/100 ============================>]  Step: 2ms | Tot: 420ms | Loss: 0.595 | Acc: 82.030% (8203/10000)              

Epoch: 53
 [=========================== 391/391 ============================>]  Step: 6ms | Tot: 3s345ms | Loss: 0.241 | Acc: 91.510% (45755/50000)           
 [=========================== 100/100 ============================>]  Step: 6ms | Tot: 441ms | Loss: 0.553 | Acc: 83.460% (8346/10000)              

Epoch: 54
 [=========================== 391/391 ============================>]  Step: 6ms | Tot: 3s657ms | Loss: 0.198 | Acc: 93.094% (46547/50000)           
 [=========================== 100/100 ============================>]  Step: 2ms | Tot: 421ms | Loss: 0.531 | Acc: 84.190% (8419/10000)              

Epoch: 55
 [=========================== 391/391 ============================>]  Step: 6ms | Tot: 3s920ms | Loss: 0.174 | Acc: 93.950% (46975/50000)           
 [=========================== 100/100 ============================>]  Step: 1ms | Tot: 423ms | Loss: 0.529 | Acc: 84.350% (8435/10000)              
best acc 84.35

Epoch: 56
 [=========================== 391/391 ============================>]  Step: 5ms | Tot: 3s962ms | Loss: 0.180 | Acc: 93.724% (46862/50000)           
 [=========================== 100/100 ============================>]  Step: 6ms | Tot: 470ms | Loss: 0.544 | Acc: 84.250% (8425/10000)              

Epoch: 57
 [=========================== 391/391 ============================>]  Step: 5ms | Tot: 3s353ms | Loss: 0.208 | Acc: 92.562% (46281/50000)           
 [=========================== 100/100 ============================>]  Step: 1ms | Tot: 521ms | Loss: 0.577 | Acc: 83.650% (8365/10000)              

Epoch: 58
 [=========================== 391/391 ============================>]  Step: 5ms | Tot: 3s379ms | Loss: 0.273 | Acc: 90.278% (45139/50000)           
 [=========================== 100/100 ============================>]  Step: 1ms | Tot: 503ms | Loss: 0.607 | Acc: 82.000% (8200/10000)              

Epoch: 59
 [=========================== 391/391 ============================>]  Step: 5ms | Tot: 3s352ms | Loss: 0.324 | Acc: 88.452% (44226/50000)           
 [=========================== 100/100 ============================>]  Step: 1ms | Tot: 537ms | Loss: 0.621 | Acc: 80.990% (8099/10000)              

Epoch: 60
 [=========================== 391/391 ============================>]  Step: 6ms | Tot: 3s767ms | Loss: 0.326 | Acc: 88.452% (44226/50000)           
 [=========================== 100/100 ============================>]  Step: 6ms | Tot: 524ms | Loss: 0.652 | Acc: 80.750% (8075/10000)              

Epoch: 61
 [=========================== 391/391 ============================>]  Step: 6ms | Tot: 3s810ms | Loss: 0.310 | Acc: 89.036% (44518/50000)           
 [=========================== 100/100 ============================>]  Step: 2ms | Tot: 420ms | Loss: 0.645 | Acc: 80.620% (8062/10000)              

Epoch: 62
 [=========================== 391/391 ============================>]  Step: 6ms | Tot: 3s342ms | Loss: 0.264 | Acc: 90.730% (45365/50000)           
 [=========================== 100/100 ============================>]  Step: 3ms | Tot: 425ms | Loss: 0.586 | Acc: 82.340% (8234/10000)              

Epoch: 63
 [=========================== 391/391 ============================>]  Step: 6ms | Tot: 3s345ms | Loss: 0.204 | Acc: 92.760% (46380/50000)           
 [=========================== 100/100 ============================>]  Step: 2ms | Tot: 424ms | Loss: 0.575 | Acc: 83.630% (8363/10000)              

Epoch: 64
 [=========================== 391/391 ============================>]  Step: 6ms | Tot: 3s458ms | Loss: 0.161 | Acc: 94.540% (47270/50000)           
 [=========================== 100/100 ============================>]  Step: 1ms | Tot: 579ms | Loss: 0.545 | Acc: 84.510% (8451/10000)              
best acc 84.51

Epoch: 65
 [=========================== 391/391 ============================>]  Step: 5ms | Tot: 3s384ms | Loss: 0.141 | Acc: 95.086% (47543/50000)           
 [=========================== 100/100 ============================>]  Step: 5ms | Tot: 426ms | Loss: 0.541 | Acc: 84.730% (8473/10000)              
best acc 84.73

Epoch: 66
 [=========================== 391/391 ============================>]  Step: 6ms | Tot: 3s377ms | Loss: 0.144 | Acc: 94.980% (47490/50000)           
 [=========================== 100/100 ============================>]  Step: 5ms | Tot: 422ms | Loss: 0.552 | Acc: 84.840% (8484/10000)              
best acc 84.84

Epoch: 67
 [=========================== 391/391 ============================>]  Step: 6ms | Tot: 3s360ms | Loss: 0.173 | Acc: 93.786% (46893/50000)           
 [=========================== 100/100 ============================>]  Step: 6ms | Tot: 441ms | Loss: 0.584 | Acc: 84.150% (8415/10000)              

Epoch: 68
 [=========================== 391/391 ============================>]  Step: 6ms | Tot: 3s405ms | Loss: 0.240 | Acc: 91.370% (45685/50000)           
 [=========================== 100/100 ============================>]  Step: 1ms | Tot: 424ms | Loss: 0.624 | Acc: 82.000% (8200/10000)              

Epoch: 69
 [=========================== 391/391 ============================>]  Step: 5ms | Tot: 3s364ms | Loss: 0.289 | Acc: 89.658% (44829/50000)           
 [=========================== 100/100 ============================>]  Step: 1ms | Tot: 420ms | Loss: 0.612 | Acc: 81.640% (8164/10000)              

Epoch: 70
 [=========================== 391/391 ============================>]  Step: 6ms | Tot: 3s364ms | Loss: 0.302 | Acc: 89.114% (44557/50000)           
 [=========================== 100/100 ============================>]  Step: 6ms | Tot: 429ms | Loss: 0.686 | Acc: 80.120% (8012/10000)              

Epoch: 71
 [=========================== 391/391 ============================>]  Step: 6ms | Tot: 3s445ms | Loss: 0.284 | Acc: 89.908% (44954/50000)           
 [=========================== 100/100 ============================>]  Step: 1ms | Tot: 424ms | Loss: 0.635 | Acc: 81.720% (8172/10000)              

Epoch: 72
 [=========================== 391/391 ============================>]  Step: 6ms | Tot: 3s759ms | Loss: 0.229 | Acc: 91.744% (45872/50000)           
 [=========================== 100/100 ============================>]  Step: 1ms | Tot: 536ms | Loss: 0.614 | Acc: 83.250% (8325/10000)              

Epoch: 73
 [=========================== 391/391 ============================>]  Step: 6ms | Tot: 3s364ms | Loss: 0.174 | Acc: 93.930% (46965/50000)           
 [=========================== 100/100 ============================>]  Step: 3ms | Tot: 419ms | Loss: 0.594 | Acc: 83.830% (8383/10000)              

Epoch: 74
 [=========================== 391/391 ============================>]  Step: 5ms | Tot: 3s512ms | Loss: 0.134 | Acc: 95.306% (47653/50000)           
 [=========================== 100/100 ============================>]  Step: 6ms | Tot: 434ms | Loss: 0.561 | Acc: 84.700% (8470/10000)              

Epoch: 75
 [=========================== 391/391 ============================>]  Step: 5ms | Tot: 3s375ms | Loss: 0.115 | Acc: 95.996% (47998/50000)           
 [=========================== 100/100 ============================>]  Step: 1ms | Tot: 425ms | Loss: 0.562 | Acc: 84.660% (8466/10000)              

Epoch: 76
 [=========================== 391/391 ============================>]  Step: 5ms | Tot: 3s454ms | Loss: 0.121 | Acc: 95.876% (47938/50000)           
 [=========================== 100/100 ============================>]  Step: 6ms | Tot: 501ms | Loss: 0.577 | Acc: 84.610% (8461/10000)              

Epoch: 77
 [=========================== 391/391 ============================>]  Step: 6ms | Tot: 3s353ms | Loss: 0.146 | Acc: 94.734% (47367/50000)           
 [=========================== 100/100 ============================>]  Step: 2ms | Tot: 433ms | Loss: 0.621 | Acc: 83.780% (8378/10000)              

Epoch: 78
 [=========================== 391/391 ============================>]  Step: 6ms | Tot: 3s358ms | Loss: 0.212 | Acc: 92.354% (46177/50000)           
 [=========================== 100/100 ============================>]  Step: 6ms | Tot: 449ms | Loss: 0.648 | Acc: 82.370% (8237/10000)              

Epoch: 79
 [=========================== 391/391 ============================>]  Step: 6ms | Tot: 3s359ms | Loss: 0.265 | Acc: 90.446% (45223/50000)           
 [=========================== 100/100 ============================>]  Step: 2ms | Tot: 420ms | Loss: 0.641 | Acc: 81.580% (8158/10000)              

Epoch: 80
 [=========================== 391/391 ============================>]  Step: 5ms | Tot: 3s360ms | Loss: 0.282 | Acc: 89.694% (44847/50000)           
 [=========================== 100/100 ============================>]  Step: 5ms | Tot: 434ms | Loss: 0.651 | Acc: 81.010% (8101/10000)              

Epoch: 81
 [=========================== 391/391 ============================>]  Step: 5ms | Tot: 3s672ms | Loss: 0.259 | Acc: 90.750% (45375/50000)           
 [=========================== 100/100 ============================>]  Step: 5ms | Tot: 423ms | Loss: 0.615 | Acc: 82.420% (8242/10000)              

Epoch: 82
 [=========================== 391/391 ============================>]  Step: 6ms | Tot: 3s354ms | Loss: 0.203 | Acc: 92.898% (46449/50000)           
 [=========================== 100/100 ============================>]  Step: 1ms | Tot: 520ms | Loss: 0.604 | Acc: 83.360% (8336/10000)              

Epoch: 83
 [=========================== 391/391 ============================>]  Step: 5ms | Tot: 3s517ms | Loss: 0.147 | Acc: 94.662% (47331/50000)           
 [=========================== 100/100 ============================>]  Step: 2ms | Tot: 422ms | Loss: 0.605 | Acc: 84.300% (8430/10000)              

Epoch: 84
 [=========================== 391/391 ============================>]  Step: 6ms | Tot: 3s347ms | Loss: 0.111 | Acc: 96.228% (48114/50000)           
 [=========================== 100/100 ============================>]  Step: 1ms | Tot: 425ms | Loss: 0.583 | Acc: 85.050% (8505/10000)              
best acc 85.05

Epoch: 85
 [=========================== 391/391 ============================>]  Step: 6ms | Tot: 3s970ms | Loss: 0.097 | Acc: 96.806% (48403/50000)           
 [=========================== 100/100 ============================>]  Step: 2ms | Tot: 425ms | Loss: 0.582 | Acc: 84.860% (8486/10000)              

Epoch: 86
 [=========================== 391/391 ============================>]  Step: 5ms | Tot: 3s553ms | Loss: 0.099 | Acc: 96.524% (48262/50000)           
 [=========================== 100/100 ============================>]  Step: 1ms | Tot: 424ms | Loss: 0.599 | Acc: 84.720% (8472/10000)              

Epoch: 87
 [=========================== 391/391 ============================>]  Step: 6ms | Tot: 3s354ms | Loss: 0.128 | Acc: 95.500% (47750/50000)           
 [=========================== 100/100 ============================>]  Step: 2ms | Tot: 424ms | Loss: 0.635 | Acc: 83.940% (8394/10000)              

Epoch: 88
 [=========================== 391/391 ============================>]  Step: 6ms | Tot: 3s860ms | Loss: 0.196 | Acc: 93.074% (46537/50000)           
 [=========================== 100/100 ============================>]  Step: 1ms | Tot: 556ms | Loss: 0.653 | Acc: 82.640% (8264/10000)              

Epoch: 89
 [=========================== 391/391 ============================>]  Step: 5ms | Tot: 3s987ms | Loss: 0.241 | Acc: 91.354% (45677/50000)           
 [=========================== 100/100 ============================>]  Step: 5ms | Tot: 426ms | Loss: 0.630 | Acc: 82.680% (8268/10000)              

Epoch: 90
 [=========================== 391/391 ============================>]  Step: 6ms | Tot: 3s964ms | Loss: 0.263 | Acc: 90.568% (45284/50000)           
 [=========================== 100/100 ============================>]  Step: 2ms | Tot: 420ms | Loss: 0.634 | Acc: 81.690% (8169/10000)              

Epoch: 91
 [=========================== 391/391 ============================>]  Step: 6ms | Tot: 3s376ms | Loss: 0.237 | Acc: 91.590% (45795/50000)           
 [=========================== 100/100 ============================>]  Step: 1ms | Tot: 420ms | Loss: 0.662 | Acc: 81.870% (8187/10000)              

Epoch: 92
 [=========================== 391/391 ============================>]  Step: 6ms | Tot: 3s500ms | Loss: 0.185 | Acc: 93.456% (46728/50000)           
 [=========================== 100/100 ============================>]  Step: 1ms | Tot: 425ms | Loss: 0.641 | Acc: 83.100% (8310/10000)              

Epoch: 93
 [=========================== 391/391 ============================>]  Step: 5ms | Tot: 3s370ms | Loss: 0.137 | Acc: 95.192% (47596/50000)           
 [=========================== 100/100 ============================>]  Step: 1ms | Tot: 420ms | Loss: 0.615 | Acc: 83.970% (8397/10000)              

Epoch: 94
 [=========================== 391/391 ============================>]  Step: 5ms | Tot: 3s381ms | Loss: 0.100 | Acc: 96.576% (48288/50000)           
 [=========================== 100/100 ============================>]  Step: 5ms | Tot: 427ms | Loss: 0.594 | Acc: 84.800% (8480/10000)              

Epoch: 95
 [=========================== 391/391 ============================>]  Step: 6ms | Tot: 3s380ms | Loss: 0.086 | Acc: 97.102% (48551/50000)           
 [=========================== 100/100 ============================>]  Step: 6ms | Tot: 530ms | Loss: 0.590 | Acc: 85.080% (8508/10000)              
best acc 85.08

Epoch: 96
 [=========================== 391/391 ============================>]  Step: 6ms | Tot: 3s373ms | Loss: 0.085 | Acc: 97.112% (48556/50000)           
 [=========================== 100/100 ============================>]  Step: 3ms | Tot: 421ms | Loss: 0.611 | Acc: 84.850% (8485/10000)              

Epoch: 97
 [=========================== 391/391 ============================>]  Step: 5ms | Tot: 3s356ms | Loss: 0.109 | Acc: 96.116% (48058/50000)           
 [=========================== 100/100 ============================>]  Step: 3ms | Tot: 422ms | Loss: 0.667 | Acc: 83.900% (8390/10000)              

Epoch: 98
 [=========================== 391/391 ============================>]  Step: 6ms | Tot: 3s354ms | Loss: 0.186 | Acc: 93.300% (46650/50000)           
 [=========================== 100/100 ============================>]  Step: 6ms | Tot: 434ms | Loss: 0.658 | Acc: 82.800% (8280/10000)              

Epoch: 99
 [=========================== 391/391 ============================>]  Step: 5ms | Tot: 3s347ms | Loss: 0.230 | Acc: 91.770% (45885/50000)           
 [=========================== 100/100 ============================>]  Step: 1ms | Tot: 425ms | Loss: 0.667 | Acc: 81.660% (8166/10000)              

Epoch: 100
 [=========================== 391/391 ============================>]  Step: 5ms | Tot: 3s466ms | Loss: 0.247 | Acc: 91.122% (45561/50000)           
 [=========================== 100/100 ============================>]  Step: 4ms | Tot: 578ms | Loss: 0.625 | Acc: 82.420% (8242/10000) 
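
For anyone comparing the two logs, a small script can pull out the best test accuracy instead of scanning by eye. This is only a rough sketch, not part of the original training script: the function name `best_test_acc` and the `test_size` parameter are hypothetical, and it assumes the `progress_bar` output format shown in these logs, where test lines are the ones counted out of 10000 samples.

```python
import re

# Matches the accuracy field in the log, e.g. "Acc: 84.150% (8415/10000)".
ACC_RE = re.compile(r"Acc:\s*([\d.]+)%\s*\((\d+)/(\d+)\)")
# Matches the per-epoch header, e.g. "Epoch: 95".
EPOCH_RE = re.compile(r"^Epoch:\s*(\d+)")


def best_test_acc(log_text, test_size=10000):
    """Return (epoch, accuracy) for the highest test accuracy in a log.

    Assumption: a line whose sample count equals test_size is a test line
    (10000 is the size of the CIFAR-10 test split).
    """
    best = None
    epoch = None
    for line in log_text.splitlines():
        header = EPOCH_RE.match(line.strip())
        if header:
            epoch = int(header.group(1))
            continue
        acc_match = ACC_RE.search(line)
        if acc_match and int(acc_match.group(3)) == test_size:
            acc = float(acc_match.group(1))
            if best is None or acc > best[1]:
                best = (epoch, acc)
    return best


# Two epochs copied from the log above, with the progress bar shortened.
sample = """
Epoch: 94
 [=== 391/391 ===>]  Step: 5ms | Tot: 3s381ms | Loss: 0.100 | Acc: 96.576% (48288/50000)
 [=== 100/100 ===>]  Step: 5ms | Tot: 427ms | Loss: 0.594 | Acc: 84.800% (8480/10000)

Epoch: 95
 [=== 391/391 ===>]  Step: 6ms | Tot: 3s380ms | Loss: 0.086 | Acc: 97.102% (48551/50000)
 [=== 100/100 ===>]  Step: 6ms | Tot: 530ms | Loss: 0.590 | Acc: 85.080% (8508/10000)
"""

print(best_test_acc(sample))  # (95, 85.08)
```

Running the same function over each full log should give one number per implementation to compare directly.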

Training log from the alternative implementation:

Epoch: 0
 [=========================== 391/391 ============================>]  Step: 280ms | Tot: 6s17ms | Loss: 1.487 | Acc: 45.182% (22591/50000)          
 [=========================== 100/100 ============================>]  Step: 4ms | Tot: 447ms | Loss: 1.161 | Acc: 58.970% (5897/10000)              
best acc 58.97

Epoch: 1
 [=========================== 391/391 ============================>]  Step: 10ms | Tot: 5s721ms | Loss: 0.964 | Acc: 65.498% (32749/50000)          
 [=========================== 100/100 ============================>]  Step: 3ms | Tot: 441ms | Loss: 0.841 | Acc: 70.520% (7052/10000)              
best acc 70.52

Epoch: 2
 [=========================== 391/391 ============================>]  Step: 10ms | Tot: 5s737ms | Loss: 0.723 | Acc: 74.588% (37294/50000)          
 [=========================== 100/100 ============================>]  Step: 3ms | Tot: 612ms | Loss: 0.770 | Acc: 73.230% (7323/10000)              
best acc 73.23

Epoch: 3
 [=========================== 391/391 ============================>]  Step: 10ms | Tot: 5s730ms | Loss: 0.561 | Acc: 80.496% (40248/50000)          
 [=========================== 100/100 ============================>]  Step: 5ms | Tot: 551ms | Loss: 0.561 | Acc: 80.520% (8052/10000)              
best acc 80.52

Epoch: 4
 [=========================== 391/391 ============================>]  Step: 10ms | Tot: 5s752ms | Loss: 0.453 | Acc: 84.222% (42111/50000)          
 [=========================== 100/100 ============================>]  Step: 3ms | Tot: 442ms | Loss: 0.475 | Acc: 83.940% (8394/10000)              
best acc 83.94

Epoch: 5
 [=========================== 391/391 ============================>]  Step: 10ms | Tot: 5s740ms | Loss: 0.408 | Acc: 85.824% (42912/50000)          
 [=========================== 100/100 ============================>]  Step: 4ms | Tot: 441ms | Loss: 0.448 | Acc: 85.020% (8502/10000)              
best acc 85.02

Epoch: 6
 [=========================== 391/391 ============================>]  Step: 10ms | Tot: 5s729ms | Loss: 0.416 | Acc: 85.496% (42748/50000)          
 [=========================== 100/100 ============================>]  Step: 3ms | Tot: 644ms | Loss: 0.441 | Acc: 85.000% (8500/10000)              

Epoch: 7
 [=========================== 391/391 ============================>]  Step: 10ms | Tot: 5s748ms | Loss: 0.467 | Acc: 83.688% (41844/50000)          
 [=========================== 100/100 ============================>]  Step: 5ms | Tot: 452ms | Loss: 0.485 | Acc: 84.040% (8404/10000)              

Epoch: 8
 [=========================== 391/391 ============================>]  Step: 10ms | Tot: 5s739ms | Loss: 0.496 | Acc: 82.710% (41355/50000)          
 [=========================== 100/100 ============================>]  Step: 3ms | Tot: 442ms | Loss: 0.623 | Acc: 79.790% (7979/10000)              

Epoch: 9
 [=========================== 391/391 ============================>]  Step: 10ms | Tot: 5s724ms | Loss: 0.483 | Acc: 83.262% (41631/50000)          
 [=========================== 100/100 ============================>]  Step: 3ms | Tot: 549ms | Loss: 0.684 | Acc: 78.320% (7832/10000)              

Epoch: 10
 [=========================== 391/391 ============================>]  Step: 10ms | Tot: 5s742ms | Loss: 0.440 | Acc: 84.908% (42454/50000)          
 [=========================== 100/100 ============================>]  Step: 5ms | Tot: 444ms | Loss: 0.631 | Acc: 79.570% (7957/10000)              

Epoch: 11
 [=========================== 391/391 ============================>]  Step: 10ms | Tot: 5s721ms | Loss: 0.389 | Acc: 86.556% (43278/50000)          
 [=========================== 100/100 ============================>]  Step: 3ms | Tot: 531ms | Loss: 0.486 | Acc: 84.050% (8405/10000)              

Epoch: 12
 [=========================== 391/391 ============================>]  Step: 10ms | Tot: 5s742ms | Loss: 0.319 | Acc: 88.994% (44497/50000)          
 [=========================== 100/100 ============================>]  Step: 4ms | Tot: 442ms | Loss: 0.475 | Acc: 84.510% (8451/10000)              

Epoch: 13
 [=========================== 391/391 ============================>]  Step: 10ms | Tot: 5s724ms | Loss: 0.246 | Acc: 91.536% (45768/50000)          
 [=========================== 100/100 ============================>]  Step: 3ms | Tot: 440ms | Loss: 0.357 | Acc: 88.450% (8845/10000)              
best acc 88.45

Epoch: 14
 [=========================== 391/391 ============================>]  Step: 10ms | Tot: 5s708ms | Loss: 0.187 | Acc: 93.638% (46819/50000)          
 [=========================== 100/100 ============================>]  Step: 3ms | Tot: 446ms | Loss: 0.288 | Acc: 90.500% (9050/10000)              
best acc 90.5

Epoch: 15
 [=========================== 391/391 ============================>]  Step: 10ms | Tot: 5s707ms | Loss: 0.162 | Acc: 94.482% (47241/50000)          
 [=========================== 100/100 ============================>]  Step: 3ms | Tot: 440ms | Loss: 0.282 | Acc: 90.590% (9059/10000)              
best acc 90.59

Epoch: 16
 [=========================== 391/391 ============================>]  Step: 10ms | Tot: 5s732ms | Loss: 0.170 | Acc: 94.254% (47127/50000)          
 [=========================== 100/100 ============================>]  Step: 3ms | Tot: 443ms | Loss: 0.300 | Acc: 90.690% (9069/10000)              
best acc 90.69

Epoch: 17
 [=========================== 391/391 ============================>]  Step: 10ms | Tot: 5s716ms | Loss: 0.214 | Acc: 92.470% (46235/50000)          
 [=========================== 100/100 ============================>]  Step: 5ms | Tot: 467ms | Loss: 0.341 | Acc: 89.210% (8921/10000)              

Epoch: 18
 [=========================== 391/391 ============================>]  Step: 10ms | Tot: 5s713ms | Loss: 0.270 | Acc: 90.622% (45311/50000)          
 [=========================== 100/100 ============================>]  Step: 5ms | Tot: 452ms | Loss: 0.393 | Acc: 87.300% (8730/10000)              

Epoch: 19
 [=========================== 391/391 ============================>]  Step: 10ms | Tot: 5s725ms | Loss: 0.298 | Acc: 89.692% (44846/50000)          
 [=========================== 100/100 ============================>]  Step: 4ms | Tot: 442ms | Loss: 0.427 | Acc: 86.880% (8688/10000)              

Epoch: 20
 [=========================== 391/391 ============================>]  Step: 10ms | Tot: 5s739ms | Loss: 0.289 | Acc: 89.954% (44977/50000)          
 [=========================== 100/100 ============================>]  Step: 3ms | Tot: 450ms | Loss: 0.425 | Acc: 86.210% (8621/10000)              

Epoch: 21
 [=========================== 391/391 ============================>]  Step: 10ms | Tot: 5s720ms | Loss: 0.245 | Acc: 91.602% (45801/50000)          
 [=========================== 100/100 ============================>]  Step: 4ms | Tot: 442ms | Loss: 0.435 | Acc: 86.230% (8623/10000)              

Epoch: 22
 [=========================== 391/391 ============================>]  Step: 10ms | Tot: 5s733ms | Loss: 0.203 | Acc: 92.838% (46419/50000)          
 [=========================== 100/100 ============================>]  Step: 5ms | Tot: 573ms | Loss: 0.315 | Acc: 89.920% (8992/10000)              

Epoch: 23
 [=========================== 391/391 ============================>]  Step: 10ms | Tot: 5s734ms | Loss: 0.137 | Acc: 95.218% (47609/50000)          
 [=========================== 100/100 ============================>]  Step: 3ms | Tot: 445ms | Loss: 0.293 | Acc: 91.080% (9108/10000)              
best acc 91.08

Epoch: 24
 [=========================== 391/391 ============================>]  Step: 10ms | Tot: 5s725ms | Loss: 0.096 | Acc: 96.768% (48384/50000)          
 [=========================== 100/100 ============================>]  Step: 3ms | Tot: 443ms | Loss: 0.265 | Acc: 91.990% (9199/10000)              
best acc 91.99

Epoch: 25
 [=========================== 391/391 ============================>]  Step: 10ms | Tot: 5s725ms | Loss: 0.081 | Acc: 97.326% (48663/50000)          
 [=========================== 100/100 ============================>]  Step: 3ms | Tot: 600ms | Loss: 0.259 | Acc: 92.050% (9205/10000)              
best acc 92.05

Epoch: 26
 [=========================== 391/391 ============================>]  Step: 10ms | Tot: 5s729ms | Loss: 0.085 | Acc: 97.136% (48568/50000)          
 [=========================== 100/100 ============================>]  Step: 3ms | Tot: 440ms | Loss: 0.269 | Acc: 91.820% (9182/10000)              

Epoch: 27
 [=========================== 391/391 ============================>]  Step: 10ms | Tot: 5s717ms | Loss: 0.119 | Acc: 95.888% (47944/50000)          
 [=========================== 100/100 ============================>]  Step: 4ms | Tot: 442ms | Loss: 0.341 | Acc: 90.160% (9016/10000)              

Epoch: 28
 [=========================== 391/391 ============================>]  Step: 10ms | Tot: 5s715ms | Loss: 0.180 | Acc: 93.642% (46821/50000)          
 [=========================== 100/100 ============================>]  Step: 3ms | Tot: 446ms | Loss: 0.350 | Acc: 89.250% (8925/10000)              

Epoch: 29
 [=========================== 391/391 ============================>]  Step: 10ms | Tot: 5s730ms | Loss: 0.207 | Acc: 92.700% (46350/50000)          
 [=========================== 100/100 ============================>]  Step: 4ms | Tot: 446ms | Loss: 0.380 | Acc: 87.990% (8799/10000)              

Epoch: 30
 [=========================== 391/391 ============================>]  Step: 10ms | Tot: 5s705ms | Loss: 0.208 | Acc: 92.596% (46298/50000)          
 [=========================== 100/100 ============================>]  Step: 4ms | Tot: 448ms | Loss: 0.397 | Acc: 88.100% (8810/10000)              

Epoch: 31
 [=========================== 391/391 ============================>]  Step: 10ms | Tot: 5s692ms | Loss: 0.176 | Acc: 93.742% (46871/50000)          
 [=========================== 100/100 ============================>]  Step: 3ms | Tot: 441ms | Loss: 0.399 | Acc: 87.710% (8771/10000)              

Epoch: 32
 [=========================== 391/391 ============================>]  Step: 10ms | Tot: 5s687ms | Loss: 0.130 | Acc: 95.382% (47691/50000)          
 [=========================== 100/100 ============================>]  Step: 3ms | Tot: 553ms | Loss: 0.343 | Acc: 89.560% (8956/10000)              

Epoch: 33
 [=========================== 391/391 ============================>]  Step: 10ms | Tot: 5s697ms | Loss: 0.080 | Acc: 97.238% (48619/50000)          
 [=========================== 100/100 ============================>]  Step: 5ms | Tot: 456ms | Loss: 0.274 | Acc: 92.200% (9220/10000)              
best acc 92.2

Epoch: 34
 [=========================== 391/391 ============================>]  Step: 10ms | Tot: 5s727ms | Loss: 0.051 | Acc: 98.340% (49170/50000)          
 [=========================== 100/100 ============================>]  Step: 4ms | Tot: 448ms | Loss: 0.260 | Acc: 92.420% (9242/10000)              
best acc 92.42

Epoch: 35
 [=========================== 391/391 ============================>]  Step: 10ms | Tot: 5s714ms | Loss: 0.040 | Acc: 98.736% (49368/50000)          
 [=========================== 100/100 ============================>]  Step: 3ms | Tot: 442ms | Loss: 0.256 | Acc: 92.670% (9267/10000)              
best acc 92.67

Epoch: 36
 [=========================== 391/391 ============================>]  Step: 10ms | Tot: 5s699ms | Loss: 0.041 | Acc: 98.650% (49325/50000)          
 [=========================== 100/100 ============================>]  Step: 5ms | Tot: 462ms | Loss: 0.264 | Acc: 92.650% (9265/10000)              

Epoch: 37
 [=========================== 391/391 ============================>]  Step: 10ms | Tot: 5s718ms | Loss: 0.075 | Acc: 97.272% (48636/50000)          
 [=========================== 100/100 ============================>]  Step: 3ms | Tot: 448ms | Loss: 0.326 | Acc: 91.070% (9107/10000)              

Epoch: 38
 [=========================== 391/391 ============================>]  Step: 10ms | Tot: 5s688ms | Loss: 0.132 | Acc: 95.368% (47684/50000)          
 [=========================== 100/100 ============================>]  Step: 5ms | Tot: 559ms | Loss: 0.337 | Acc: 90.400% (9040/10000)              

Epoch: 39
 [=========================== 391/391 ============================>]  Step: 10ms | Tot: 5s699ms | Loss: 0.155 | Acc: 94.564% (47282/50000)          
 [=========================== 100/100 ============================>]  Step: 3ms | Tot: 448ms | Loss: 0.338 | Acc: 89.810% (8981/10000)              

Epoch: 40
 [=========================== 391/391 ============================>]  Step: 10ms | Tot: 5s707ms | Loss: 0.154 | Acc: 94.574% (47287/50000)          
 [=========================== 100/100 ============================>]  Step: 4ms | Tot: 446ms | Loss: 0.398 | Acc: 88.570% (8857/10000)              

Epoch: 41
 [=========================== 391/391 ============================>]  Step: 10ms | Tot: 5s739ms | Loss: 0.132 | Acc: 95.290% (47645/50000)          
 [=========================== 100/100 ============================>]  Step: 5ms | Tot: 461ms | Loss: 0.323 | Acc: 90.550% (9055/10000)              

Epoch: 42
 [=========================== 391/391 ============================>]  Step: 10ms | Tot: 5s899ms | Loss: 0.092 | Acc: 96.730% (48365/50000)          
 [=========================== 100/100 ============================>]  Step: 3ms | Tot: 435ms | Loss: 0.316 | Acc: 91.430% (9143/10000)              

Epoch: 43
 [=========================== 391/391 ============================>]  Step: 10ms | Tot: 5s903ms | Loss: 0.053 | Acc: 98.204% (49102/50000)          
 [=========================== 100/100 ============================>]  Step: 4ms | Tot: 447ms | Loss: 0.275 | Acc: 92.850% (9285/10000)              
best acc 92.85

Epoch: 44
 [=========================== 391/391 ============================>]  Step: 11ms | Tot: 5s836ms | Loss: 0.030 | Acc: 99.018% (49509/50000)          
 [=========================== 100/100 ============================>]  Step: 5ms | Tot: 479ms | Loss: 0.265 | Acc: 93.110% (9311/10000)              
best acc 93.11

Epoch: 45
 [=========================== 391/391 ============================>]  Step: 11ms | Tot: 5s822ms | Loss: 0.021 | Acc: 99.396% (49698/50000)          
 [=========================== 100/100 ============================>]  Step: 3ms | Tot: 520ms | Loss: 0.262 | Acc: 93.170% (9317/10000)              
best acc 93.17

Epoch: 46
 [=========================== 391/391 ============================>]  Step: 10ms | Tot: 5s792ms | Loss: 0.023 | Acc: 99.290% (49645/50000)          
 [=========================== 100/100 ============================>]  Step: 5ms | Tot: 458ms | Loss: 0.279 | Acc: 93.000% (9300/10000)              

Epoch: 47
 [=========================== 391/391 ============================>]  Step: 10ms | Tot: 5s859ms | Loss: 0.046 | Acc: 98.374% (49187/50000)          
 [=========================== 100/100 ============================>]  Step: 4ms | Tot: 440ms | Loss: 0.329 | Acc: 91.770% (9177/10000)              

Epoch: 48
 [=========================== 391/391 ============================>]  Step: 10ms | Tot: 5s831ms | Loss: 0.105 | Acc: 96.400% (48200/50000)          
 [=========================== 100/100 ============================>]  Step: 5ms | Tot: 473ms | Loss: 0.335 | Acc: 91.000% (9100/10000)              

Epoch: 49
 [=========================== 391/391 ============================>]  Step: 10ms | Tot: 5s756ms | Loss: 0.132 | Acc: 95.312% (47656/50000)          
 [=========================== 100/100 ============================>]  Step: 4ms | Tot: 444ms | Loss: 0.352 | Acc: 89.990% (8999/10000)              

Epoch: 50
 [=========================== 391/391 ============================>]  Step: 10ms | Tot: 5s709ms | Loss: 0.130 | Acc: 95.358% (47679/50000)          
 [=========================== 100/100 ============================>]  Step: 4ms | Tot: 440ms | Loss: 0.333 | Acc: 90.590% (9059/10000)              

Epoch: 51
 [=========================== 391/391 ============================>]  Step: 10ms | Tot: 5s705ms | Loss: 0.103 | Acc: 96.228% (48114/50000)          
 [=========================== 100/100 ============================>]  Step: 3ms | Tot: 628ms | Loss: 0.371 | Acc: 89.410% (8941/10000)              

Epoch: 52
 [=========================== 391/391 ============================>]  Step: 10ms | Tot: 5s732ms | Loss: 0.070 | Acc: 97.552% (48776/50000)          
 [=========================== 100/100 ============================>]  Step: 5ms | Tot: 537ms | Loss: 0.367 | Acc: 90.640% (9064/10000)              

Epoch: 53
 [=========================== 391/391 ============================>]  Step: 10ms | Tot: 5s717ms | Loss: 0.036 | Acc: 98.780% (49390/50000)          
 [=========================== 100/100 ============================>]  Step: 4ms | Tot: 444ms | Loss: 0.293 | Acc: 92.770% (9277/10000)              

Epoch: 54
 [=========================== 391/391 ============================>]  Step: 10ms | Tot: 5s703ms | Loss: 0.021 | Acc: 99.392% (49696/50000)          
 [=========================== 100/100 ============================>]  Step: 4ms | Tot: 443ms | Loss: 0.267 | Acc: 93.150% (9315/10000)              

Epoch: 55
 [=========================== 391/391 ============================>]  Step: 10ms | Tot: 5s692ms | Loss: 0.015 | Acc: 99.572% (49786/50000)          
 [=========================== 100/100 ============================>]  Step: 3ms | Tot: 441ms | Loss: 0.263 | Acc: 93.330% (9333/10000)              
best acc 93.33

Epoch: 56
 [=========================== 391/391 ============================>]  Step: 10ms | Tot: 5s724ms | Loss: 0.014 | Acc: 99.532% (49766/50000)          
 [=========================== 100/100 ============================>]  Step: 3ms | Tot: 443ms | Loss: 0.271 | Acc: 93.210% (9321/10000)              

Epoch: 57
 [=========================== 391/391 ============================>]  Step: 10ms | Tot: 5s709ms | Loss: 0.032 | Acc: 98.848% (49424/50000)          
 [=========================== 100/100 ============================>]  Step: 5ms | Tot: 455ms | Loss: 0.324 | Acc: 92.020% (9202/10000)              

Epoch: 58
 [=========================== 391/391 ============================>]  Step: 10ms | Tot: 5s708ms | Loss: 0.087 | Acc: 96.950% (48475/50000)          
 [=========================== 100/100 ============================>]  Step: 4ms | Tot: 442ms | Loss: 0.339 | Acc: 91.290% (9129/10000)              

Epoch: 59
 [=========================== 391/391 ============================>]  Step: 10ms | Tot: 5s702ms | Loss: 0.107 | Acc: 96.244% (48122/50000)          
 [=========================== 100/100 ============================>]  Step: 3ms | Tot: 445ms | Loss: 0.358 | Acc: 90.540% (9054/10000)              

Epoch: 60
 [=========================== 391/391 ============================>]  Step: 10ms | Tot: 5s715ms | Loss: 0.109 | Acc: 96.202% (48101/50000)          
 [=========================== 100/100 ============================>]  Step: 3ms | Tot: 446ms | Loss: 0.407 | Acc: 89.130% (8913/10000)              

Epoch: 61
 [=========================== 391/391 ============================>]  Step: 10ms | Tot: 5s705ms | Loss: 0.087 | Acc: 96.890% (48445/50000)          
 [=========================== 100/100 ============================>]  Step: 3ms | Tot: 440ms | Loss: 0.430 | Acc: 89.010% (8901/10000)              

Epoch: 62
 [=========================== 391/391 ============================>]  Step: 10ms | Tot: 5s705ms | Loss: 0.051 | Acc: 98.200% (49100/50000)          
 [=========================== 100/100 ============================>]  Step: 3ms | Tot: 438ms | Loss: 0.378 | Acc: 90.910% (9091/10000)              

Epoch: 63
 [=========================== 391/391 ============================>]  Step: 10ms | Tot: 5s697ms | Loss: 0.027 | Acc: 99.058% (49529/50000)          
 [=========================== 100/100 ============================>]  Step: 3ms | Tot: 443ms | Loss: 0.302 | Acc: 92.520% (9252/10000)              

Epoch: 64
 [=========================== 391/391 ============================>]  Step: 10ms | Tot: 5s695ms | Loss: 0.014 | Acc: 99.566% (49783/50000)          
 [=========================== 100/100 ============================>]  Step: 5ms | Tot: 444ms | Loss: 0.281 | Acc: 93.030% (9303/10000)              

Epoch: 65
 [=========================== 391/391 ============================>]  Step: 10ms | Tot: 5s694ms | Loss: 0.011 | Acc: 99.706% (49853/50000)          
 [=========================== 100/100 ============================>]  Step: 4ms | Tot: 442ms | Loss: 0.279 | Acc: 93.140% (9314/10000)              

Epoch: 66
 [=========================== 391/391 ============================>]  Step: 10ms | Tot: 5s718ms | Loss: 0.010 | Acc: 99.700% (49850/50000)          
 [=========================== 100/100 ============================>]  Step: 4ms | Tot: 437ms | Loss: 0.291 | Acc: 93.050% (9305/10000)              

Epoch: 67
 [=========================== 391/391 ============================>]  Step: 10ms | Tot: 5s684ms | Loss: 0.023 | Acc: 99.254% (49627/50000)          
 [=========================== 100/100 ============================>]  Step: 4ms | Tot: 446ms | Loss: 0.332 | Acc: 92.420% (9242/10000)              

Epoch: 68
 [=========================== 391/391 ============================>]  Step: 10ms | Tot: 5s692ms | Loss: 0.079 | Acc: 97.186% (48593/50000)          
 [=========================== 100/100 ============================>]  Step: 5ms | Tot: 444ms | Loss: 0.376 | Acc: 90.850% (9085/10000)              

Epoch: 69
 [=========================== 391/391 ============================>]  Step: 10ms | Tot: 5s697ms | Loss: 0.093 | Acc: 96.720% (48360/50000)          
 [=========================== 100/100 ============================>]  Step: 5ms | Tot: 595ms | Loss: 0.397 | Acc: 90.110% (9011/10000)              

Epoch: 70
 [=========================== 391/391 ============================>]  Step: 10ms | Tot: 5s741ms | Loss: 0.093 | Acc: 96.778% (48389/50000)          
 [=========================== 100/100 ============================>]  Step: 3ms | Tot: 530ms | Loss: 0.380 | Acc: 90.120% (9012/10000)              

Epoch: 71
 [=========================== 391/391 ============================>]  Step: 10ms | Tot: 5s706ms | Loss: 0.073 | Acc: 97.452% (48726/50000)          
 [=========================== 100/100 ============================>]  Step: 3ms | Tot: 438ms | Loss: 0.387 | Acc: 90.650% (9065/10000)              

Epoch: 72
 [=========================== 391/391 ============================>]  Step: 10ms | Tot: 5s691ms | Loss: 0.044 | Acc: 98.486% (49243/50000)          
 [=========================== 100/100 ============================>]  Step: 4ms | Tot: 440ms | Loss: 0.336 | Acc: 91.820% (9182/10000)              

Epoch: 73
 [=========================== 391/391 ============================>]  Step: 10ms | Tot: 5s716ms | Loss: 0.022 | Acc: 99.278% (49639/50000)          
 [=========================== 100/100 ============================>]  Step: 3ms | Tot: 440ms | Loss: 0.297 | Acc: 92.770% (9277/10000)              

Epoch: 74
 [=========================== 391/391 ============================>]  Step: 10ms | Tot: 5s693ms | Loss: 0.012 | Acc: 99.628% (49814/50000)          
 [=========================== 100/100 ============================>]  Step: 3ms | Tot: 443ms | Loss: 0.282 | Acc: 93.400% (9340/10000)              
best acc 93.4

Epoch: 75
 [=========================== 391/391 ============================>]  Step: 10ms | Tot: 5s705ms | Loss: 0.008 | Acc: 99.778% (49889/50000)          
 [=========================== 100/100 ============================>]  Step: 3ms | Tot: 437ms | Loss: 0.281 | Acc: 93.380% (9338/10000)              

Epoch: 76
 [=========================== 391/391 ============================>]  Step: 10ms | Tot: 5s688ms | Loss: 0.009 | Acc: 99.754% (49877/50000)          
 [=========================== 100/100 ============================>]  Step: 3ms | Tot: 441ms | Loss: 0.280 | Acc: 93.420% (9342/10000)              
best acc 93.42

Epoch: 77
 [=========================== 391/391 ============================>]  Step: 10ms | Tot: 5s840ms | Loss: 0.019 | Acc: 99.332% (49666/50000)          
 [=========================== 100/100 ============================>]  Step: 3ms | Tot: 445ms | Loss: 0.355 | Acc: 92.470% (9247/10000)              

Epoch: 78
 [=========================== 391/391 ============================>]  Step: 10ms | Tot: 5s831ms | Loss: 0.071 | Acc: 97.478% (48739/50000)          
 [=========================== 100/100 ============================>]  Step: 3ms | Tot: 447ms | Loss: 0.365 | Acc: 90.990% (9099/10000)              

Epoch: 79
 [=========================== 391/391 ============================>]  Step: 10ms | Tot: 5s726ms | Loss: 0.087 | Acc: 96.850% (48425/50000)          
 [=========================== 100/100 ============================>]  Step: 5ms | Tot: 468ms | Loss: 0.364 | Acc: 90.640% (9064/10000)              

Epoch: 80
 [=========================== 391/391 ============================>]  Step: 10ms | Tot: 5s732ms | Loss: 0.087 | Acc: 96.950% (48475/50000)          
 [=========================== 100/100 ============================>]  Step: 3ms | Tot: 439ms | Loss: 0.353 | Acc: 90.490% (9049/10000)              

Epoch: 81
 [=========================== 391/391 ============================>]  Step: 10ms | Tot: 5s653ms | Loss: 0.062 | Acc: 97.896% (48948/50000)          
 [=========================== 100/100 ============================>]  Step: 3ms | Tot: 434ms | Loss: 0.374 | Acc: 90.820% (9082/10000)              

Epoch: 82
 [=========================== 391/391 ============================>]  Step: 10ms | Tot: 5s674ms | Loss: 0.039 | Acc: 98.688% (49344/50000)          
 [=========================== 100/100 ============================>]  Step: 5ms | Tot: 445ms | Loss: 0.326 | Acc: 92.190% (9219/10000)              

Epoch: 83
 [=========================== 391/391 ============================>]  Step: 10ms | Tot: 5s679ms | Loss: 0.019 | Acc: 99.414% (49707/50000)          
 [=========================== 100/100 ============================>]  Step: 5ms | Tot: 457ms | Loss: 0.294 | Acc: 92.940% (9294/10000)              

Epoch: 84
 [=========================== 391/391 ============================>]  Step: 10ms | Tot: 5s664ms | Loss: 0.010 | Acc: 99.744% (49872/50000)          
 [=========================== 100/100 ============================>]  Step: 5ms | Tot: 447ms | Loss: 0.286 | Acc: 93.480% (9348/10000)              
best acc 93.48

Epoch: 85
 [=========================== 391/391 ============================>]  Step: 10ms | Tot: 5s669ms | Loss: 0.008 | Acc: 99.780% (49890/50000)          
 [=========================== 100/100 ============================>]  Step: 4ms | Tot: 436ms | Loss: 0.286 | Acc: 93.400% (9340/10000)              

Epoch: 86
 [=========================== 391/391 ============================>]  Step: 10ms | Tot: 5s698ms | Loss: 0.007 | Acc: 99.814% (49907/50000)          
 [=========================== 100/100 ============================>]  Step: 5ms | Tot: 451ms | Loss: 0.293 | Acc: 93.350% (9335/10000)              

Epoch: 87
 [=========================== 391/391 ============================>]  Step: 10ms | Tot: 5s670ms | Loss: 0.018 | Acc: 99.432% (49716/50000)          
 [=========================== 100/100 ============================>]  Step: 3ms | Tot: 442ms | Loss: 0.314 | Acc: 92.920% (9292/10000)              

Epoch: 88
 [=========================== 391/391 ============================>]  Step: 10ms | Tot: 5s698ms | Loss: 0.060 | Acc: 97.918% (48959/50000)          
 [=========================== 100/100 ============================>]  Step: 3ms | Tot: 440ms | Loss: 0.350 | Acc: 91.780% (9178/10000)              

Epoch: 89
 [=========================== 391/391 ============================>]  Step: 10ms | Tot: 5s663ms | Loss: 0.080 | Acc: 97.186% (48593/50000)          
 [=========================== 100/100 ============================>]  Step: 4ms | Tot: 439ms | Loss: 0.364 | Acc: 91.110% (9111/10000)              

Epoch: 90
 [=========================== 391/391 ============================>]  Step: 10ms | Tot: 5s664ms | Loss: 0.080 | Acc: 97.174% (48587/50000)          
 [=========================== 100/100 ============================>]  Step: 3ms | Tot: 434ms | Loss: 0.337 | Acc: 91.320% (9132/10000) 

@yuxiangwei0808 Your 'alternative' ResNet is better suited to low-resolution datasets like CIFAR: it has fewer stride-2 layers and no stem maxpool. The timm one is not; it's the classic ImageNet-oriented ResNet. Their weight inits are also different. Nothing unexpected here.
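
To make the stride point concrete, here is a rough sketch that traces the feature-map size of a 32x32 CIFAR image through both layouts using the standard convolution output-size formula. The layer tables are assumptions, not taken from either codebase: the ImageNet-style one follows the classic ResNet-18 layout (7x7 stride-2 stem conv, 3x3 stride-2 maxpool, three stride-2 stages), and the CIFAR-style one assumes a 3x3 stride-1 stem conv with no maxpool. The names `conv_out`, `trace`, `IMAGENET_STYLE` and `CIFAR_STYLE` are hypothetical helpers for illustration only.

```python
def conv_out(size, kernel, stride, padding):
    """Output size of a conv/pool layer along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1


# Each entry: (name, kernel, stride, padding) of the layer that sets the
# spatial size. These tables are assumed layouts, not read from any library.
IMAGENET_STYLE = [
    ("stem conv 7x7", 7, 2, 3),
    ("stem maxpool 3x3", 3, 2, 1),
    ("layer1", 3, 1, 1),
    ("layer2", 3, 2, 1),
    ("layer3", 3, 2, 1),
    ("layer4", 3, 2, 1),
]

CIFAR_STYLE = [
    ("stem conv 3x3", 3, 1, 1),  # no stem maxpool in this variant
    ("layer1", 3, 1, 1),
    ("layer2", 3, 2, 1),
    ("layer3", 3, 2, 1),
    ("layer4", 3, 2, 1),
]


def trace(layers, size=32):
    """Return the spatial size at the input and after every listed layer."""
    sizes = [size]
    for _name, kernel, stride, padding in layers:
        size = conv_out(size, kernel, stride, padding)
        sizes.append(size)
    return sizes


imagenet_sizes = trace(IMAGENET_STYLE)
cifar_sizes = trace(CIFAR_STYLE)

print("ImageNet-style:", imagenet_sizes)
print("CIFAR-style:   ", cifar_sizes)
```

Under these assumptions the ImageNet-style layout has already shrunk the image to 8x8 before the first residual stage and ends at a single 1x1 position, while the CIFAR-style layout keeps the full 32x32 through the first stage and ends at 4x4. That difference in spatial detail is consistent with the accuracy gap reported above, alongside the differing weight inits.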

Ok, thanks for the response!