HaoMood / fine-grained-baseline

Baseline results of fine-grained visual recognition

              Baseline Results for Fine-grained Visual Recognition


DESCRIPTION
    PyTorch code producing baseline results for fine-grained visual
    recognition.

    Fine-grained recognition aims to identify, for example, the specific
    species of a bird or the model of an aircraft. It is challenging because
    the visual differences between categories are subtle and easily
    overwhelmed by changes in pose, viewpoint, or background.

            Commonly used fine-grained visual recognition datasets.
              ---------------------------------------------------
                Dataset    |   # classes |   # train  |  # test
              ---------------------------------------------------
                CUB        |      200    |    5,994   |   5,794
                Cars       |      196    |    8,144   |   8,041
                Aircraft   |      100    |    6,667   |   3,333
              ---------------------------------------------------
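
    As a quick sanity check, the datasets listed above can be loaded with
    torchvision once the images have been split into train/ and test/ folders
    in the standard ImageFolder layout (one subfolder per class). The path
    below is a placeholder, and this folder layout is an assumption, not
    something the repository prescribes.

    import torchvision

    data_root = '/path/to/cub200'  # placeholder; point this to your copy of CUB
    train_set = torchvision.datasets.ImageFolder(data_root + '/train')
    test_set = torchvision.datasets.ImageFolder(data_root + '/test')
    # For CUB, the table above gives 200 classes, 5,994 train and 5,794 test images.
    print(len(train_set.classes), len(train_set), len(test_set))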

    ResNet-50 and VGG-16 are two of the most widely used backbone networks
    for fine-grained visual recognition. Both networks are pre-trained on
    ImageNet; a loading sketch follows the table below.

                    Accuracy of ImageNet pre-trained models.
                         -----------------------------
                             Model    |  ImageNet (%)
                         -----------------------------
                            VGG-16    |    71.59
                           ResNet-18  |    69.76
                           ResNet-50  |    76.15
                         -----------------------------
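
    A minimal sketch of loading these ImageNet pre-trained backbones with
    torchvision and replacing the final layer for 200-way CUB classification.
    This illustrates the general recipe rather than the exact model
    construction in train.py.

    import torch.nn as nn
    import torchvision.models as models

    num_classes = 200  # CUB, from the dataset table above

    # ResNet-50: replace the final fully connected layer.
    resnet = models.resnet50(pretrained=True)
    resnet.fc = nn.Linear(resnet.fc.in_features, num_classes)

    # VGG-16: replace the last layer of the classifier head.
    vgg = models.vgg16(pretrained=True)
    vgg.classifier[6] = nn.Linear(vgg.classifier[6].in_features, num_classes)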

    For fine-grained fine-tuning, only the category label is used (no part
    or bounding-box annotations). Input images are resized to 512*512 and
    randomly cropped to 448*448, and random horizontal flipping is applied
    for data augmentation; a preprocessing sketch follows this paragraph.
    For VGG-16, two-step fine-tuning (first fine-tune the FC layer, then
    fine-tune all layers) is essential for several bilinear-pooling-based
    methods, but it does not help the baseline results very much (see the
    sketch after the results table).
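
    A sketch of the preprocessing described above. ImageNet normalization
    statistics and a center crop at test time are assumptions (reasonable
    since the backbones are ImageNet pre-trained); train.py may differ in
    details.

    import torchvision.transforms as transforms

    normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                     std=[0.229, 0.224, 0.225])
    train_transform = transforms.Compose([
        transforms.Resize((512, 512)),
        transforms.RandomCrop(448),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        normalize,
    ])
    test_transform = transforms.Compose([
        transforms.Resize((512, 512)),
        transforms.CenterCrop(448),
        transforms.ToTensor(),
        normalize,
    ])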

                       Test accuracy of baseline results.
                    ----------------------------------------
                      Input size  |    Model    |  CUB (%)
                    ----------------------------------------
                         224      |   VGG-16    |  74.47
                         448      |   VGG-16    |  78.79
                    ----------------------------------------
                         224      |  ResNet-18  |  76.89
                         224      |  ResNet-50  |  80.64
                         448      |  ResNet-18  |  82.26
                         448      |  ResNet-50  |  85.99
                    ----------------------------------------
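
    For reference, a minimal sketch of the two-step VGG-16 fine-tuning
    mentioned above: step 1 trains only the newly added FC layer, step 2
    fine-tunes all layers. The learning rates and the elided training loops
    are illustrative assumptions, not the repository's exact settings.

    import torch
    import torch.nn as nn
    import torchvision.models as models

    model = models.vgg16(pretrained=True)
    model.classifier[6] = nn.Linear(model.classifier[6].in_features, 200)

    # Step 1: freeze everything except the new FC layer.
    for p in model.parameters():
        p.requires_grad = False
    for p in model.classifier[6].parameters():
        p.requires_grad = True
    fc_optimizer = torch.optim.SGD(model.classifier[6].parameters(),
                                   lr=0.1, momentum=0.9)
    # ... train the FC layer to convergence with fc_optimizer ...

    # Step 2: unfreeze all layers and fine-tune with a smaller learning rate.
    for p in model.parameters():
        p.requires_grad = True
    all_optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
    # ... continue training the whole network with all_optimizer ...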


PREREQUISITES
    Python 3.7 (the Anaconda distribution is preferred).
    https://www.anaconda.com/distribution/#download-section

    PyTorch 1.0.0
    https://pytorch.org/

    Visdom (optional)
    https://github.com/facebookresearch/visdom/


USAGE
    Preparation
    $ mkdir model; cd src/
    Modify line 282 of train.py so that it points to your dataset path.

    ResNet model fine-tuning
    $ CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py --arch resnet50 \
        --input_size 448

    VGG model fine-tuning
    $ CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py --arch vgg16 \
        --input_size 448


AUTHOR
    Hao Zhang (zhangh0214@gmail.com)


LICENSE
    CC BY-SA 4.0
