wuji3 / visiondk

A powerful baseline for image classification and face recognition with PyTorch

VisionDK: ToolBox Of Image Classification & Face Recognition

Tutorials

Install ☘️
# It is recommended to create a separate virtual environment
conda create -n vision python=3.10 
conda activate vision

# torch==2.0.1 (lower versions also work), see https://pytorch.org/get-started/locally/
conda install pytorch torchvision torchaudio cpuonly -c pytorch # cpu-version
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia  # cuda-version

pip install -r requirements.txt

# Without Arial.ttf, inference may be slow due to network IO.
mkdir -p ~/.config/DuKe
cp misc/Arial.ttf ~/.config/DuKe
Training 🌟️
# one machine one gpu
python main.py --cfgs configs/task/pet.yaml

# one machine multiple gpus
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node 4 main.py --cfgs configs/classification/pet.yaml
# optional flags, appended to the command above:
#   --sync_bn     use synchronized batch norm (this slows training down)
#   --resume      resume training from a checkpoint
#   --load_from   start training from existing weights for fine-tuning
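
torchrun starts one process per GPU and tells each process its position through the environment variables `RANK`, `LOCAL_RANK`, and `WORLD_SIZE`. The sketch below shows one way a training script could read them, falling back to single-process defaults when launched with plain `python`. The helper name `read_dist_env` is hypothetical and is not taken from `main.py`.

```python
import os


def read_dist_env(env=None):
    """Return (rank, local_rank, world_size) from torchrun-style variables.

    Hypothetical helper for illustration only; not part of VisionDK.
    """
    env = os.environ if env is None else env
    rank = int(env.get("RANK", 0))              # global index of this process
    local_rank = int(env.get("LOCAL_RANK", 0))  # GPU index on this machine
    world_size = int(env.get("WORLD_SIZE", 1))  # total number of processes
    return rank, local_rank, world_size


rank, local_rank, world_size = read_dist_env()
is_main_process = rank == 0  # typically the only process that logs or saves
```

With the four-GPU command above, each process would see `WORLD_SIZE=4` and a different `LOCAL_RANK` from 0 to 3.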

What's New

  • [Apr. 2024] Face Recognition Task (FRT) is now supported 🚀️️! ResNet, EfficientNet, and Swin Transformer are available as backbones; ArcFace, CircleLoss, MagFace, and MV Softmax are available as heads for training. Note: parts of the implementation are based on JD-FaceX.
  • [Jun. 2023] Image Classification Task (ICT) has launched 🚀️️! It supports many powerful strategies, such as progressive learning, online augmentation, a polished training interface, and exponential moving average. The torchvision models are fully integrated.
  • [May. 2023] Initial release of Vision.
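
As an illustration of the exponential moving average mentioned in the list above, here is a minimal sketch of the update rule on a plain dictionary of weights. The dictionary representation, the function name `ema_update`, and the decay value are assumptions made for readability; the repository's own implementation may differ.

```python
def ema_update(shadow, params, decay=0.999):
    """Blend the current weights into the shadow copy, in place.

    shadow[name] = decay * shadow[name] + (1 - decay) * params[name]
    """
    for name, value in params.items():
        shadow[name] = decay * shadow[name] + (1.0 - decay) * value
    return shadow


# Toy run: the "weight" grows by 1 each step and the shadow copy lags behind.
params = {"w": 1.0}
shadow = dict(params)
for step in range(3):
    params["w"] += 1.0
    ema_update(shadow, params, decay=0.5)
```

The shadow copy changes more smoothly than the raw weights, which is why it is usually the copy that gets evaluated.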

Supported Tasks

  1. Face Recognition Task (FRT)
  2. Image Classification Task (ICT)

Implemented Method & Paper

| Method | Paper |
| --- | --- |
| SAM | Sharpness-Aware Minimization for Efficiently Improving Generalization |
| Progressive Learning | EfficientNetV2: Smaller Models and Faster Training |
| OHEM | Training Region-based Object Detectors with Online Hard Example Mining |
| Focal Loss | Focal Loss for Dense Object Detection |
| Cosine Annealing | SGDR: Stochastic Gradient Descent with Warm Restarts |
| Label Smoothing | Rethinking the Inception Architecture for Computer Vision |
| Mixup | mixup: Beyond Empirical Risk Minimization |
| CutOut | Improved Regularization of Convolutional Neural Networks with Cutout |
| Attention Pool | Augmenting Convolutional networks with attention-based aggregation |
| GradCAM | Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization |
| ArcFace | ArcFace: Additive Angular Margin Loss for Deep Face Recognition |
| CircleLoss | Circle Loss: A Unified Perspective of Pair Similarity Optimization |
| MagFace | MagFace: A Universal Representation for Face Recognition and Quality Assessment |
| MV Softmax | Mis-classified Vector Guided Softmax Loss for Face Recognition |
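
To make one of the listed heads concrete, the following sketch applies the ArcFace additive angular margin to a single sample: the target-class logit becomes `scale * cos(theta + margin)` while the other logits stay at `scale * cos(theta)`. It is a pure-Python illustration under stated assumptions, not the repository's code. The function name is hypothetical, `margin=0.5` and `scale=64.0` are commonly used values, and the sketch omits the special handling that practical implementations add for `theta + margin > pi`.

```python
import math


def arcface_logits(cosines, target, margin=0.5, scale=64.0):
    """Apply the additive angular margin to the target-class cosine.

    cosines: cosine similarity between one embedding and each class center.
    target:  index of the ground-truth class.
    """
    logits = []
    for idx, cos_theta in enumerate(cosines):
        if idx == target:
            # Clamp before acos to guard against rounding outside [-1, 1].
            theta = math.acos(max(-1.0, min(1.0, cos_theta)))
            cos_theta = math.cos(theta + margin)
        logits.append(scale * cos_theta)
    return logits


logits = arcface_logits([0.8, 0.1], target=0)
```

The margin lowers the target logit, so the network has to pull embeddings closer to their class center to reach the same loss.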

Model & Paper

| Model | Paper | Name in configs (e.g. torchvision-mobilenet_v2) |
| --- | --- | --- |
| MobileNetV2 | MobileNetV2: Inverted Residuals and Linear Bottlenecks | mobilenet_v2 |
| MobileNetV3 | Searching for MobileNetV3 | mobilenet_v3_small, mobilenet_v3_large |
| ShuffleNetV2 | ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design | shufflenet_v2_x0_5, shufflenet_v2_x1_0, shufflenet_v2_x1_5, shufflenet_v2_x2_0 |
| ResNet | Deep Residual Learning for Image Recognition | resnet18, resnet34, resnet50, resnet101, resnet152 |
| ResNeXt | Aggregated Residual Transformations for Deep Neural Networks | resnext50_32x4d, resnext101_32x8d, resnext101_64x4d |
| ConvNeXt | A ConvNet for the 2020s | convnext_tiny, convnext_small, convnext_base, convnext_large |
| EfficientNet | EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks | efficientnet_b{0..7} |
| EfficientNetV2 | EfficientNetV2: Smaller Models and Faster Training | efficientnet_v2_s, efficientnet_v2_m, efficientnet_v2_l |
| Swin Transformer | Swin Transformer: Hierarchical Vision Transformer using Shifted Windows | swin_t, swin_s, swin_b |
| Swin Transformer V2 | Swin Transformer V2: Scaling Up Capacity and Resolution | swin_v2_t, swin_v2_s, swin_v2_b |
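
The config names above appear to follow a `<source>-<model>` pattern, as in `torchvision-mobilenet_v2`. The sketch below splits such a name and expands the `efficientnet_b{0..7}` shorthand into the eight names it stands for. The splitting rule is inferred from that single example and the helper `split_config_name` is hypothetical, so the repository's own parsing may differ.

```python
def split_config_name(name):
    """Split 'torchvision-mobilenet_v2' into ('torchvision', 'mobilenet_v2').

    Hypothetical helper: assumes the text before the first hyphen is the source.
    """
    source, sep, model = name.partition("-")
    if not sep or not source or not model:
        raise ValueError(f"expected '<source>-<model>', got {name!r}")
    return source, model


# efficientnet_b{0..7} in the table is shorthand for eight separate names.
efficientnet_names = [f"torchvision-efficientnet_b{i}" for i in range(8)]

source, model = split_config_name("torchvision-mobilenet_v2")
```

With torchvision 0.14 or later, the `model` part could then be passed to `torchvision.models.get_model`, though whether VisionDK builds models this way is not stated here.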

Tools

  1. Split the dataset into training and validation sets
python tools/data_prepare.py --postfix <jpg or png> --root <input your data realpath> --frac <train split ratio, e.g. 0.9 0.6 0.3 0.9 0.9>
  2. Visualize data augmentation
cd visiondk
python -m tools.test_augment
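
To illustrate what a per-class train/validation split with `--frac`-style ratios might look like, here is a small standalone sketch. It assumes one ratio per class, which is only inferred from the five example values above. The function name and the dictionary layout are hypothetical, and `tools/data_prepare.py` may work differently.

```python
import random


def split_per_class(files_by_class, fracs, seed=0):
    """Shuffle each class's files and split them into train/val by its ratio.

    files_by_class: {class_name: [file, ...]}
    fracs: one train ratio per class, in the order of sorted class names.
    """
    classes = sorted(files_by_class)
    if len(fracs) != len(classes):
        raise ValueError("need one ratio per class")
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, val = {}, {}
    for cls, frac in zip(classes, fracs):
        files = sorted(files_by_class[cls])
        rng.shuffle(files)
        cut = int(len(files) * frac)
        train[cls], val[cls] = files[:cut], files[cut:]
    return train, val


data = {
    "cat": [f"cat_{i}.jpg" for i in range(10)],
    "dog": [f"dog_{i}.jpg" for i in range(10)],
}
train, val = split_per_class(data, fracs=[0.9, 0.5])
```

Splitting inside each class rather than over the whole pool keeps every class represented in both sets, even when the classes use different ratios.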

Contact Me

  1. If you enjoy reproducing papers and algorithms, pull requests are welcome.
  2. If anything about the repo is unclear, please open an issue.

License:GNU General Public License v3.0