wuji3/visiondk

deep-learning image-classification image-recognition face-recognition machine pytorch

VisionDK: ToolBox Of Image Classification & Face Recognition

Tutorials

Install ☘️

# It is recommended to create a separate virtual environment
conda create -n vision python=3.10 
conda activate vision

# torch==2.0.1 (lower versions also work) -> https://pytorch.org/get-started/locally/
conda install pytorch torchvision torchaudio cpuonly -c pytorch # cpu-version
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia  # cuda-version

pip install -r requirements.txt

# Without a local copy of Arial.ttf, inference may be slow because of network I/O.
mkdir -p ~/.config/DuKe
cp misc/Arial.ttf ~/.config/DuKe
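To confirm the font landed where the commands above put it, a small standalone check can help. This is a hypothetical helper, not part of VisionDK; it only assumes the `~/.config/DuKe/Arial.ttf` path used in the install commands.

```python
from pathlib import Path
from typing import Optional


def font_path(config_dir: Optional[Path] = None) -> Path:
    """Return the expected Arial.ttf location (default directory: ~/.config/DuKe)."""
    # Hypothetical helper: mirrors the path used by the install commands above.
    base = config_dir if config_dir is not None else Path.home() / ".config" / "DuKe"
    return base / "Arial.ttf"


def has_local_font(config_dir: Optional[Path] = None) -> bool:
    """True if Arial.ttf exists as a regular file at the expected location."""
    return font_path(config_dir).is_file()


if __name__ == "__main__":
    state = "found" if has_local_font() else "missing (copy misc/Arial.ttf there)"
    print(f"{font_path()}: {state}")
```

Run it once after installation; if it reports the font as missing, repeat the `mkdir`/`cp` steps.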

Training 🌟️

# one machine, one GPU
python main.py --cfgs configs/task/pet.yaml

# one machine, multiple GPUs
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node 4 main.py --cfgs configs/classification/pet.yaml

# optional flags:
#   --sync_bn     synchronize BatchNorm statistics across GPUs (slows training down)
#   --resume      resume training from a checkpoint
#   --load_from   start from pretrained weights (fine-tuning)
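When a script is launched through `torchrun`, each worker process receives its rank information through the environment variables `RANK`, `LOCAL_RANK`, and `WORLD_SIZE`, while a plain `python main.py` run has none of them. The sketch below shows one way a training script can tell the two cases apart; `dist_info` and `DistInfo` are hypothetical names, not VisionDK's actual code.

```python
import os
from typing import Mapping, NamedTuple


class DistInfo(NamedTuple):
    rank: int        # global rank across all processes
    local_rank: int  # rank on this machine; typically used as the GPU index
    world_size: int  # total number of processes

    @property
    def is_distributed(self) -> bool:
        return self.world_size > 1


def dist_info(env: Mapping[str, str] = os.environ) -> DistInfo:
    """Read the rank variables set by torchrun; fall back to a single process."""
    # Hypothetical helper: defaults describe the one-machine, one-GPU case.
    return DistInfo(
        rank=int(env.get("RANK", 0)),
        local_rank=int(env.get("LOCAL_RANK", 0)),
        world_size=int(env.get("WORLD_SIZE", 1)),
    )


if __name__ == "__main__":
    info = dist_info()
    print(info, "distributed" if info.is_distributed else "single process")
```

With `CUDA_VISIBLE_DEVICES=0,1,2,3` and `--nproc_per_node 4`, `LOCAL_RANK` runs from 0 to 3 and indexes into the four visible devices. An option like `--sync_bn` would plausibly map to `torch.nn.SyncBatchNorm.convert_sync_batchnorm`, which only matters when `is_distributed` is true; that mapping is an assumption.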

What's New

[Apr. 2024] Face Recognition Task (FRT) is now supported 🚀️️! ResNet, EfficientNet, and Swin Transformer are available as backbones, and ArcFace, CircleLoss, MagFace, and MV Softmax as heads for training. Note: part of the implementation references JD-FaceX.
[Jun. 2023] Image Classification Task (ICT) launched 🚀️️! It supports many effective training strategies, such as progressive learning, online data augmentation, exponential moving average (EMA), and a polished training interface. Models from torchvision are fully integrated.
[May 2023] The initial version of Vision.

Which Task?

Implemented Methods & Papers

Method	Paper
SAM	Sharpness-Aware Minimization for Efficiently Improving Generalization
Progressive Learning	EfficientNetV2: Smaller Models and Faster Training
OHEM	Training Region-based Object Detectors with Online Hard Example Mining
Focal Loss	Focal Loss for Dense Object Detection
Cosine Annealing	SGDR: Stochastic Gradient Descent with Warm Restarts
Label Smoothing	Rethinking the Inception Architecture for Computer Vision
Mixup	MixUp: Beyond Empirical Risk Minimization
CutOut	Improved Regularization of Convolutional Neural Networks with Cutout
Attention Pool	Augmenting Convolutional networks with attention-based aggregation
GradCAM	Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization
ArcFace	ArcFace: Additive Angular Margin Loss for Deep Face Recognition
CircleLoss	Circle Loss: A Unified Perspective of Pair Similarity Optimization
MagFace	MagFace: A Universal Representation for Face Recognition and Quality Assessment
MV Softmax	Mis-classified Vector Guided Softmax Loss for Face Recognition
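To make a few of the table entries concrete, the following pure-Python sketch implements the core formulas of cosine annealing (a single cycle, without warm restarts), label smoothing, and the ArcFace additive angular margin. These are textbook forms written for illustration, not code taken from VisionDK, and the defaults (`eps=0.1`, `s=64.0`, `m=0.5`) are assumptions chosen to match common settings.

```python
import math
from typing import List


def cosine_annealing(step: int, total_steps: int, lr_max: float, lr_min: float = 0.0) -> float:
    """SGDR-style cosine schedule for one cycle (no warm restarts)."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    t = min(step, total_steps) / total_steps
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * t))


def smoothed_targets(label: int, num_classes: int, eps: float = 0.1) -> List[float]:
    """Label smoothing: keep 1 - eps on the true class, spread eps uniformly over all classes."""
    off = eps / num_classes
    return [1.0 - eps + off if k == label else off for k in range(num_classes)]


def cross_entropy(logits: List[float], target: List[float]) -> float:
    """Cross-entropy between softmax(logits) and a (possibly smoothed) target distribution."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return -sum(t * (z - log_z) for z, t in zip(logits, target))


def arcface_logits(cosines: List[float], label: int, s: float = 64.0, m: float = 0.5) -> List[float]:
    """ArcFace: add angular margin m to the target-class angle, then scale every logit by s."""
    out = []
    for k, c in enumerate(cosines):
        c = max(-1.0, min(1.0, c))  # clamp so acos stays defined
        if k == label:
            c = math.cos(math.acos(c) + m)
        out.append(s * c)
    return out
```

`cosines` stands for the cosine similarity between a normalized embedding and each normalized class weight. Production ArcFace heads also handle the case where `theta + m` exceeds pi; that detail is omitted here for brevity.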

Models & Papers

Model	Paper	Name in configs (e.g. torchvision-mobilenet_v2)
MobileNetv2	MobileNetV2: Inverted Residuals and Linear Bottlenecks	mobilenet_v2
MobileNetv3	Searching for MobileNetV3	mobilenet_v3_small, mobilenet_v3_large
ShuffleNetv2	ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design	shufflenet_v2_x0_5, shufflenet_v2_x1_0, shufflenet_v2_x1_5, shufflenet_v2_x2_0
ResNet	Deep Residual Learning for Image Recognition	resnet18, resnet34, resnet50, resnet101, resnet152
ResNeXt	Aggregated Residual Transformations for Deep Neural Networks	resnext50_32x4d, resnext101_32x8d, resnext101_64x4d
ConvNeXt	A ConvNet for the 2020s	convnext_tiny, convnext_small, convnext_base, convnext_large
EfficientNet	EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks	efficientnet_b{0..7}
EfficientNetv2	EfficientNetV2: Smaller Models and Faster Training	efficientnet_v2_s, efficientnet_v2_m, efficientnet_v2_l
Swin Transformer	Swin Transformer: Hierarchical Vision Transformer using Shifted Windows	swin_t, swin_s, swin_b
Swin Transformerv2	Swin Transformer V2: Scaling Up Capacity and Resolution	swin_v2_t, swin_v2_s, swin_v2_b

Tools

Split the dataset into training and validation sets

python tools/data_prepare.py --postfix <jpg or png> --root <absolute path to your data> --frac <train split ratio, e.g. 0.9 0.6 0.3 0.9 0.9>
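Conceptually, such a split is a random partition of each class's images. The sketch below assumes one sub-folder per class under `--root` and one train fraction applied to a class's file list; both are assumptions, since the README does not say how `tools/data_prepare.py` interprets several `--frac` values (the example passes five, possibly one per class). `split_files` and `filter_postfix` are hypothetical helpers.

```python
import random
from typing import List, Sequence, Tuple


def split_files(files: Sequence[str], frac: float, seed: int = 0) -> Tuple[List[str], List[str]]:
    """Shuffle one class's files deterministically and cut them at `frac` (the train share)."""
    if not 0.0 <= frac <= 1.0:
        raise ValueError("frac must be in [0, 1]")
    shuffled = sorted(files)  # sort first so the result does not depend on directory listing order
    random.Random(seed).shuffle(shuffled)
    n_train = int(round(len(shuffled) * frac))
    return shuffled[:n_train], shuffled[n_train:]


def filter_postfix(names: Sequence[str], postfix: str) -> List[str]:
    """Keep files whose extension matches --postfix (e.g. 'jpg' or 'png'), case-insensitively."""
    suffix = "." + postfix.lower().lstrip(".")
    return [n for n in names if n.lower().endswith(suffix)]
```

Applying the split separately to each class folder keeps class proportions roughly equal between the training and validation sets (a stratified split).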

Data augmentation visualization

cd visiondk
python -m tools.test_augment

Contact Me

If you enjoy reproducing papers and algorithms, pull requests are welcome.
If you have questions about the repo, please open an issue.

About

A powerful baseline for image classification and face recognition with PyTorch


GNU General Public License v3.0
