BlindOver/blindover_AI

ai classification deep-learning mobile-app pytorch quantization efficientnet mobilenetv3 resnet shufflenetv2 ptq qat

Build a deep learning model to classify beverages for blind individuals

Models: ShuffleNetV2, MobileNetV3, EfficientNetV2, ResNet
Number of Parameters (based on 33 classes):

ShuffleNetV2 (x0.5)	MobileNetV3 (small)	EfficientNetV2	ResNet18	ResNet50
375,617	1,551,681	20,219,761	11,193,441	23,575,649
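
The counts in the table can be cross-checked by hand. The sketch below assumes (this README does not state it) that the models are the torchvision implementations with only the final linear layer replaced by a 33-class one, and that EfficientNetV2 is the `S` variant. The 1000-class totals and head input widths are torchvision's published figures; the dictionary and helper names are hypothetical.

```python
# Assumption: torchvision architectures.
# Each entry is (ImageNet-1k parameter total, input width of the final Linear layer).
TORCHVISION_MODELS = {
    "shufflenet_v2_x0_5": (1_366_792, 1024),
    "mobilenet_v3_small": (2_542_856, 1024),
    "efficientnet_v2_s": (21_458_488, 1280),
    "resnet18": (11_689_512, 512),
    "resnet50": (25_557_032, 2048),
}


def linear_params(in_features, out_features):
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features


def params_for_classes(model_name, num_classes):
    """Parameter count after swapping the 1000-way head for a num_classes-way head."""
    total_1k, head_in = TORCHVISION_MODELS[model_name]
    backbone = total_1k - linear_params(head_in, 1000)
    return backbone + linear_params(head_in, num_classes)


for name in TORCHVISION_MODELS:
    print(f"{name}: {params_for_classes(name, 33):,}")
```

Under these assumptions the arithmetic reproduces the five figures in the table, which suggests that only the classification head changes with the number of classes.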

Training and Test

Create a virtual environment with Anaconda

conda create -n blindover python=3.8
conda activate blindover
cd ./blindover_AI
pip install -r requirements.txt

Platform

RTX 3090 GPU
CUDA 11.7
CUDNN 8.5
PyTorch 1.13

Training

python3 train.py --data_path 'the/directory/of/dataset' --name exp --model {one of the 5 models} --pretrained --img_size 224 --num_workers 8 --batch_size 4 --epochs 100 --optimizer adam --lr_scheduling --check_point

Test

python3 evaluate.py --data_path 'the/directory/of/dataset' --model resnet18 --weight 'the/path/of/trained/weight/file' --img_size 224 --num_workers 8 --batch_size 32 --num_classes 33

Inference

python3 inference.py --src 'the/directory/of/image' --model_name resnet18 --weight 'the/path/of/trained/weight/file' --quantization --measure_latency
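
For readers who want to time a model outside `inference.py`, here is a minimal sketch of a latency measurement. It is not the code behind `--measure_latency`; the function name, the warm-up count and the number of runs are assumptions chosen for illustration.

```python
import time


def measure_latency(fn, warmup=5, runs=20):
    """Return the mean wall-clock time of fn() in milliseconds."""
    # Warm-up calls are excluded so one-off costs (caches, lazy init) do not skew the mean.
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    elapsed = time.perf_counter() - start
    return elapsed * 1000.0 / runs


# Stand-in workload; in practice fn would wrap a single forward pass of the model.
latency_ms = measure_latency(lambda: sum(i * i for i in range(10_000)))
print(f"mean latency: {latency_ms:.3f} ms")
```

On a CUDA device the timer should only be read after `torch.cuda.synchronize()`, because GPU kernels run asynchronously with respect to the Python process.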

Overview

The pipeline of our process with a simple CNN architecture

Features

To avoid image distortion, we apply padding before resizing each image. (code)

from utils.dataset import Padding
from PIL import Image

img = Image.open('./image.png')
padded_img = Padding()(img)

Correct and Incorrect examples
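
The idea behind padding before resizing can be sketched with plain arithmetic. The example assumes that the shorter side is padded symmetrically until the image is square, so a later resize to 224x224 keeps the aspect ratio; the actual `utils.dataset.Padding` class may behave differently, and both function names here are hypothetical.

```python
def square_padding(width, height):
    """Return (left, top, right, bottom) padding that makes the image square."""
    side = max(width, height)
    pad_w = side - width
    pad_h = side - height
    # Split the padding evenly; any odd pixel goes to the right or bottom edge.
    left, top = pad_w // 2, pad_h // 2
    return left, top, pad_w - left, pad_h - top


def padded_size(width, height):
    """Image size after applying square_padding."""
    left, top, right, bottom = square_padding(width, height)
    return width + left + right, height + top + bottom


# A 640x480 photo gets 80 pixels of padding above and below, giving 640x640.
print(square_padding(640, 480))  # (0, 80, 0, 80)
print(padded_size(640, 480))     # (640, 640)
```

The tuple order (left, top, right, bottom) matches what `torchvision.transforms.functional.pad` accepts as a four-element padding, so the result could be passed to it directly.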

To maximize model performance on a mobile device or virtual server, we trained several models (EfficientNetV2, MobileNetV3, ShuffleNetV2 and ResNet) and compared their accuracy and inference speed. (code) The experimental results are presented in Results.
To accelerate inference, we applied quantization, both quantization-aware training (QAT) and post-training quantization (PTQ), and compared the accuracy and inference speed of the quantized models with those of the base model. We also provide the experimental results for quantization. (README)
- Comparison between QAT and PTQ (source of figure)
```
# Convert file from float32 to uint8 with PTQ mode

python3 ./convert_ptq_mode.py --data_path 'the/path/of/dataset' --model_name 'model name' --weight 'path/of/trained/weight/file'
```
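
To make the float32-to-uint8 conversion concrete, here is a dependency-free sketch of per-tensor affine quantization, the general scheme that 8-bit quantization builds on. It is not the code in `convert_ptq_mode.py`; the function names and the sample weights are assumptions for illustration only.

```python
def quant_params(x_min, x_max, q_min=0, q_max=255):
    """Pick scale and zero point so that [x_min, x_max] maps onto [q_min, q_max]."""
    # The range is widened to contain 0.0 so that zero is represented exactly.
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)
    scale = (x_max - x_min) / (q_max - q_min) or 1.0
    zero_point = int(round(q_min - x_min / scale))
    return scale, max(q_min, min(q_max, zero_point))


def quantize(values, scale, zero_point, q_min=0, q_max=255):
    """float -> uint8: divide by scale, round, shift by the zero point, clamp."""
    return [max(q_min, min(q_max, int(round(v / scale)) + zero_point)) for v in values]


def dequantize(q_values, scale, zero_point):
    """uint8 -> approximate float."""
    return [(q - zero_point) * scale for q in q_values]


weights = [-1.0, -0.25, 0.0, 0.5, 1.0]
scale, zero_point = quant_params(min(weights), max(weights))
q = quantize(weights, scale, zero_point)
restored = dequantize(q, scale, zero_point)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, f"max error = {max_error:.4f}")
```

The round-trip error stays within one quantization step (`scale`), which is the precision traded for smaller weights and faster integer arithmetic.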
To address the shortage of data, we used image generation models such as Diffusion and DALL-E to increase the number of samples. We also applied random image transformations (color, sharpness, contrast and brightness) so that each sample differs slightly from the original image. (code)
```
python3 ./composite.py --foreground_path 'the/path/of/foreground/images' --background_path 'the/path/of/background/images' --save_dir 'a/folder/to/save/generated/images'
```
- Examples of composite image
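
The compositing and random-transformation steps can be sketched without any image library. The example assumes that a foreground is pasted at a random position inside a background and that each transformation is driven by a factor close to 1.0; it is not taken from `composite.py`, and every name and the `strength` value are hypothetical.

```python
import random


def random_paste_position(bg_size, fg_size, rng):
    """Pick a top-left corner so the foreground lies fully inside the background."""
    bg_w, bg_h = bg_size
    fg_w, fg_h = fg_size
    if fg_w > bg_w or fg_h > bg_h:
        raise ValueError("foreground must fit inside the background")
    return rng.randint(0, bg_w - fg_w), rng.randint(0, bg_h - fg_h)


def random_enhancement_factors(rng, strength=0.2):
    """Sample factors near 1.0 (1.0 = unchanged) for the four transformations."""
    names = ("color", "sharpness", "contrast", "brightness")
    return {name: rng.uniform(1.0 - strength, 1.0 + strength) for name in names}


def blend_pixel(fg, bg, alpha):
    """Alpha-blend one RGB pixel; alpha=1.0 keeps the foreground, 0.0 the background."""
    return tuple(round(alpha * f + (1.0 - alpha) * b) for f, b in zip(fg, bg))


rng = random.Random(0)  # seeded only to make the sketch reproducible
print(random_paste_position((640, 640), (200, 300), rng))
print(random_enhancement_factors(rng))
print(blend_pixel((255, 0, 0), (0, 0, 255), 1.0))
```

With Pillow, factors like these could be passed to `ImageEnhance.Color`, `ImageEnhance.Sharpness`, `ImageEnhance.Contrast` and `ImageEnhance.Brightness`, whose `enhance(factor)` method returns the original image for a factor of 1.0.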

Dataset

We collected 10 to 15 images per class and then augmented the training data through image compositing.

Sample Images

Coca Cola

Sprite Zero

Classes

2% (0)	Bacchus (1)	Chilsung Cider (2)	Chilsung Cider Zero (3)	Chocolate Milk (4)	Coca-Cola (5)
Demisoda Apple (6)	Demisoda Peach (7)	Solui Nun (8)	Fanta Orange (9)	Gatorade (10)	Jetty (11)
McCol (12)	Milk (13)	Milkis (14)	Milkis Zero (15)	Mountain Dew (16)	Pepsi (17)
Pepsi Zero (18)	Pocari Sweat (19)	Powerade (20)	Red Bull (21)	Sikhye (22)	Sprite (23)
Sprite Zero (24)	Strawberry Milk (25)	Vita 500 (26)	V-Talk Blue Lemon (27)	V-Talk Peach (28)	Welch's Grape (29)
Welch's Orange (30)	Welch's White Grape (31)	Zero Cola (32)	-	-	-

Dataset Structure

path : dataset/
├── images
│    ├─ class 1
│    │    ├─ img1.jpg
│    │    ├─ ...
│    ├─ class 2
│    │    ├─ img1.jpg
│    │    ├─ ...
│    ├─ class 3
│    │    ├─ img1.jpg
│    │    ├─ ...
│    ├─ ...
│    │    ├─ ...
│    │    ├─ ...
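
A layout like this can be turned into (image path, label) pairs with the standard library alone. The sketch below is an assumption about how one might index the folders, not the repository's dataset class: the function name, the accepted file suffixes and the rule that labels follow the sorted folder names are all hypothetical, so the resulting indices need not match the class numbers listed under Classes.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to the files actually present.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def index_dataset(root):
    """Return (class_to_idx, samples) for a root/images/<class>/<file> layout."""
    class_dirs = sorted(p for p in (Path(root) / "images").iterdir() if p.is_dir())
    # Labels are assigned in sorted folder-name order (an assumption).
    class_to_idx = {p.name: i for i, p in enumerate(class_dirs)}
    samples = [
        (img, class_to_idx[d.name])
        for d in class_dirs
        for img in sorted(d.iterdir())
        if img.suffix.lower() in IMAGE_SUFFIXES
    ]
    return class_to_idx, samples


# Example call, assuming the dataset sits in ./dataset:
# class_to_idx, samples = index_dataset("dataset")
```

Sorting the folder names keeps the label assignment stable between runs, which matters because a trained weight file is only meaningful with the same class-to-index mapping.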

Acknowledgements

Assisted in dataset collection: Emart24 Yongin Myongji University branch, Hanaro Mart Osan Nonghyup main branch

MIT License
