joanna-janos / GestureRecognizer

Mobile neural networks for gesture recognition


Gesture recognizer

Models for the Hand Gesture of the Colombian sign language dataset, written in Python with PyTorch.

πŸ“ Dataset

Hand Gesture of the Colombian sign language

The dataset was taken from Kaggle (link here) and contains images of hand gestures for numbers (0-5) and vowels (A, E, I, O, U). Based on EDA, I chose a subset of gestures to recognize (1, 2, 3, 4, 5, A, O, U).

❗ Example images can be found at the bottom of the EDA notebook.

🔨 Input preprocessing

Images were resized to 512x256. During training, they were randomly flipped vertically, had noise added, and had their brightness and contrast increased.
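
A minimal sketch of such a pipeline with torchvision is below; the flip probability, noise scale, and jitter ranges are my own illustrative assumptions, not values taken from the repo.

```python
import torch
from torchvision import transforms

def add_noise(image, std=0.02):
    """Add Gaussian noise to a [0, 1] image tensor and clamp back into range."""
    return (image + torch.randn_like(image) * std).clamp(0.0, 1.0)

train_transform = transforms.Compose([
    transforms.Resize((512, 256)),                         # (height, width)
    transforms.RandomVerticalFlip(p=0.5),                  # flip probability is assumed
    transforms.ColorJitter(brightness=0.2, contrast=0.2),  # jitter ranges are assumed
    transforms.ToTensor(),
    transforms.Lambda(add_noise),                          # noise scale is assumed
])
```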

βš™οΈ Model

More information about the models used and why I chose them can be found here.

🌎 Conda environment

Create the conda environment from the yml file:
conda env create -f environment/environment.yml
then activate it with:
conda activate gesture_recognizer

👀 Tasks

You can run the app for 3 tasks:

  1. find_mean_std - find the mean and std values that should be used to normalize the data; run: python main.py find_mean_std
  2. find_lr - find the optimal learning rate for the model (should be run before training to set it up correctly); run: python main.py find_lr
  3. train - train the model; run: python main.py train

❓ For help with the additional arguments you can provide, run python main.py --help
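
The repository's main.py isn't reproduced here, but a hypothetical sketch of how such a three-task CLI could be wired with argparse looks like this:

```python
import argparse

def main():
    parser = argparse.ArgumentParser(description="Gesture recognizer")
    parser.add_argument("task", choices=["find_mean_std", "find_lr", "train"],
                        help="task to run")
    args = parser.parse_args()

    if args.task == "find_mean_std":
        ...  # compute per-channel mean/std of the training data
    elif args.task == "find_lr":
        ...  # run the learning rate finder
    else:
        ...  # train the model

if __name__ == "__main__":
    main()
```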

💥 Results

  1. Overall results for find_lr and train tasks

WITHOUT NORMALIZATION

LR FINDER

| Model name | Pretrained | Iterations | Min lr set in finder | Max lr set in finder | Found lr |
|---|---|---|---|---|---|
| MnasNet | Yes | 200 | 1e-7 | 1 | 4e-1 |
| MnasNet | No | 200 | 1e-7 | 1 | 2.5e-1 |
| SqueezeNet | Yes | 200 | 1e-7 | 1 | 3.3e-1 |
| MobileNetV2 | Yes | 200 | 1e-7 | 1 | 5e-1 |

TRAINING

| Model name | Pretrained | Epochs trained | Max validation accuracy | Min lr set in cyclic lr scheduler | Max lr set in cyclic lr scheduler |
|---|---|---|---|---|---|
| MnasNet | Yes | 50 | 61.89% | 4e-4 | 4e-2 |
| MnasNet | No | 30 | 13.73% | 2.5e-4 | 2.5e-2 |
| SqueezeNet | Yes | 100 | 70.08% | 3.5e-4 | 3.5e-3 |
| MobileNetV2 | Yes | 80 | 65.36% | 5e-4 | 5e-3 |

For MnasNet, I set the max learning rate of the cyclic lr scheduler to the lr found by the lr finder divided by 10, and the min lr to the found lr divided by 1000. For SqueezeNet I didn't take 3.3e-1 (the found lr) but 3.5e-1, because it gave better results; I divided it by 1000 for the min lr and by 100 for the max. The same was applied to MobileNetV2.
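
As a worked example of that rule for pretrained MnasNet (found lr = 4e-1): this is a sketch, not the repo's code, and the optimizer choice and momentum are my assumptions.

```python
import torch
from torchvision import models

model = models.mnasnet1_0(pretrained=True)
found_lr = 4e-1                                # from the LR finder table above

optimizer = torch.optim.SGD(model.parameters(), lr=found_lr / 1000, momentum=0.9)
scheduler = torch.optim.lr_scheduler.CyclicLR(
    optimizer,
    base_lr=found_lr / 1000,                   # min lr: found lr / 1000 = 4e-4
    max_lr=found_lr / 10,                      # max lr: found lr / 10 = 4e-2
)
```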

WITH NORMALIZATION

  1. Hand Gesture of the Colombian sign language dataset

Mean and std were obtained using the find_mean_std task on the Hand Gesture of the Colombian sign language dataset.
Mean: [0.7315, 0.6840, 0.6410]
Std: [0.4252, 0.4546, 0.4789]
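
A minimal sketch of how find_mean_std can compute these per-channel statistics, assuming the dataset yields (image, label) pairs of [0, 1] float tensors; this is not necessarily the repo's exact implementation.

```python
import torch
from torch.utils.data import DataLoader

def find_mean_std(dataset, batch_size=64):
    """Compute per-channel mean and std over a dataset of (3, H, W) image tensors."""
    loader = DataLoader(dataset, batch_size=batch_size)
    channel_sum = torch.zeros(3)
    channel_sq_sum = torch.zeros(3)
    n_pixels = 0
    for images, _ in loader:                   # images: (B, 3, H, W)
        n_pixels += images.numel() // 3        # B * H * W pixels per channel
        channel_sum += images.sum(dim=[0, 2, 3])
        channel_sq_sum += (images ** 2).sum(dim=[0, 2, 3])
    mean = channel_sum / n_pixels
    std = (channel_sq_sum / n_pixels - mean ** 2).sqrt()
    return mean, std
```

The resulting values are then passed to transforms.Normalize(mean, std).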

LR FINDER

| Model name | Pretrained | Iterations | Min lr set in finder | Max lr set in finder | Found lr |
|---|---|---|---|---|---|
| MnasNet | Yes | 200 | 1e-7 | 1 | 3.7e-1 |
| SqueezeNet | Yes | 200 | 1e-7 | 1 | 5e-1 |
| MobileNetV2 | Yes | 200 | 1e-7 | 1 | 3.5e-1 |

TRAINING

| Model name | Pretrained | Epochs trained | Max validation accuracy | Min lr set in cyclic lr scheduler | Max lr set in cyclic lr scheduler |
|---|---|---|---|---|---|
| MnasNet | Yes | 100 | 80.74% | 3.3e-3 | 3.5e-2 |
| SqueezeNet | Yes | 100 | 79.09% | 5e-3 | 5e-2 |
| MobileNetV2 | Yes | 100 | 70.29% | 3.5e-4 | 3.5e-3 |

  2. ImageNet dataset

Mean and std for the ImageNet dataset:
Mean: [0.485, 0.456, 0.406]
Std: [0.229, 0.224, 0.225]

LR FINDER

| Model name | Pretrained | Iterations | Min lr set in finder | Max lr set in finder | Found lr |
|---|---|---|---|---|---|
| MnasNet | Yes | 200 | 1e-7 | 1 | 3.4e-1 |
| SqueezeNet | Yes | 200 | 1e-7 | 1 | 1 |
| MobileNetV2 | Yes | 200 | 1e-7 | 1 | 3.7e-1 |

TRAINING

| Model name | Pretrained | Epochs trained | Max validation accuracy | Min lr set in cyclic lr scheduler | Max lr set in cyclic lr scheduler |
|---|---|---|---|---|---|
| MnasNet | Yes | 100 | 83.81% | 3.2e-3 | 3.2e-2 |
| SqueezeNet | Yes | 100 | 81.56% | 1e-3 | 1e-2 |
| MobileNetV2 | Yes | 100 | 80.12% | 3.5e-4 | 3.5e-3 |

  2. Lr finder

Plots obtained using LRFinder are located under the results/NORMALIZATION/visualisations_lr_finder/MODEL_NAME/ directory, with separate subdirectories for pretrained and not-pretrained models. There are 2 files: the plot produced by the lr finder, annotated with the lr value and loss at the lowest point, and a zoomed-in view of the area of interest.

Example (pretrained MnasNet; without normalization):

lr finder result plot for pretrained MnasNet
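
A hedged sketch of producing such a plot with the pytorch-lr-finder package (whether the repo uses exactly this package is my assumption); the start lr, end lr, and iteration count match the tables above.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from torch_lr_finder import LRFinder
from torchvision import models

# Dummy data standing in for the real training DataLoader.
train_loader = DataLoader(
    TensorDataset(torch.randn(8, 3, 512, 256), torch.randint(0, 8, (8,))),
    batch_size=4,
)

model = models.mnasnet1_0(pretrained=True)
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-7)    # min lr set in finder

lr_finder = LRFinder(model, optimizer, criterion)
lr_finder.range_test(train_loader, end_lr=1, num_iter=200)  # max lr 1, 200 iterations
lr_finder.plot()                                            # loss vs. lr curve
lr_finder.reset()                                           # restore initial weights
```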

  3. Models

Trained models are saved in the results/NORMALIZATION/model/MODEL_NAME/ directory. Only models better than the previous best are saved.
Under the directory with the model name you can find 2 directories: one contains saved models for the pretrained variant, the other for the not-pretrained one. The validation accuracy is used as the filename.

❗ Only the model with the highest score is uploaded to GitHub.
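
A hypothetical sketch of this "save only if better" rule; names and paths are illustrative, not the repo's actual code.

```python
import torch

def save_if_better(model, val_accuracy, best_accuracy, directory):
    """Save the model only when validation accuracy improves;
    the accuracy becomes the filename."""
    if val_accuracy > best_accuracy:
        torch.save(model.state_dict(), f"{directory}/{val_accuracy:.2f}.pt")
        return val_accuracy
    return best_accuracy
```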

  4. Size

I have to take the size of the model into account, because it should be small enough for mobile phones.

| Model name | Pretrained | Size (MB) |
|---|---|---|
| MnasNet | Yes | 13.9 |
| MnasNet | No | 13.9 |
| SqueezeNet | Yes | 3 |
| MobileNetV2 | Yes | 10.4 |
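
The sizes above are presumably those of the saved model files; one way to check a model's on-disk size yourself (the file name is illustrative):

```python
import os
import torch
from torchvision import models

model = models.squeezenet1_1(pretrained=True)
torch.save(model.state_dict(), "squeezenet.pt")
print(f"{os.path.getsize('squeezenet.pt') / 1e6:.1f} MB")
```
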
  5. Tensorboard

Run tensorboard --logdir=results/ and open http://localhost:6006 in your browser to inspect train/validation accuracy and loss over all epochs.
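
Those scalars end up under results/ via TensorBoard's SummaryWriter; a minimal sketch follows (tag names and values are illustrative, not necessarily the repo's).

```python
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir="results/")
for epoch in range(3):
    writer.add_scalar("loss/train", 1.0 / (epoch + 1), epoch)          # dummy values
    writer.add_scalar("accuracy/validation", 0.5 + 0.1 * epoch, epoch)
writer.close()
```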
