nam157/MiVOLO

MiVOLO: Multi-input Transformer for Age and Gender Estimation

MiVOLO: Multi-input Transformer for Age and Gender Estimation, Maksim Kuprashevich, Irina Tolstykh, 2023 arXiv 2307.04616

[Paper] [Demo] [BibTex] [Data]

MiVOLO pretrained models

Gender & Age recognition performance.

Model	Type	Dataset	Age MAE	Age CS@5	Gender Accuracy	download
volo_d1	face_only, age	IMDB-cleaned	4.29	67.71	-	checkpoint
volo_d1	face_only, age, gender	IMDB-cleaned	4.22	68.68	99.38	checkpoint
mivolo_d1	face_body, age, gender	IMDB-cleaned	4.24 [face+body] 6.87 [body]	68.32 [face+body] 46.32 [body]	99.46 [face+body] 96.48 [body]	checkpoint
volo_d1	face_only, age	UTKFace	4.23	69.72	-	checkpoint
volo_d1	face_only, age, gender	UTKFace	4.23	69.78	97.69	checkpoint
mivolo_d1	face_body, age, gender	Lagenda	3.99 [face+body]	71.27 [face+body]	97.36 [face+body]	demo
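
For readers unfamiliar with the column headers, the sketch below shows how Age MAE and Age CS@5 are commonly computed: the mean absolute age error in years, and the percentage of samples whose absolute age error is at most 5 years. It assumes this common definition of the cumulative score. The function names `age_mae` and `age_cs` are illustrative and are not taken from this repository.

```python
from typing import Sequence


def age_mae(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean absolute error between predicted and true ages, in years."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("pred and target must be non-empty and of equal length")
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def age_cs(pred: Sequence[float], target: Sequence[float], threshold: float = 5.0) -> float:
    """Cumulative score: percentage of samples with |error| <= threshold years."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("pred and target must be non-empty and of equal length")
    hits = sum(1 for p, t in zip(pred, target) if abs(p - t) <= threshold)
    return 100.0 * hits / len(pred)


# Toy values for illustration only, not real model predictions.
predicted = [20.0, 30.0, 40.0]
actual = [22.0, 30.0, 50.0]
print(age_mae(predicted, actual))  # 4.0
print(age_cs(predicted, actual))
```

With the default threshold, `age_cs` corresponds to the CS@5 column; lower MAE and higher CS@5 are better.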

Dataset

Please cite our paper if you use any of this data!

Lagenda dataset: images and annotation.
IMDB-clean: follow these instructions to get images and download our annotations.
UTK dataset: original full images and our annotations: split from the article, random full split.
Adience dataset: follow these instructions to get images and download our annotations.

After downloading them, your data directory should look something like this:
```
data
└── Adience
    ├── annotations  (folder with our annotations)
    ├── aligned      (will not be used)
    ├── faces
    ├── fold_0_data.txt
    ├── fold_1_data.txt
    ├── fold_2_data.txt
    ├── fold_3_data.txt
    └── fold_4_data.txt
```
We use the coarsely aligned images from the faces/ directory.

Using our detector, we found a face bbox for each image (see tools/prepare_adience.py).

This dataset has five folds. The performance metric is accuracy on five-fold cross validation.

images before removal	fold 0	fold 1	fold 2	fold 3	fold 4
19,370	4,484	3,730	3,894	3,446	3,816

Incomplete data

only age not found	only gender not found	SUM
40	1,170	1,210 (6.2 %)

Removed data

failed to process image	age and gender not found	SUM
0	708	708 (3.6 %)

Genders

female	male
9,372	8,120

Ages (8 classes) after mapping to non-overlapping age intervals

0-2	4-6	8-12	15-20	25-32	38-43	48-53	60-100
2,509	2,140	2,293	1,791	5,589	2,490	909	901
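
The Adience counts above can be cross-checked with a little arithmetic. The sketch below copies the numbers from the tables and verifies that they add up. The interpretation that images missing only one label are kept and counted under the label they do have is an assumption inferred from the totals, not something stated in the source.

```python
# Counts copied from the Adience tables above.
folds = [4484, 3730, 3894, 3446, 3816]
total_before_removal = 19370
removed = 708                # age and gender both not found
only_age_missing = 40
only_gender_missing = 1170
genders = {"female": 9372, "male": 8120}
ages = {
    "0-2": 2509, "4-6": 2140, "8-12": 2293, "15-20": 1791,
    "25-32": 5589, "38-43": 2490, "48-53": 909, "60-100": 901,
}

# The five folds account for every image before removal.
assert sum(folds) == total_before_removal

# Images that remain after dropping those with neither label.
kept = total_before_removal - removed
print(kept)  # 18662

# Assumption: partially labeled images are kept, so each label total
# equals the kept images minus those missing that particular label.
assert sum(genders.values()) == kept - only_gender_missing
assert sum(ages.values()) == kept - only_age_missing
```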
FairFace dataset: follow these instructions to get images and download our annotations.

After downloading them, your data directory should look something like this:
```
data
└── FairFace
    ├── annotations  (folder with our annotations)
    ├── fairface-img-margin025-trainval   (will not be used)
    │   ├── train
    │   └── val
    ├── fairface-img-margin125-trainval
    │   ├── train
    │   └── val
    ├── fairface_label_train.csv
    └── fairface_label_val.csv
```
We use the aligned images from the fairface-img-margin125-trainval/ directory.

Using our detector, we found a face bbox for each image and added a person bbox where possible (see tools/prepare_fairface.py).

This dataset has 2 splits: train and val. The performance metric is accuracy on validation.

images train	images val
86,744	10,954

Genders for validation

female	male
5,162	5,792

Ages for validation (9 classes):

0-2	3-9	10-19	20-29	30-39	40-49	50-59	60-69	70+
199	1,356	1,181	3,300	2,330	1,353	796	321	118
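
Because FairFace is scored as classification accuracy over the nine age ranges above, a numeric age prediction has to be mapped to one of those ranges before it can be compared with the label. The sketch below shows one plausible way to do that with a sorted list of lower bounds. The helper `age_to_fairface_class` is hypothetical; how this repository performs the mapping is not described here.

```python
from bisect import bisect_right

# Class names copied from the validation table above.
FAIRFACE_AGE_CLASSES = [
    "0-2", "3-9", "10-19", "20-29", "30-39", "40-49", "50-59", "60-69", "70+",
]
# Lower bounds of classes 1..8; class 0 starts at age 0.
_LOWER_BOUNDS = [3, 10, 20, 30, 40, 50, 60, 70]


def age_to_fairface_class(age: float) -> str:
    """Map a non-negative age in years to its FairFace age range.

    Hypothetical helper: fractional ages fall into the range of their
    integer part, e.g. 9.6 -> "3-9".
    """
    if age < 0:
        raise ValueError("age must be non-negative")
    # bisect_right counts how many lower bounds are <= age,
    # which is exactly the index of the matching class.
    return FAIRFACE_AGE_CLASSES[bisect_right(_LOWER_BOUNDS, age)]


print(age_to_fairface_class(25.4))  # 20-29
print(age_to_fairface_class(70))    # 70+
```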

Install

Install PyTorch 1.13+ and the other requirements.

pip install -r requirements.txt
pip install .

Demo

Download the body + face detector model to models/yolov8x_person_face.pt
Download the MiVOLO checkpoint to models/mivolo_imbd.pth.tar

wget https://variety.com/wp-content/uploads/2023/04/MCDNOHA_SP001.jpg -O jennifer_lawrence.jpg

python3 demo.py \
--input "jennifer_lawrence.jpg" \
--output "output" \
--detector-weights "models/yolov8x_person_face.pt" \
--checkpoint "models/mivolo_imbd.pth.tar" \
--device "cuda:0" \
--with-persons \
--draw

To run the demo on a YouTube video:

python3 demo.py \
--input "https://www.youtube.com/shorts/pVh32k0hGEI" \
--output "output" \
--detector-weights "models/yolov8x_person_face.pt" \
--checkpoint "models/mivolo_imbd.pth.tar" \
--device "cuda:0" \
--draw \
--with-persons

Validation

To reproduce validation metrics:

Download the prepared annotations for IMDB-clean / UTK / Adience / Lagenda / FairFace.
Download the checkpoint.
Run validation:

python3 eval_pretrained.py \
  --dataset_images /path/to/dataset/utk/images \
  --dataset_annotations /path/to/dataset/utk/annotation \
  --dataset_name utk \
  --split valid \
  --batch-size 512 \
  --checkpoint models/mivolo_imbd.pth.tar \
  --half \
  --with-persons \
  --device "cuda:0"

Supported dataset names: "utk", "imdb", "lagenda", "fairface", "adience".
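
When validating several datasets, it can be convenient to build the command above from Python and reject unsupported dataset names early. The sketch below is a hypothetical wrapper: it uses only the flags shown in the command above, and it assumes every dataset keeps its images and annotations in `images/` and `annotation/` subfolders as in the UTK example, which may not hold for your layout. Whether `--split valid` is the right value for every dataset is also not stated here.

```python
import shlex
from pathlib import PurePosixPath

# Names copied from the list of supported datasets above.
SUPPORTED_DATASETS = ("utk", "imdb", "lagenda", "fairface", "adience")


def build_eval_command(
    dataset_root: str,
    dataset_name: str,
    checkpoint: str = "models/mivolo_imbd.pth.tar",
    split: str = "valid",
    batch_size: int = 512,
    device: str = "cuda:0",
) -> list:
    """Return the eval_pretrained.py command as an argument list."""
    if dataset_name not in SUPPORTED_DATASETS:
        raise ValueError(f"unsupported dataset name: {dataset_name!r}")
    root = PurePosixPath(dataset_root)
    return [
        "python3", "eval_pretrained.py",
        # Assumed layout: <root>/images and <root>/annotation.
        "--dataset_images", str(root / "images"),
        "--dataset_annotations", str(root / "annotation"),
        "--dataset_name", dataset_name,
        "--split", split,
        "--batch-size", str(batch_size),
        "--checkpoint", checkpoint,
        "--half",
        "--with-persons",
        "--device", device,
    ]


cmd = build_eval_command("/path/to/dataset/utk", "utk")
print(shlex.join(cmd))
# To actually run it from the repository root:
#   import subprocess; subprocess.run(cmd, check=True)
```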

License

Please, see here

Citing

If you use our models, code, or dataset, we kindly request that you cite the following paper and give the repository a ⭐

@article{mivolo2023,
   Author = {Maksim Kuprashevich and Irina Tolstykh},
   Title = {MiVOLO: Multi-input Transformer for Age and Gender Estimation},
   Year = {2023},
   Eprint = {arXiv:2307.04616},
}
