nam157 / MiVOLO

MiVOLO age & gender transformer neural network


MiVOLO: Multi-input Transformer for Age and Gender Estimation

MiVOLO: Multi-input Transformer for Age and Gender Estimation, Maksim Kuprashevich, Irina Tolstykh, 2023 arXiv 2307.04616

[Paper] [Demo] [BibTex] [Data]

MiVOLO pretrained models

Gender & Age recognition performance.

Model      Type                    Dataset       Age MAE           Age CS@5           Gender Accuracy    download
volo_d1    face_only, age          IMDB-cleaned  4.29              67.71              -                  checkpoint
volo_d1    face_only, age, gender  IMDB-cleaned  4.22              68.68              99.38              checkpoint
mivolo_d1  face_body, age, gender  IMDB-cleaned  4.24 [face+body]  68.32 [face+body]  99.46 [face+body]  checkpoint
                                                 6.87 [body]       46.32 [body]       96.48 [body]
volo_d1    face_only, age          UTKFace       4.23              69.72              -                  checkpoint
volo_d1    face_only, age, gender  UTKFace       4.23              69.78              97.69              checkpoint
mivolo_d1  face_body, age, gender  Lagenda       3.99 [face+body]  71.27 [face+body]  97.36 [face+body]  demo
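
For readers unfamiliar with the Age MAE and Age CS@5 columns above, the following is a minimal sketch of how such metrics are commonly computed. It assumes the usual definitions: MAE is the mean absolute error in years, and CS@5 is the percentage of samples whose absolute age error is at most 5 years. The function names and the sample values are illustrative and are not taken from the MiVOLO code base.

```python
def age_mae(predicted, actual):
    """Mean absolute error between predicted and true ages, in years."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [abs(p - a) for p, a in zip(predicted, actual)]
    return sum(errors) / len(errors)


def cumulative_score(predicted, actual, tolerance=5):
    """Percentage of samples with absolute age error <= tolerance (CS@tolerance)."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for p, a in zip(predicted, actual) if abs(p - a) <= tolerance)
    return 100.0 * hits / len(predicted)


# Illustrative values only (not from any dataset).
predicted_ages = [23.0, 41.5, 30.0, 67.0]
true_ages = [25, 40, 38, 66]

print(age_mae(predicted_ages, true_ages))           # 3.125
print(cumulative_score(predicted_ages, true_ages))  # 75.0
```

Lower MAE is better, while higher CS@5 is better, which is why the two columns move in opposite directions across the rows of the table.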

Dataset

Please cite our paper if you use any of this data!

  • Lagenda dataset: images and annotation.

  • IMDB-clean: follow these instructions to get images and download our annotations.

  • UTK dataset: original full images and our annotations: split from the article, random full split.

  • Adience dataset: follow these instructions to get images and download our annotations.

    After downloading them, your data directory should look something like this:

    data
    └── Adience
        ├── annotations  (folder with our annotations)
        ├── aligned      (will not be used)
        ├── faces
        ├── fold_0_data.txt
        ├── fold_1_data.txt
        ├── fold_2_data.txt
        ├── fold_3_data.txt
        └── fold_4_data.txt

    We use coarse aligned images from faces/ dir.

    Using our detector we found a face bbox for each image (see tools/prepare_adience.py).

    This dataset has five folds. The performance metric is accuracy on five-fold cross validation.

    images before removal   fold 0   fold 1   fold 2   fold 3   fold 4
    19,370                  4,484    3,730    3,894    3,446    3,816

    Incomplete data

    only age not found   only gender not found   SUM
    40                   1,170                   1,210 (6.2 %)

    Removed data

    failed to process image   age and gender not found   SUM
    0                         708                        708 (3.6 %)

    Genders

    female   male
    9,372    8,120

    Ages (8 classes) after mapping to non-overlapping age intervals

    0-2     4-6     8-12    15-20   25-32   38-43   48-53   60-100
    2,509   2,140   2,293   1,791   5,589   2,490   909     901

  • FairFace dataset: follow these instructions to get images and download our annotations.

    After downloading them, your data directory should look something like this:

    data
    └── FairFace
        ├── annotations  (folder with our annotations)
        ├── fairface-img-margin025-trainval   (will not be used)
        │   ├── train
        │   └── val
        ├── fairface-img-margin125-trainval
        │   ├── train
        │   └── val
        ├── fairface_label_train.csv
        └── fairface_label_val.csv

    We use aligned images from fairface-img-margin125-trainval/ dir.

    Using our detector we found a face bbox for each image and added a person bbox if it was possible (see tools/prepare_fairface.py).

    This dataset has 2 splits: train and val. The performance metric is accuracy on validation.

    images train   images val
    86,744         10,954

    Genders for validation

    female   male
    5,162    5,792

    Ages for validation (9 classes):

    0-2   3-9     10-19   20-29   30-39   40-49   50-59   60-69   70+
    199   1,356   1,181   3,300   2,330   1,353   796     321     118
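
Because Adience and FairFace are scored by classification accuracy over the age intervals listed above, a regression output has to be mapped to a class first. The sketch below shows one way this could be done. The interval boundaries come from the tables above. Assigning an age that falls in a gap to the nearest interval (ties go to the lower class) is an assumption made for illustration, and it may differ from what the repository's preparation and evaluation scripts actually do. All function names are hypothetical.

```python
# Age intervals copied from the Adience (8 classes) and FairFace (9 classes) tables.
ADIENCE_BINS = [(0, 2), (4, 6), (8, 12), (15, 20), (25, 32), (38, 43), (48, 53), (60, 100)]
FAIRFACE_BINS = [(0, 2), (3, 9), (10, 19), (20, 29), (30, 39), (40, 49),
                 (50, 59), (60, 69), (70, float("inf"))]


def age_to_class(age, bins):
    """Return the index of the interval containing `age`.

    ASSUMPTION: an age that falls between two intervals is assigned to the
    interval whose nearest edge is closest; ties go to the lower class.
    """
    best_index, best_distance = 0, float("inf")
    for index, (low, high) in enumerate(bins):
        if low <= age <= high:
            return index
        distance = min(abs(age - low), abs(age - high))
        if distance < best_distance:
            best_index, best_distance = index, distance
    return best_index


def accuracy(predicted_classes, true_classes):
    """Percentage of samples whose predicted class equals the true class."""
    if len(predicted_classes) != len(true_classes) or not true_classes:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(1 for p, t in zip(predicted_classes, true_classes) if p == t)
    return 100.0 * correct / len(true_classes)


def mean_fold_accuracy(fold_accuracies):
    """Average of per-fold accuracies, as used for five-fold cross validation."""
    return sum(fold_accuracies) / len(fold_accuracies)


# Illustrative values only.
predicted = [age_to_class(a, ADIENCE_BINS) for a in (1.4, 22.0, 27.3, 70.0)]
print(predicted)                          # [0, 3, 4, 7]
print(accuracy(predicted, [0, 4, 4, 7]))  # 75.0
```

For Adience, the reported number would then be `mean_fold_accuracy` over the five folds. For FairFace, it is simply `accuracy` on the validation split.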

Install

Install PyTorch 1.13+ and the other requirements:

pip install -r requirements.txt
pip install .
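
To confirm that an environment satisfies the PyTorch 1.13+ requirement before running the demo, a small check along the following lines may help. This is only a sketch: the helper names are hypothetical, and the parsing assumes version strings that start with `major.minor` (for example `2.0.1+cu117`).

```python
import re


def parse_major_minor(version):
    """Extract the leading (major, minor) pair from a version string."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def meets_minimum(version, minimum=(1, 13)):
    """Return True if `version` is at least `minimum` (major, minor)."""
    return parse_major_minor(version) >= minimum


print(meets_minimum("1.13.0"))       # True
print(meets_minimum("2.0.1+cu117"))  # True
print(meets_minimum("1.12.1"))       # False
```

With PyTorch installed, the same check can be applied to the running version via `meets_minimum(torch.__version__)` after `import torch`.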

Demo

  1. Download body + face detector model to models/yolov8x_person_face.pt
  2. Download mivolo checkpoint to models/mivolo_imbd.pth.tar

wget https://variety.com/wp-content/uploads/2023/04/MCDNOHA_SP001.jpg -O jennifer_lawrence.jpg

python3 demo.py \
--input "jennifer_lawrence.jpg" \
--output "output" \
--detector-weights "models/yolov8x_person_face.pt" \
--checkpoint "models/mivolo_imbd.pth.tar" \
--device "cuda:0" \
--with-persons \
--draw

To run the demo on a YouTube video:

python3 demo.py \
--input "https://www.youtube.com/shorts/pVh32k0hGEI" \
--output "output" \
--detector-weights "models/yolov8x_person_face.pt" \
--checkpoint "models/mivolo_imbd.pth.tar" \
--device "cuda:0" \
--draw \
--with-persons

Validation

To reproduce validation metrics:

  1. Download prepared annotations for IMDB-clean / UTK / Adience / Lagenda / FairFace.
  2. Download a checkpoint.
  3. Run validation:

python3 eval_pretrained.py \
  --dataset_images /path/to/dataset/utk/images \
  --dataset_annotations /path/to/dataset/utk/annotation \
  --dataset_name utk \
  --split valid \
  --batch-size 512 \
  --checkpoint models/mivolo_imbd.pth.tar \
  --half \
  --with-persons \
  --device "cuda:0"

Supported dataset names: "utk", "imdb", "lagenda", "fairface", "adience".
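
When validating several datasets in a row, it can be convenient to assemble the command programmatically and reject unsupported dataset names early. The sketch below only builds the argument list. It uses the flags shown in the command above and the dataset names listed in this section. The function name and the example paths are hypothetical.

```python
import shlex

# Dataset names listed as supported in this README.
SUPPORTED_DATASETS = ("utk", "imdb", "lagenda", "fairface", "adience")


def build_eval_command(dataset_name, images_dir, annotations_dir,
                       checkpoint="models/mivolo_imbd.pth.tar",
                       split="valid", batch_size=512, device="cuda:0",
                       half=True, with_persons=True):
    """Build the eval_pretrained.py argument list for one dataset."""
    if dataset_name not in SUPPORTED_DATASETS:
        raise ValueError(
            f"unsupported dataset {dataset_name!r}; expected one of {SUPPORTED_DATASETS}"
        )
    command = [
        "python3", "eval_pretrained.py",
        # strip() guards against stray whitespace in copied paths
        "--dataset_images", images_dir.strip(),
        "--dataset_annotations", annotations_dir.strip(),
        "--dataset_name", dataset_name,
        "--split", split,
        "--batch-size", str(batch_size),
        "--checkpoint", checkpoint.strip(),
    ]
    if half:
        command.append("--half")
    if with_persons:
        command.append("--with-persons")
    command += ["--device", device]
    return command


# Hypothetical paths, for illustration only.
utk_command = build_eval_command(
    "utk", "/path/to/dataset/utk/images", "/path/to/dataset/utk/annotation"
)
print(shlex.join(utk_command))
```

The resulting list can be passed to `subprocess.run(utk_command, check=True)` from the repository root. Passing a list rather than a single shell string avoids quoting problems with paths that contain spaces.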

License

Please, see here

Citing

If you use our models, code or dataset, we kindly request that you cite the following paper and give the repository a ⭐

@article{mivolo2023,
   Author = {Maksim Kuprashevich and Irina Tolstykh},
   Title = {MiVOLO: Multi-input Transformer for Age and Gender Estimation},
   Year = {2023},
   Eprint = {arXiv:2307.04616},
}
