FPVT--Face Pyramid Vision Transformer

A novel Face Pyramid Vision Transformer (FPVT) is proposed to learn discriminative multi-scale facial representations for face recognition and verification. In FPVT, Face Spatial Reduction Attention (FSRA) and Dimensionality Reduction (FDR) layers are employed to reduce computation by compacting the feature maps. An Improved Patch Embedding (IPE) algorithm is proposed to exploit the benefits of CNNs in ViTs (e.g., shared weights, local context, and receptive fields) to model features ranging from low-level edges to higher-level semantic primitives. Within the FPVT framework, a Convolutional Feed-Forward Network (CFFN) is proposed that extracts locality information to learn low-level facial information. The proposed FPVT is evaluated on seven benchmark datasets and compared with ten existing state-of-the-art methods, including CNNs, pure ViTs, and Convolutional ViTs. Despite having fewer parameters, FPVT demonstrates excellent performance compared with these methods.
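To make the computational saving from compacting feature maps concrete, the sketch below counts the entries of the query-key score matrix for one attention head, with and without spatial reduction of the keys and values. It assumes FSRA follows the spatial-reduction scheme used in Pyramid Vision Transformer, where the key/value map is downsampled by a ratio R along each side. The helper name, the 112x112 input, the 4x4 patch size, and R = 4 are illustrative assumptions, not values taken from the FPVT code.

```python
def attention_score_entries(height, width, reduction_ratio=1):
    """Count entries in the query-key score matrix for one attention head.

    Hypothetical helper: assumes keys/values are downsampled by
    `reduction_ratio` along each spatial side (PVT-style spatial reduction).
    """
    num_queries = height * width
    # After spatial reduction, keys/values live on a coarser grid.
    num_keys = (height // reduction_ratio) * (width // reduction_ratio)
    return num_queries * num_keys


# Illustrative numbers only: a 112x112 face crop split into 4x4 patches
# yields a 28x28 token grid.
grid = 112 // 4
full = attention_score_entries(grid, grid)
reduced = attention_score_entries(grid, grid, reduction_ratio=4)
print(full, reduced, full // reduced)  # prints: 614656 38416 16
```

Under these assumptions the score matrix shrinks by a factor of R squared, which is where most of the attention cost is saved at high-resolution stages.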

Usage Instructions

1. Preparation

The code is mainly adapted from Face Transformer, Vision Transformer, and DeiT. Please install all dependencies:

pip3 install -r requirement.txt

pip3 install vit-pytorch

torch==1.8.1
torchvision==0.9.0+cu111
matplotlib==3.3.4
numpy==1.20.3
mxnet==1.8.0.post0
sklearn==0.0
scikit-learn==0.24.2
bcolz==1.2.1
pillow==8.2.0
ipython==7.22.0
scipy==1.6.3
opencv-python==4.5.1.48
tensorboardx==2.2
timm==0.3.2
ptflops==0.6.5
pyyaml==5.4.1
einops==0.3.0
pandas==1.3.1
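
Because the versions above are pinned exactly, it can help to compare them against the current environment before training. The following is a minimal sketch using only the standard library (Python 3.8+). The helper names `parse_pins` and `find_mismatches` are hypothetical and not part of this repository, and the short sample list stands in for the full pinned list.

```python
from importlib import metadata


def parse_pins(lines):
    """Parse 'name==version' lines into a dict, skipping blanks and comments."""
    pins = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def find_mismatches(pins, lookup=metadata.version):
    """Return {name: (pinned, installed_or_None)} for packages that differ."""
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = lookup(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


# Sample subset of the pinned list; pass the lines of the requirements
# file instead to check everything.
sample = ["torch==1.8.1", "timm==0.3.2", "einops==0.3.0"]
for name, (pinned, installed) in find_mismatches(parse_pins(sample)).items():
    print(f"{name}: pinned {pinned}, installed {installed or 'missing'}")
```

The comparison is a strict string match, so a build with a local suffix such as +cu111 is reported as a mismatch against a pin without that suffix.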

2. Databases

Download the training database, the cleaned FaceScrub (version FaceScrub), and put it in the 'Data' folder.

Download the testing databases listed below and put them in the 'eval' folder.

LFW: Baidu Netdisk (password: dfj0)
SLLFW: Baidu Netdisk (password: l1z6)
CALFW: Baidu Netdisk (password: vvqe)
CPLFW: Baidu Netdisk (password: jyp9)
TALFW: Baidu Netdisk (password: izrg)
CFP_FP: Baidu Netdisk (password: 4fem), refer to InsightFace
AGEDB: Baidu Netdisk (password: rlqf), refer to InsightFace
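
Once the archives are unpacked, a quick check of the folder layout can catch a misplaced dataset before a long run. The sketch below assumes each testing database sits in its own subfolder of 'eval'. The subfolder names and the helper `missing_datasets` are assumptions for illustration, so adjust them to whatever the downloaded archives actually contain.

```python
from pathlib import Path

# Assumed subfolder names under 'eval'; not confirmed by the repository.
EXPECTED_EVAL_SETS = ("lfw", "sllfw", "calfw", "cplfw", "talfw", "cfp_fp", "agedb")


def missing_datasets(root=".", train_dir="Data", eval_dir="eval",
                     eval_sets=EXPECTED_EVAL_SETS):
    """Return the expected dataset paths (relative to root) that do not exist."""
    root = Path(root)
    expected = [root / train_dir]
    expected += [root / eval_dir / name for name in eval_sets]
    return [path.relative_to(root).as_posix()
            for path in expected if not path.exists()]


# Report anything missing relative to the current working directory.
for path in missing_datasets():
    print(f"missing: {path}")
```

An empty result means every expected folder was found; it does not verify the contents of the folders.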

Citation

If you find this code useful for your research, please cite our work:

@InProceedings{Khawar_BMVC22_FPVT,
      author = {Islam, Khawar and Zaheer, Muhammad Zaigham and Mahmood, Arif},
      title = {Face Pyramid Vision Transformer},
      booktitle = {Proceedings of the British Machine Vision Conference},
      year = {2022}
}
@inproceedings{islam2021face,
      title={Face Recognition Using Shallow Age-Invariant Data},
      author={Islam, Khawar and Lee, Sujin and Han, Dongil and Moon, Hyeonjoon},
      booktitle={2021 36th International Conference on Image and Vision Computing New Zealand (IVCNZ)},
      pages={1--6},
      year={2021},
      organization={IEEE}
}

Contact

If you find a problem in the code or have a question, please email us at khawarr dot islam at gmail dot com

Acknowledgment

This implementation borrows heavily from Face Transformer and ViT PyTorch.
