heorhii-bolotov / facenet

Face recognition usage example


Face Recognition

This repository showcases the usage of Inception ResNet (V1), pretrained on VGGFace2. The implementation comes from Tim Esler's GitHub repo.

It also includes an implementation of MTCNN for face detection, one of the fastest available.

Getting Started

  1. Install:
    # With pip:
    pip install facenet-pytorch
    
    # or clone this repo, removing the '-' to allow python imports:
    git clone https://github.com/timesler/facenet-pytorch.git facenet_pytorch
    
    # or use a docker container (see https://github.com/timesler/docker-jupyter-dl-gpu):
    docker run -it --rm timesler/jupyter-dl-gpu pip install facenet-pytorch && ipython
  2. In Python, import from facenet_pytorch and instantiate the models:
    from facenet_pytorch import MTCNN, InceptionResnetV1
    
    # If required, create a face detection pipeline using MTCNN:
    mtcnn = MTCNN(image_size=<image_size>, margin=<margin>)
    
    # Create an inception resnet (in eval mode):
    resnet = InceptionResnetV1(pretrained='vggface2').eval()
  3. Process an image:
    from PIL import Image
    
    img = Image.open(<image path>)
    
    # Get cropped and prewhitened image tensor
    img_cropped = mtcnn(img, save_path=<optional save path>)
    
    # Calculate embedding (unsqueeze to add batch dimension)
    img_embedding = resnet(img_cropped.unsqueeze(0))
    
    # Or, if using for VGGFace2 classification
    resnet.classify = True
    img_probs = resnet(img_cropped.unsqueeze(0))

See help(MTCNN) and help(InceptionResnetV1) for usage and implementation details.
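
As a quick end-to-end check, the two models can be chained to compare a pair of face images by embedding distance. This is a minimal sketch: the image paths and the image_size/margin values are placeholders, and note that mtcnn returns None when no face is detected.

    from facenet_pytorch import MTCNN, InceptionResnetV1
    from PIL import Image
    
    mtcnn = MTCNN(image_size=160, margin=0)
    resnet = InceptionResnetV1(pretrained='vggface2').eval()
    
    # Detect and crop a face from each image (returns None if no face is found)
    face1 = mtcnn(Image.open('person_a.jpg'))
    face2 = mtcnn(Image.open('person_b.jpg'))
    
    # Embed both faces; a smaller Euclidean distance suggests the same person
    emb1 = resnet(face1.unsqueeze(0))
    emb2 = resnet(face2.unsqueeze(0))
    print((emb1 - emb2).norm().item())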

Example notebooks


This notebook demonstrates the use of the following packages:

  1. facenet-pytorch
  2. mtcnn
  3. sklearn
  4. albumentations

Complete detection and recognition pipeline

This notebook introduces a complete example pipeline: datasets and dataloaders, basic data augmentation, training a classifier on top of the ResNet embeddings, and face tracking in video streams. A sketch of the data-loading step is shown below.
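
A minimal sketch of the dataset/dataloader part, assuming torchvision's ImageFolder convention (one subfolder per person, which matches the structure under Prerequisites below); the transform and batch size are illustrative, not the notebook's exact values.

    import torch
    from torchvision import datasets, transforms
    
    # One subfolder per person, as in the structure shown under Prerequisites
    dataset = datasets.ImageFolder(
        'data/train_images_cropped',
        transform=transforms.Compose([
            transforms.Resize((160, 160)),
            transforms.ToTensor(),
        ]),
    )
    loader = torch.utils.data.DataLoader(dataset, batch_size=32, shuffle=True)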

Prerequisites

To run the example code in Google Colab, you need to prepare separate folders for the image dataset.

The project structure is shown below. When you place the project on your Google Drive, it will have the path /content/drive/My Drive/Colab Notebooks/facenet/

facenet
    +-- facenet.ipynb
    +-- data
        +-- test_images
        |   +-- person1
        |   |   +-- 1.png
        |   |   +-- 2.png
        |   +-- person2
        |       +-- 1.png
        |       +-- 2.png
        +-- train_images
        |   +-- person1
        |   |   +-- 1.png
        |   |   +-- 2.png
        |   +-- person2
        |       +-- 1.png
        |       +-- 2.png
        +-- test_images_cropped
        |   +-- person1
        |   |   +-- 1.png
        |   |   +-- 2.png
        |   +-- person2
        |       +-- 1.png
        |       +-- 2.png
        +-- train_images_cropped
            +-- person1
            |   +-- 1.png
            |   +-- 2.png
            +-- person2
                +-- 1.png
                +-- 2.png

Note: the <images folder>_cropped folders are generated automatically by the code. Images should be .png, .jpeg, or .jpg; they are converted to RGB automatically.

Then, after preparing test_images and train_images, we can apply face detection using MTCNN and save the crops to the corresponding <images folder>_cropped folder, as sketched below.
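
A sketch of that cropping step; the folder names follow the structure above, while the image_size and margin values are illustrative assumptions.

    import os
    from PIL import Image
    from facenet_pytorch import MTCNN
    
    mtcnn = MTCNN(image_size=160, margin=20)
    
    src, dst = 'data/train_images', 'data/train_images_cropped'
    for person in os.listdir(src):
        os.makedirs(os.path.join(dst, person), exist_ok=True)
        for fname in os.listdir(os.path.join(src, person)):
            img = Image.open(os.path.join(src, person, fname)).convert('RGB')
            # Passing save_path makes MTCNN write the cropped face to disk;
            # the call returns None when no face is detected
            mtcnn(img, save_path=os.path.join(dst, person, fname))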

Following all the above, all cropped images can be run through the Inception ResNet model to get embeddings or probabilities. In our case, we get embeddings and train an SVM classifier from sklearn on them (the best parameters were found with a grid search and saved in the data folder as svm.sav). To make the classifier more stable, some augmentations were applied (you can see them in the notebook).

All image embeddings were saved in the data folder as trainEmbeds.npz and testEmbeds.npz.
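
A hedged sketch of the classifier training step: the .npz key names ('embeds', 'labels') and the parameter grid are assumptions, not the notebook's exact values.

    import joblib
    import numpy as np
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC
    
    data = np.load('data/trainEmbeds.npz')
    X, y = data['embeds'], data['labels']  # key names are an assumption
    
    # Grid-search SVM hyperparameters; probability=True enables predict_proba
    grid = GridSearchCV(
        SVC(probability=True),
        param_grid={'C': [1, 10, 100], 'kernel': ['linear', 'rbf']},
        cv=5,
    )
    grid.fit(X, y)
    
    joblib.dump(grid.best_estimator_, 'data/svm.sav')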

Face tracking in video streams

Here we can see some limitations, such as mislabelled classes and the closed-set nature of our model: the classifier predicts the most probable identity among the known/trained classes, so it cannot distinguish a known person from an unknown one (the three people on the right were not in the training dataset).
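
One common mitigation is to threshold the classifier's confidence and label low-confidence faces as unknown. A sketch of per-frame detection plus such a rejection rule follows; the use of OpenCV for frame capture, the video path, and the 0.8 threshold are illustrative assumptions, not the notebook's exact setup.

    import cv2
    import joblib
    import torch
    from PIL import Image
    from facenet_pytorch import MTCNN, InceptionResnetV1
    
    mtcnn = MTCNN(keep_all=True)  # keep all detected faces in each frame
    resnet = InceptionResnetV1(pretrained='vggface2').eval()
    clf = joblib.load('data/svm.sav')
    
    cap = cv2.VideoCapture('video.mp4')  # placeholder path
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        faces = mtcnn(Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)))
        if faces is None:
            continue
        with torch.no_grad():
            probs = clf.predict_proba(resnet(faces).numpy())
        for p in probs:
            # Below-threshold confidence -> treat the face as unknown
            name = clf.classes_[p.argmax()] if p.max() >= 0.8 else 'unknown'
            print(name)
    cap.release()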

More examples

Experiments

Original test dataset

Aligned images, preprocessed by the MTCNN detector.

During training we had the original 79-81 images. After computing a 512-dimensional embedding for each image with the Inception ResNet model, we can observe:

Distances between embedding vectors
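
A sketch of how such a pairwise distance matrix can be computed from the saved embeddings (the .npz key name is an assumption):

    import numpy as np
    
    X = np.load('data/trainEmbeds.npz')['embeds']  # key name is an assumption
    
    # Pairwise Euclidean distances between all 512-d embedding vectors
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    print(dists.shape)  # (n_images, n_images)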

Then we used t-distributed Stochastic Neighbor Embedding (t-SNE), a method that is especially good at visualizing high-dimensional data.

As we can see, some points (vectors) are hard to distinguish.
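
A sketch of the t-SNE projection using sklearn; the .npz key names and the plot styling are assumptions.

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.manifold import TSNE
    
    data = np.load('data/trainEmbeds.npz')
    X, y = data['embeds'], data['labels']  # key names are an assumption
    
    # Project the 512-d embeddings down to 2-d for visualization
    proj = TSNE(n_components=2, perplexity=30).fit_transform(X)
    
    for label in np.unique(y):
        m = y == label
        plt.scatter(proj[m, 0], proj[m, 1], s=10, label=label)
    plt.legend()
    plt.show()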

Authors

License

This project is licensed under the MIT License - see the LICENSE file for details

References

  1. Tim Esler's facenet-pytorch repo: https://github.com/timesler/facenet-pytorch

  2. F. Schroff, D. Kalenichenko, J. Philbin. FaceNet: A Unified Embedding for Face Recognition and Clustering, arXiv:1503.03832, 2015.

  3. K. Zhang, Z. Zhang, Z. Li and Y. Qiao. Joint Face Detection and Alignment Using Multitask Cascaded Convolutional Networks, IEEE Signal Processing Letters, 2016.
