yousefkotp / Face-Recognition-Using-PCA-LDA

A face recognition project using PCA and LDA algorithms.

Face Recognition

A face recognition project using PCA and LDA algorithms.

This readme file is a summary of the project. For more details, please refer to the notebook.

Dataset

  • Our dataset for this project is the AT&T Face Database. The dataset is open-source and can be downloaded from Kaggle.
  • The dataset contains 400 grayscale images of 40 people, 10 images per person. Each image is 92x112 pixels and is loaded as a numpy array. The images are stored in the archive folder.

Data Splitting

  • We split the dataset into training and testing sets in two ways: one split was 50-50 and the other was 70-30. The results are discussed inside the notebook (a minimal splitting sketch is shown below).
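A minimal sketch of how such per-subject splits can be produced, assuming the 400 images are flattened into a (400, 10304) array ordered subject by subject (the array layout, variable names, and the random placeholder data are illustrative assumptions, not the notebook's exact code):

    import numpy as np

    # placeholder for the real image matrix: 400 flattened 92x112 images,
    # 10 consecutive rows per subject (assumed layout)
    faces = np.random.rand(400, 10304)
    labels = np.repeat(np.arange(40), 10)

    def split_per_subject(faces, labels, train_per_subject):
        # keep the first `train_per_subject` images of every subject for training,
        # the rest for testing (5 gives the 50-50 split, 7 gives the 70-30 split)
        train_mask = np.tile(np.arange(10) < train_per_subject, 40)
        return (faces[train_mask], labels[train_mask],
                faces[~train_mask], labels[~train_mask])

    training_set, train_labels, test_set, test_labels = split_per_subject(faces, labels, 5)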

Algorithms

  • Two algorithms were used for facial recognition on the mentioned dataset:
  1. PCA: Principal Component Analysis
  2. LDA: Linear Discriminant Analysis

PCA

Principal Component Analysis (PCA) is a dimensionality reduction technique that is used to extract important features from high-dimensional datasets. PCA works by identifying the principal components of the data, which are linear combinations of the original features that capture the most variation in the data.

Pseudo Code

  • The pseudo code for the PCA:
    import numpy as np

    def pca(training_set, alpha):
        # computing the mean image (each image is a flattened 92x112 = 10304 vector)
        means = np.mean(training_set, axis=0).reshape(1, 10304)
        # centering the data
        centered_training_set = training_set - means
        # computing the covariance matrix
        covariance_matrix = np.cov(centered_training_set.T, bias=True)
        # computing the eigenvalues & eigenvectors (eigh returns them in ascending order)
        eigenvalues, eigenvectors = np.linalg.eigh(covariance_matrix)

        # sorting eigenvalues and eigenvectors in descending order of eigenvalue
        positions = eigenvalues.argsort()[::-1]
        sorted_eigenvalues = eigenvalues[positions]
        sorted_eigenvectors = eigenvectors[:, positions]

        total = np.sum(sorted_eigenvalues)

        # getting the number of principal components r needed to reach
        # a fraction alpha of the total variance
        r = 0
        current_sum = 0
        while current_sum / total < alpha:
            current_sum += sorted_eigenvalues[r]
            r += 1

        # the new space that the data will be projected onto
        new_space = sorted_eigenvectors[:, :r]
        return new_space
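The returned new_space can then be used to project the training and test images into the reduced space (a short usage sketch; training_set, test_set, and the 0.9 variance threshold follow the assumptions above):

    # project the centered data onto the reduced space
    new_space = pca(training_set, alpha=0.9)
    mean_face = np.mean(training_set, axis=0)
    projected_train = (training_set - mean_face) @ new_space
    projected_test = (test_set - mean_face) @ new_space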

The first 2 Eigen-Faces

[Figure: the first two eigenfaces]
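Each column of new_space is a flattened eigenface; a minimal sketch of how the first two can be visualised by reshaping them back to the original image size (matplotlib and the 112x92 row/column orientation are assumptions for illustration):

    import matplotlib.pyplot as plt

    for i in range(2):
        # each column is a 10304-vector; reshape to image height x width
        eigenface = new_space[:, i].reshape(112, 92)
        plt.subplot(1, 2, i + 1)
        plt.imshow(eigenface, cmap='gray')
        plt.title(f'Eigenface {i + 1}')
    plt.show()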

Using K-NN Classifier after PCA

  • The K-NN classifier is a non-parametric method used for classification and regression. In both cases, the input consists of the k closest training examples in the feature space; for classification, the output is determined by a majority vote among the classes of the k nearest neighbors.
  • The following graph shows the accuracy of face recognition at different values of k (1, 3, 5, 7); a classification sketch follows the figure.

[Figure: accuracy of K-NN after PCA for k = 1, 3, 5, 7]
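A minimal sketch of classifying the projected faces with a K-NN classifier (the use of scikit-learn is an assumption; projected_train, projected_test, and the label arrays come from the earlier sketches):

    from sklearn.neighbors import KNeighborsClassifier

    for k in (1, 3, 5, 7):
        knn = KNeighborsClassifier(n_neighbors=k)
        knn.fit(projected_train, train_labels)
        accuracy = knn.score(projected_test, test_labels)
        print(f'k = {k}: accuracy = {accuracy:.3f}')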

Comparison between different splitting ways

  • This table shows the difference in accuracy between the 50-50 and 70-30 splits:

[Table: accuracies for the two splits]

  • This table shows the difference in the number of principal components required for each split:

[Table: number of principal components for the two splits]

Using PCA Variations

Randomized PCA
  • Randomized PCA is a faster and more memory-efficient version of PCA that uses randomized matrix approximations to estimate the principal components of the data. This approach involves sampling subsets of the data and computing the eigenvectors of the resulting covariance matrix, which can be done more efficiently than computing the eigenvectors of the full covariance matrix.
  • The randomized version of PCA operates in O(nd^2) + O(d^3), where n is the number of data points and d is the number of principal components, while conventional PCA operates in O(np^2) + O(p^3), where p is the number of features. It is therefore much faster when d is significantly smaller than p; a sketch follows below.
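A minimal sketch of randomized PCA using scikit-learn (the library choice, the 50 components, and the random seed are illustrative assumptions):

    from sklearn.decomposition import PCA

    # svd_solver='randomized' estimates only the leading components via randomized SVD
    randomized_pca = PCA(n_components=50, svd_solver='randomized', random_state=0)
    projected_train = randomized_pca.fit_transform(training_set)
    projected_test = randomized_pca.transform(test_set)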
Kernel PCA
  • Kernel PCA is a non-linear dimensionality reduction technique that uses a kernel function to map high-dimensional data into a lower-dimensional space. This allows it to capture non-linear relationships between variables that are not possible with linear PCA.
  • The time complexity of normal PCA is O(d^3), where d is the number of dimensions, while the time complexity of kernel PCA is O(n^3), where n is the number of data points. The computation of the kernel matrix is the most computationally expensive step in kernel PCA.
  • Kernel PCA may be more accurate than normal PCA for datasets with non-linear relationships between variables, as it can capture these relationships. However, kernel PCA is more prone to overfitting than normal PCA, and the choice of kernel function can greatly affect the performance of kernel PCA.

Accuracies for PCA variations

  • Kernel PCA, specifically using the radial basis function (RBF) kernel, may fail when the dataset has a large number of dimensions or when the number of data points is much larger than the number of dimensions. This is because the kernel matrix can become very large and computationally expensive to compute and manipulate. Additionally, the choice of kernel function and its parameters can greatly affect the performance of kernel PCA. In contrast, normal PCA may perform better in high-dimensional datasets or when the relationships between variables are linear, as it is designed to capture linear relationships between variables.
  • RBF kernel PCA uses the radial basis function kernel, a Gaussian function of the distance between data points in the original space. This kernel is useful for capturing non-linear relationships between variables that cannot be captured by linear PCA. Polynomial kernel PCA instead uses a polynomial kernel, which raises the dot product between data points in the original space to a certain power; it can also capture non-linear relationships, but is more sensitive to outliers and noise than the RBF kernel (see the sketch below).
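A minimal sketch of the two kernel PCA variants using scikit-learn (the library choice, the number of components, and the kernel parameters are illustrative assumptions):

    from sklearn.decomposition import KernelPCA

    # RBF (Gaussian) kernel PCA
    rbf_pca = KernelPCA(n_components=50, kernel='rbf', gamma=1e-7)
    projected_train_rbf = rbf_pca.fit_transform(training_set)
    projected_test_rbf = rbf_pca.transform(test_set)

    # polynomial kernel PCA
    poly_pca = KernelPCA(n_components=50, kernel='poly', degree=3)
    projected_train_poly = poly_pca.fit_transform(training_set)
    projected_test_poly = poly_pca.transform(test_set)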

[Figure: accuracies for PCA variations]

LDA

  • Linear Discriminant Analysis (LDA) is a dimensionality reduction technique that is used to reduce the number of features in a dataset while maintaining the class separability. LDA is a supervised technique, meaning that it uses the class labels to perform the dimensionality reduction. LDA is a popular technique for dimensionality reduction in the field of pattern recognition and machine learning.

Pseudo Code

  • The pseudo code for the multi-class LDA is as follows:
    # Step 1: Compute the overall mean of the training set
    overall_mean = compute_mean(training_set)

    # Step 2: Compute the between-class scatter matrix and the within-class scatter matrix
    S_B = compute_between_class_scatter(training_set, overall_mean)
    S_W = compute_within_class_scatter(training_set)

    # Step 3: Compute the eigenvalues and eigenvectors of the generalized eigenvalue problem
    eigenvalues, eigenvectors = compute_generalized_eigen(S_B, S_W)

    # Step 4: Sort the eigenvalues and eigenvectors in descending order
    sorted_eigenvalues, sorted_eigenvectors = sort_eigen(eigenvalues, eigenvectors)

    # Step 5: Take only the dominant eigenvectors
    new_space = select_eigenvectors(sorted_eigenvectors)

    # Step 6: Return the dominant eigenvectors
    return new_space
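A more concrete numpy sketch of these steps, assuming training_set is the flattened image matrix and train_labels holds the subject ids (39 components follows from having 40 classes; using the pseudoinverse of S_W is one common way to handle its near-singularity and is an assumption here, and the full scatter matrices are memory-heavy at this resolution):

    def lda(training_set, train_labels, n_components=39):
        overall_mean = np.mean(training_set, axis=0)
        n_features = training_set.shape[1]
        S_B = np.zeros((n_features, n_features))
        S_W = np.zeros((n_features, n_features))
        for c in np.unique(train_labels):
            class_samples = training_set[train_labels == c]
            class_mean = np.mean(class_samples, axis=0)
            # between-class scatter: weighted outer product of (class mean - overall mean)
            diff = (class_mean - overall_mean).reshape(-1, 1)
            S_B += class_samples.shape[0] * (diff @ diff.T)
            # within-class scatter: scatter of the samples around their class mean
            centered = class_samples - class_mean
            S_W += centered.T @ centered
        # generalized eigenproblem S_B v = lambda S_W v, solved via pinv(S_W) @ S_B
        eigenvalues, eigenvectors = np.linalg.eig(np.linalg.pinv(S_W) @ S_B)
        positions = np.argsort(eigenvalues.real)[::-1]
        # keep only the dominant eigenvectors (at most n_classes - 1 are meaningful)
        return eigenvectors[:, positions[:n_components]].real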

Using K-NN Classifier after LDA

  • The K-NN classifier is used in the same way as after PCA: the class of a test image is decided by a majority vote among the classes of its k nearest neighbors in the projected space.
  • The following graph shows the accuracy of face recognition at different values of k (1, 3, 5, 7)

[Figure: accuracy of K-NN after LDA for k = 1, 3, 5, 7]

Comparison between different splitting ways

  • We tried splitting the data into training and test sets in two different ways:
  1. 50-50: This split resulted in a good accuracy, reaching 95.5%
  2. 70-30: This split resulted in a slightly better accuracy, reaching 95.83%

Using LDA Variations

  • We tried using different variations of LDA to see if we can get better results. The variations we tried are:
Shrinkage LDA
  • Shrinkage LDA (Linear Discriminant Analysis) is a variant of the standard LDA method that is used for classification and dimensionality reduction. The key difference between shrinkage LDA and normal LDA is that the former incorporates a regularization term that shrinks the sample covariance matrix towards a diagonal matrix.

  • This regularization is particularly useful when dealing with high-dimensional data, as it helps to overcome the small sample size problem by stabilizing the covariance estimates. Shrinkage LDA has been shown to outperform traditional LDA in terms of classification accuracy, especially when the number of features is much larger than the number of observations.

  • Another advantage of shrinkage LDA is that it can handle multicollinearity between the predictor variables, which can be a problem in standard LDA when the predictors are highly correlated. In summary, shrinkage LDA is a powerful tool for classification and dimensionality reduction that can improve the accuracy of LDA in high-dimensional and small sample size settings.
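A minimal sketch of shrinkage LDA with scikit-learn (the library choice, the 'eigen' solver, and the automatic Ledoit-Wolf shrinkage are assumptions for illustration):

    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    # shrinkage requires the 'lsqr' or 'eigen' solver; 'auto' uses the Ledoit-Wolf estimate,
    # and 'eigen' also supports projecting the data with transform()
    shrinkage_lda = LinearDiscriminantAnalysis(solver='eigen', shrinkage='auto', n_components=39)
    projected_train = shrinkage_lda.fit_transform(training_set, train_labels)
    projected_test = shrinkage_lda.transform(test_set)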

Accuracies for LDA variations

[Figure: accuracies for LDA variations]

Comparing to Non-Faces Dataset

  • We compared the results of the PCA and LDA algorithms on the faces dataset with their results on a non-faces dataset. The results are discussed inside the notebook.

Faces vs Non-Faces Solutions

  1. Success & failure cases figure
  • For PCA:

[Figure: PCA success and failure cases]

  • For LDA:

[Figure: LDA success and failure cases]

  2. We will use 1 dominant eigenvector for LDA as we have 2 classes
  3. Accuracy vs number of non-face images figure
  • For PCA

[Figure: PCA accuracy vs. number of non-face images]

  • For LDA

[Figure: LDA accuracy vs. number of non-face images]

  4. As the number of non-face images increases, the accuracy of the classifier decreases, in contrast to what might be expected from having more training data. As more points fill the space, the gaps between the two classes shrink and the K-NN classifier is more easily confused by noisy samples; this is a limitation of the classifier rather than a problem with the data. A sketch of this experiment is shown below.
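A minimal sketch of how such an experiment can be run, assuming a non_faces array of flattened non-face images is available (the array name, the single-component LDA setup, and the simple alternating train/test split are illustrative assumptions):

    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.neighbors import KNeighborsClassifier

    def faces_vs_non_faces_accuracy(faces, non_faces, n_non_faces):
        # binary problem: label 1 for faces, 0 for the first n_non_faces non-face images
        X = np.vstack([faces, non_faces[:n_non_faces]])
        y = np.concatenate([np.ones(len(faces)), np.zeros(n_non_faces)])
        # with 2 classes, LDA yields a single dominant direction
        lda = LinearDiscriminantAnalysis(n_components=1)
        # simple alternating split: even indices train, odd indices test
        train, test = slice(0, None, 2), slice(1, None, 2)
        projected_train = lda.fit_transform(X[train], y[train])
        projected_test = lda.transform(X[test])
        knn = KNeighborsClassifier(n_neighbors=1).fit(projected_train, y[train])
        return knn.score(projected_test, y[test])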

Contributors
