The Pipeline

  • Preparing the data using OpenCV by resizing the images so they are consistent in size.
  • Detecting faces using DeepFace with OpenCV as the detector backend.
  • Extracting embeddings using FaceNet.
  • Using SciPy to find the cosine similarity between the test images’ and training images’ embeddings (a sketch of the full flow follows this list).
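
Putting those steps together, here is a minimal sketch of the full flow, assuming a recent version of DeepFace (the file names, the 224×224 size, and the helper name are illustrative placeholders):

```python
import cv2
from deepface import DeepFace
from scipy.spatial.distance import cosine

def get_embedding(path):
    # Resize for consistency (224x224 is an arbitrary choice here), then
    # let DeepFace detect the face and extract a FaceNet embedding.
    img = cv2.resize(cv2.imread(path), (224, 224))
    result = DeepFace.represent(img, model_name="Facenet",
                                detector_backend="opencv")
    return result[0]["embedding"]

train_embedding = get_embedding("train.jpg")  # placeholder file names
test_embedding = get_embedding("test.jpg")

# SciPy's cosine() returns the cosine *distance* (1 - similarity),
# so lower values mean a closer match.
distance = cosine(train_embedding, test_embedding)
print("match" if distance < 0.2 else "no match")
```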

OpenCV

Preparing Images
To ensure good performance, OpenCV is used to prepare the images: each one is resized to a fixed resolution so that every input the pipeline sees has consistent dimensions.
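
A minimal sketch of that preparation step (the 224×224 target size is an assumption for illustration, not a requirement of the pipeline):

```python
import cv2

def prepare_image(path, size=(224, 224)):
    # Load the image from disk and resize it to a fixed resolution so
    # all inputs are consistent. INTER_AREA generally works well when
    # shrinking images.
    img = cv2.imread(path)
    return cv2.resize(img, size, interpolation=cv2.INTER_AREA)
```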

Detector Backend
The main reason I used OpenCV as my DeepFace detector backend was speed. OpenCV’s detector is fast on a CPU and doesn’t require a GPU, which was essential, as I was just running this algorithm on my laptop.
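
Choosing the backend is a single argument in DeepFace. A hedged sketch, assuming a recent DeepFace version where extract_faces is the detection entry point (“img.jpg” is a placeholder):

```python
from deepface import DeepFace

# Detect faces using the fast, CPU-friendly OpenCV backend. Each entry
# in the result holds the cropped face plus its bounding box.
faces = DeepFace.extract_faces(img_path="img.jpg",
                               detector_backend="opencv")
for face in faces:
    print(face["facial_area"])  # {'x': ..., 'y': ..., 'w': ..., 'h': ...}
```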

DeepFace

DeepFace was used for this project because of its compatibility with different face detection and embedding extraction models, which allowed me to test out many different backends before settling on OpenCV and FaceNet. It also normalizes each image, scaling the pixel values to the range -1 to 1 before feeding it into FaceNet, so that FaceNet’s neurons don’t fire on out-of-range inputs and turn the embeddings into meaningless nonsense.
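
A hedged sketch of the embedding call as I understand DeepFace’s API (“img.jpg” is a placeholder; the normalization argument asks DeepFace to apply its FaceNet-specific input scaling):

```python
from deepface import DeepFace

# Detect the face, normalize the pixel values for FaceNet, and return
# the embedding vector that describes the face.
result = DeepFace.represent(img_path="img.jpg",
                            model_name="Facenet",
                            detector_backend="opencv",
                            normalization="Facenet")
embedding = result[0]["embedding"]
print(len(embedding))  # FaceNet produces a 128-dimensional embedding
```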

FaceNet

FaceNet is used for this project to extract the embeddings from the images of the faces. FaceNet maps each face to a vector of numbers that represents its features. For this project, it uses a ResNet, a residual neural network, to extract these features. A residual neural network uses shortcut connections, which skip over blocks of 2-3 layers, allowing the input of a block to be added back to its output. This residual connection lets the network focus on learning the changes (the residual) needed, rather than the entire transformation from scratch.
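
To make that concrete, here is a toy residual block, a generic PyTorch sketch rather than FaceNet’s actual architecture:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """A toy residual block: the block's input is added back to the
    output of two convolutional layers via a shortcut connection."""

    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        # The conv layers only learn the *change* (the residual);
        # the shortcut carries the original input through unchanged.
        residual = self.conv2(self.relu(self.conv1(x)))
        return self.relu(x + residual)

block = ResidualBlock(channels=8)
out = block(torch.randn(1, 8, 16, 16))  # shape preserved: (1, 8, 16, 16)
```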

SciPy

SciPy was used to find the cosine similarity between faces: after FaceNet produced the embeddings, SciPy compared them. Cosine similarity works by finding the cosine of the angle between two vectors, or in this case, embeddings. Imagine trying to find out, mathematically, how close the phrases “Hello, Johnathon!” and “Hello, World!” are. Let’s figure out how many times each word appears in each phrase.

              Hello, World!   Hello, Johnathon!
  Hello             1                 1
  World             1                 0
  Johnathon         0                 1

We can turn these phrases into vectors: (1, 1, 0) for “Hello, World!” and (1, 0, 1) for “Hello, Johnathon!”. We can then take the cosine similarity using this formula:

$$\text{Cosine Similarity}(\mathbf{A}, \mathbf{B}) = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2} \cdot \sqrt{\sum_{i=1}^{n} B_i^2}}$$

This formula takes the dot product of the two vectors and then divides it by the product of their magnitudes, which leaves only the cosine of the angle between them. Let’s take the dot product of our two vectors:

$$\mathbf{A}(1, 1, 0) \cdot \mathbf{B}(1, 0, 1) = (1 \times 1) + (1 \times 0) + (0 \times 1) = 1 + 0 + 0 = 1$$

and then, let’s take the magnitude of each vector using this formula: $\|\mathbf{V}\| = \sqrt{v_1^2 + v_2^2 + v_3^2}$

Substituting A gives us: $\|\mathbf{A}\| = \sqrt{1^2 + 1^2 + 0^2} = \sqrt{1 + 1 + 0} = \sqrt{2}$

and then substituting B gives us: $\|\mathbf{B}\| = \sqrt{1^2 + 0^2 + 1^2} = \sqrt{1 + 0 + 1} = \sqrt{2}$

Then, substituting the dot product of the two vectors and their two magnitudes into the original formula gives us: $\frac{1}{\sqrt{2} \cdot \sqrt{2}} = \frac{1}{2}$

which means that the cosine similarity is $\frac{1}{2}$. In our example, this makes sense, as the phrases share one word (“Hello”) but differ in the others (“World” and “Johnathon”).
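
As a sanity check, SciPy reproduces the same number. Note that SciPy exposes cosine distance rather than similarity, so we subtract it from 1:

```python
from scipy.spatial.distance import cosine

a = [1, 1, 0]  # "Hello, World!"
b = [1, 0, 1]  # "Hello, Johnathon!"

# cosine() returns the cosine distance, i.e. 1 - cosine similarity.
similarity = 1 - cosine(a, b)
print(similarity)  # ~0.5, matching the hand calculation above
```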

Now, in the project I was working on, this same measure was used to compare two faces’ embeddings. Since SciPy’s cosine function returns the cosine distance (1 minus the similarity, as noted above), a lower value indicates a better match, and a pair of faces was considered a match when the distance fell below 0.2.
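
A minimal sketch of that matching step (the embedding values are made up for illustration; real FaceNet embeddings are 128-dimensional):

```python
from scipy.spatial.distance import cosine

known_face = [0.12, -0.45, 0.33]      # hypothetical stored embedding
candidate_face = [0.10, -0.41, 0.35]  # hypothetical test embedding

MATCH_THRESHOLD = 0.2  # the threshold used in this project

# cosine() returns the cosine distance, so smaller means more similar.
if cosine(known_face, candidate_face) < MATCH_THRESHOLD:
    print("Faces match")
else:
    print("Faces do not match")
```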