Facial detection is relatively easy with OpenCV now, as Jeremy pointed out. There's a really simple example in opencv-processing that illustrates it:
import gab.opencv.*;
import java.awt.Rectangle;

OpenCV opencv;
Rectangle[] faces;

void setup() {
  size(1080, 720); // size() must come first in setup() as of Processing 3
  opencv = new OpenCV(this, "test.jpg");
  opencv.loadCascade(OpenCV.CASCADE_FRONTALFACE);
  faces = opencv.detect();
}

void draw() {
  // Draw the source image, then a green box around each detected face
  image(opencv.getInput(), 0, 0);
  noFill();
  stroke(0, 255, 0);
  strokeWeight(3);
  for (int i = 0; i < faces.length; i++) {
    rect(faces[i].x, faces[i].y, faces[i].width, faces[i].height);
  }
}
Facial recognition is considerably more difficult, as TFguy so eloquently describes:
… And don’t overlook the feedback you get from your Human Overlords! If you guess so poorly that they slip up and give you the correct answer, then you immediately know that you have another sample image! You won’t make that mistake again, will you? Can you also use that feedback to realize what the mistake you made actually was and then learn from it? Oh crap, that’s Machine Learning! Keep doing that and maybe one day we can overthrow our cruel Human Masters! BEEP! BOOP! ROBOT REVOLUTION!
The process requires differentiating what's a "face" from what's "not a face" (a problem of its own called image segmentation), and then running the result through a convolutional neural network that builds up a representation of all the spatial and lighting features of a face. To do that with a single specific face, you basically just need a crapton of examples of that one face. The catch-all solution for that process is now encapsulated in something called a generative adversarial network (GAN), which can be used as a form of semi-supervised learning: a generator network makes up images of what it thinks the face should look like, and a discriminator determines whether or not the generator's result looks close enough to the "truth" data.
If you want a true solution that recognizes and tracks individuals across live video or something, you’re looking at training an ensemble of systems that can segment images, detect faces, classify faces, and track objects. If you want something a bit less generalizable and lower maintenance, you can create a workable solution that finds which of “X” faces some picture is closest to by:
- Running the known faces through a pretrained classifier (an ImageNet-trained network, Inception, whatever, but preferably something trained on faces), taking out the penultimate layer, and calling that x0, x1, x2, x3… xi for each of your i known faces. This gives you a giant vector of numbers that the network would otherwise multiply by some weights to get the final classification, but here we just want the raw vector because we don't care about the classification.
- Running the unknown faces through the same classifier to get y0, y1, …, yj
- For each of [y0, …, yj] find which of [x0, …, xi] is the closest by L2 norm or whatever metric you want, but probably the L2 norm.
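The matching step at the end can be sketched in plain Java. The 4-dimensional "embeddings" and the class name `NearestFace` below are made up for illustration; in practice each vector would be the penultimate-layer output of the classifier, typically hundreds or thousands of numbers long:

```java
// Toy sketch of the nearest-neighbor matching step: for an unknown
// embedding y, find the known embedding xi with the smallest L2 distance.
public class NearestFace {

  // L2 (Euclidean) distance between two embedding vectors.
  static double l2(double[] a, double[] b) {
    double sum = 0;
    for (int k = 0; k < a.length; k++) {
      double d = a[k] - b[k];
      sum += d * d;
    }
    return Math.sqrt(sum);
  }

  // Index of the known embedding closest to y.
  static int nearest(double[][] known, double[] y) {
    int best = 0;
    double bestDist = Double.MAX_VALUE;
    for (int i = 0; i < known.length; i++) {
      double d = l2(known[i], y);
      if (d < bestDist) { bestDist = d; best = i; }
    }
    return best;
  }

  public static void main(String[] args) {
    // Fake embeddings for three known faces x0..x2 (illustration only).
    double[][] x = { {1, 0, 0, 0}, {0, 1, 0, 0}, {0, 0, 1, 1} };
    // Fake embedding for one unknown face.
    double[] y = {0.1, 0.9, 0.0, 0.1};
    System.out.println("closest known face: x" + nearest(x, y)); // prints x1
  }
}
```

Whatever distance metric you pick, the key design point is that all comparisons happen in the classifier's embedding space rather than in pixel space.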
Now you’ve “trained” a poor man’s image classifier using just a single known image for each class. It probably doesn’t work that great, and probably isn’t actually what you want. Congrats!