Facebook has published a research paper explaining a project called DeepFace that’s almost as good at putting names to faces as humans are. In fact, it might be better. The company claims its system, which is built using deep neural network, or deep learning, techniques, performed with 97.25 percent accuracy on a dataset commonly used to measure the effectiveness of facial recognition systems.
MIT Technology Review first reported on the DeepFace paper, which Facebook researchers are presenting at the IEEE Conference on Computer Vision and Pattern Recognition in June.
Deep learning is currently an area of investment for a number of web companies ranging from Pinterest to Netflix, although Facebook and Google have probably made the biggest news with their high-profile hires and acquisitions. It’s such a hot field because deep learning techniques are proving very effective at recognizing objects within images and analyzing language — two things that many web companies have by the petabyte — without much human supervision. Pinterest, for example, might want to tell advertisers what its users are pinning. Facebook might want to analyze the text of wall posts to learn more about what users are writing, or improve its network graph by identifying untagged friends in photos.
We’ll be talking a lot at Structure Data this week about how businesses of all types might go about using deep learning, and artificial intelligence in general, to improve their analytic efforts or build smarter, more automated products. Speakers on the topic include John Platt of Microsoft Research, Elliot Turner of AlchemyAPI, Stephen Gold of IBM’s Watson Group, Ben Medlock of SwiftKey and Tim Tuttle of Expect Labs.
Essentially, deep learning models work by recognizing, across multiple layers of a neural network, the many small features that make up an object or a piece of text, and then putting them together into a sort of map of the whole thing. Even without labels, a system might see enough images of the same object to recognize that a certain collection of features makes up a thing, or see enough text to learn how words are used. Once these systems are trained on labeled data (e.g., these are images of cats, these are images of Derrick Harris, these are negative movie reviews or these are examples of Communist propaganda) they can identify what they’re actually seeing.
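To make the layered idea concrete, here is a minimal sketch in Python of a two-layer network. It is an illustration only, not Facebook's model: the weights are hand-picked rather than learned, the "image" is a hypothetical 2x2 patch of four pixels, and the names `dense`, `classify` and `LABELS` are invented for this example. The first layer picks out small features (a bright top row, a bright left column) and the second layer combines them into a label.

```python
import math

def relu(x):
    """Pass positive values through, clamp negatives to zero."""
    return max(0.0, x)

def dense(inputs, weights, biases, activation):
    """One fully connected layer: each output is a weighted sum of all inputs."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical, hand-picked weights (a real system would learn these).
# Input order: [top-left, top-right, bottom-left, bottom-right].
LAYER1_W = [
    [1.0, 1.0, -1.0, -1.0],  # fires when the top row is brighter than the bottom
    [1.0, -1.0, 1.0, -1.0],  # fires when the left column is brighter than the right
]
LAYER1_B = [0.0, 0.0]
LAYER2_W = [
    [2.0, -2.0],  # score for "horizontal edge"
    [-2.0, 2.0],  # score for "vertical edge"
]
LAYER2_B = [0.0, 0.0]
LABELS = ["horizontal edge", "vertical edge"]

def classify(pixels):
    """Layer 1 extracts small features; layer 2 combines them into a label."""
    features = dense(pixels, LAYER1_W, LAYER1_B, relu)
    scores = dense(features, LAYER2_W, LAYER2_B, lambda s: s)
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs

if __name__ == "__main__":
    print(classify([1.0, 1.0, 0.0, 0.0]))
    print(classify([1.0, 0.0, 1.0, 0.0]))
```

Training on labeled data amounts to adjusting those weight tables automatically until the labels come out right, which is where the large datasets and computing power come in.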
The combination of huge datasets (like those the web companies have) and easily available computing resources (like those the web companies have) has helped advance the state of the art of artificial intelligence approaches like deep learning quite a bit over the past couple of years. Facebook, for example, was able to train its DeepFace system on a dataset of 4.4 million images representing about 1,000 images each of 4,030 people. However, even smaller companies and university researchers are now able to obtain a respectable volume of training data by scraping the web for images and text, and by using cloud computing to access adequate processing power.
Facebook took things a step further with DeepFace, though, building a system that can straighten out angled shots to get a head-on view of a face for improved accuracy.
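For a rough sense of what "straightening" a face means, here is a simplified sketch in Python. It is not DeepFace's method: it only rotates 2D landmark points within the image plane so the line between the eyes becomes level, and it cannot correct for a head turned away from the camera. The function name `align_by_eyes` and the sample coordinates are assumptions made up for the illustration.

```python
import math

def align_by_eyes(landmarks, left_eye, right_eye):
    """Rotate 2D points about the midpoint between the eyes so the eye line is level."""
    # Angle of the line running from the left eye to the right eye.
    angle = math.atan2(right_eye[1] - left_eye[1], right_eye[0] - left_eye[0])
    # Rotate around the point halfway between the eyes.
    cx = (left_eye[0] + right_eye[0]) / 2
    cy = (left_eye[1] + right_eye[1]) / 2
    # Rotating by the negative angle undoes the tilt.
    cos_a, sin_a = math.cos(-angle), math.sin(-angle)
    aligned = []
    for x, y in landmarks:
        tx, ty = x - cx, y - cy
        aligned.append((cx + tx * cos_a - ty * sin_a,
                        cy + tx * sin_a + ty * cos_a))
    return aligned

if __name__ == "__main__":
    # Hypothetical tilted face: eyes plus a third landmark, in pixel coordinates.
    left_eye, right_eye, third_point = (30.0, 40.0), (70.0, 60.0), (40.0, 75.0)
    for point in align_by_eyes([left_eye, right_eye, third_point], left_eye, right_eye):
        print(point)
```

After alignment both eyes share the same vertical coordinate, so a recognizer sees every face in a more consistent position.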