Facebook doesn’t devote thousands of people to its fledgling Graph Search tool, so it needs to make improvements with minimal engineering effort. In an article posted on the Facebook Engineering blog on Thursday, Venky Iyer, an engineering manager on the entities team, and Eric Sun, a software engineer on the team, describe a few ways in which Facebook improves the product step by step with the help of external data sets, user input and machine learning.
While users enter search terms into Graph Search all the time, Facebook needs to check categories such as schools, companies and other entities against a reliable and up-to-date database. It chose Wikipedia for this. But Wikipedia is constantly expanding, so engineers came up with a method for pulling in new content each week and using it to make new entity pages, which can show up in searches. To ensure that the pages are properly categorized and don’t duplicate what’s already on Facebook, engineers ask users to click a button if they can identify a duplicate. That feedback is fed into Facebook’s machine-learning models to improve the accuracy of the system going forward.
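The article does not say how Facebook spots candidate duplicates, so the following is only a rough sketch of one simple approach: comparing normalized names before creating a page from an imported title. The function names and the normalization rules are assumptions made for illustration, not Facebook’s method.

```python
import unicodedata


def normalize_name(name):
    """Fold case, strip accents and collapse whitespace (assumed rules)."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(stripped.casefold().split())


def split_import(wikipedia_titles, existing_pages):
    """Separate imported titles into new entities and possible duplicates.

    Both arguments are plain lists of names; a real pipeline would carry
    far more context than a name (hypothetical simplification).
    """
    existing = {normalize_name(page) for page in existing_pages}
    new_entities, possible_duplicates = [], []
    for title in wikipedia_titles:
        if normalize_name(title) in existing:
            possible_duplicates.append(title)
        else:
            new_entities.append(title)
    return new_entities, possible_duplicates


new_entities, possible_duplicates = split_import(
    ["Université  Laval", "Reed College"],
    ["universite laval"],
)
```

In a setup like this, the possible duplicates are the cases where a user’s click on the duplicate button would serve as a training label.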
The Wikipedia data also comes in handy by showing common terms across multiple languages. If a user wants to know how many of their friends are programmers, Facebook will interpret a friend’s claim to being a “programador” as an indication that he or she is a programmer. Facebook might also want to include people who use alternative terms such as “coder.” Facebook can learn that “coder” is related to “programmer” if a user entered the former as a previous job title and the latter as a current one.
And while Facebook has language gurus on staff, it uses the WordNet database to match search terms with existing entities.
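The job-title example above can be sketched as a simple co-occurrence count: if enough profiles list two different titles as previous and current jobs, treat the titles as related. This is a minimal illustration, not the model Facebook describes; the function name, the input shape and the `min_count` threshold are all assumptions.

```python
from collections import Counter


def related_title_pairs(profiles, min_count=2):
    """Return unordered title pairs seen together on at least min_count profiles.

    profiles is a list of (previous_title, current_title) tuples
    (a hypothetical input format).
    """
    counts = Counter()
    for previous, current in profiles:
        a = previous.strip().lower()
        b = current.strip().lower()
        if a and b and a != b:
            # frozenset makes ("coder", "programmer") and the reverse one key.
            counts[frozenset((a, b))] += 1
    return {pair for pair, seen in counts.items() if seen >= min_count}


profiles = [
    ("coder", "programmer"),
    ("Programmer", "coder"),
    ("coder", "chef"),
]
pairs = related_title_pairs(profiles)
```

The threshold matters: a single career change from “coder” to “chef” should not be enough to link the two titles.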
Facebook uses machine learning to guess who works at certain companies or goes to certain schools. But just because users say they went to an Ivy League school doesn’t mean they actually did.
“After some early beta testing, we quickly realized that over a million people on the site claim to work at Facebook, and over a million people claim to have gone to Harvard University, which is highly unlikely to be true,” Iyer and Sun wrote. “… Regardless of the reason, we need to account for this in Graph Search to return useful results.”
This is where Facebook makes use of the user email addresses it has on file. A confirmed official email address gives the user the best rating. But if that user’s friends say they share the same affiliation but don’t have confirmed official email addresses to back up their assertions, Facebook considers it less likely that the user actually has that affiliation.
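As a rough sketch of that logic, the function below gives the top score to a user with a confirmed official email address and otherwise adjusts a neutral score by the share of same-affiliation friends who have one. The numeric values and the function name are invented for illustration; the article gives no formula.

```python
def affiliation_score(user_confirmed, friends_confirmed):
    """Score how plausible a claimed affiliation is, from 0.0 to 1.0.

    user_confirmed: True if the user has a confirmed official email address.
    friends_confirmed: one bool per friend claiming the same affiliation,
    True if that friend has a confirmed official email address.
    All constants below are assumed values, not Facebook's.
    """
    if user_confirmed:
        return 1.0  # a confirmed address gives the best rating
    if not friends_confirmed:
        return 0.5  # no evidence from friends either way
    confirmed_share = sum(friends_confirmed) / len(friends_confirmed)
    # Unconfirmed friends pull the score below neutral, confirmed ones above.
    return 0.2 + 0.6 * confirmed_share


verified = affiliation_score(True, [])
no_friends = affiliation_score(False, [])
unbacked_friends = affiliation_score(False, [False, False, False])
backed_friends = affiliation_score(False, [True, True])
```

Here a user surrounded by friends who make the same unverified claim scores lower than a user with no such friends, which mirrors the behavior described above.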
But even with these systems in place for getting external sources to strengthen Graph Search, Facebook still needs to do some human work. The most popular entities on Facebook get special attention.
“To ensure that we provide high-quality suggestions, we use manual labeling for the largest nodes to figure out if (they) could legitimately be considered true schools and employers,” Iyer and Sun wrote. Human intervention stops there. “For the long tail, we generate scores based on the category of the Page, whether the node has been imported from Wikipedia, and the ratio of school/work connections to Page likes,” Iyer and Sun explained.
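The three long-tail signals named in the quote could be combined in a weighted score along these lines. This is a sketch under stated assumptions: the weights, the category labels, the field names and the idea that a higher ratio of school/work connections to likes counts in a page’s favor are all guesses, since the post does not spell them out.

```python
from dataclasses import dataclass

# Hypothetical category labels; the real taxonomy is not given in the article.
SCHOOL_OR_EMPLOYER_CATEGORIES = {"university", "school", "company"}


@dataclass
class PageNode:
    category: str
    from_wikipedia: bool
    affiliation_connections: int  # users listing the page as school or employer
    likes: int


def long_tail_score(node):
    """Combine the three signals into a 0.0 to 1.0 score (assumed weights)."""
    category_signal = 1.0 if node.category.lower() in SCHOOL_OR_EMPLOYER_CATEGORIES else 0.0
    wikipedia_signal = 1.0 if node.from_wikipedia else 0.0
    # Guard against division by zero and cap the ratio at 1.0.
    ratio = node.affiliation_connections / max(node.likes, 1)
    ratio_signal = min(ratio, 1.0)
    return 0.4 * category_signal + 0.3 * wikipedia_signal + 0.3 * ratio_signal


small_college = PageNode("University", True, 800, 1000)
joke_page = PageNode("Community", False, 50, 20000)
college_score = long_tail_score(small_college)
joke_score = long_tail_score(joke_page)
```

A page that many people like but few list as a school or employer ends up near the bottom, which is the kind of node such a score is meant to filter out.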
Facebook deserves points for improving Graph Search in these ways and others. At the same time, the social networking company has big steps ahead, from making Graph Search available in other languages to introducing the feature in mobile apps.
Facebook certainly has the hardware chops to support a large user base — we’ll be talking about Facebook’s infrastructure innovations with Jay Parikh, the company’s vice president of infrastructure engineering, at GigaOM’s Structure conference in San Francisco in a couple of weeks. Now it’s just a matter of impressing more users with the power and utility of Graph Search.