Webscale companies such as Facebook, Google, and Netflix have spoken openly about how they use graph processing to quickly reveal connections among seemingly disparate people, places and things. More use cases for graph databases emerged Monday at the 2013 GraphLab workshop in San Francisco.
But even though it became clearer what’s possible when data is organized in graphs — better e-commerce and Twitter follower recommendations and lighter infrastructure usage, for example — some speakers pointed to the need for graphs and machine learning to become easier to implement.
Graphs at scale at Twitter and Walmart
Twitter’s Who to Follow tool is a fine example of a product that benefits from a graph data model. Who to Follow depends on the FlockDB graph database and the Cassovary in-memory graph-processing engine, both of which Twitter built in-house and then released to everyone under an Apache License. The product mines existing connections among users, shared interests and other data to make its recommendations, working from a graph that fits in the memory of a single server.
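To make “mining existing connections” in an in-memory graph a little more concrete, here is a minimal friends-of-friends sketch in Python. It is not Twitter’s actual algorithm; the plain-dict adjacency list, the hypothetical `who_to_follow` function and the scoring (count of shared connections) are all assumptions chosen for illustration.

```python
from collections import Counter

# Hypothetical follow graph held entirely in memory: user -> set of accounts they follow.
FollowGraph = dict[str, set[str]]


def who_to_follow(graph: FollowGraph, user: str, k: int = 3) -> list[str]:
    """Illustrative only: rank accounts followed by the accounts `user` follows."""
    already = graph.get(user, set())
    scores: Counter[str] = Counter()
    for followee in already:
        for candidate in graph.get(followee, set()):
            # Skip the user and anyone the user already follows.
            if candidate != user and candidate not in already:
                scores[candidate] += 1
    # Most shared connections first; ties broken alphabetically so output is deterministic.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:k]]


graph = {
    "alice": {"bob", "carol"},
    "bob": {"dave", "erin"},
    "carol": {"dave", "frank"},
    "dave": {"alice"},
}
print(who_to_follow(graph, "alice"))  # ['dave', 'erin', 'frank']
```

Because the whole adjacency structure lives in one process’s memory, each lookup is a dictionary access rather than a network call, which is the property the single-server design relies on.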
Take it as evidence that the graph model can offer advantages over a more traditional relational model for certain kinds of applications. The system’s success over the past three years shows that it’s not only possible but preferable to run a graph in the memory of a single machine, said Pankaj Gupta, head of the personalization and recommender systems group at Twitter.
Lei Tang, a data scientist at @WalmartLabs, described his work drawing on many data sources to recommend products that website visitors might actually want to buy.
A smart recommendation system ought to shift in response to incoming data on, say, a user’s page views and purchases, he said. This is where clustering products helps. A user might view a bunch of televisions before ultimately buying one, but as soon as the purchase happens, the system should set aside the cluster of television products within the larger set of products it can recommend. Recommend a television with big discounts after the purchase, Tang said, and “users are really pissed off.”
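Tang did not describe his implementation, so the following is only a minimal Python sketch of the behavior he outlined. It assumes every product has already been assigned to a cluster; the `Recommender` class, the `cluster_of` mapping and the product IDs are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Recommender:
    # Hypothetical precomputed clustering: product ID -> cluster name.
    cluster_of: dict[str, str]
    # Clusters the user has already bought from; these are set aside.
    suppressed: set[str] = field(default_factory=set)
    # Recent page views, used as a crude interest signal.
    views: list[str] = field(default_factory=list)

    def record_view(self, product: str) -> None:
        self.views.append(product)

    def record_purchase(self, product: str) -> None:
        # After a TV is bought, stop recommending anything in the TV cluster.
        self.suppressed.add(self.cluster_of[product])

    def recommend(self, catalog: list[str]) -> list[str]:
        viewed_clusters = {self.cluster_of[p] for p in self.views}
        candidates = [p for p in catalog if self.cluster_of[p] not in self.suppressed]
        # Products from browsed clusters first (False sorts before True);
        # the sort is stable, so catalog order is kept within each group.
        return sorted(candidates, key=lambda p: self.cluster_of[p] not in viewed_clusters)
```

For example, after viewing two TVs the user is shown TVs first; once `record_purchase("tv-55")` is called, the whole television cluster drops out and only the soundbar and cable remain.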
In e-commerce it’s also important to add nuance to recommendations. For example, a good recommendation system would suggest a primary product such as an iPhone before showing accessories such as a case or earphones. So companies should make page views and other data count and focus on granular product categories to maximize purchases through recommendations.
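One simple way to encode “primary product before accessories” is sketched below in Python. The `Product` type, its hypothetical `accessory_for` field and the rule that an accessory appears only when its primary product is recommended or already owned are assumptions for illustration, not Walmart’s method.

```python
from collections.abc import Iterable
from typing import NamedTuple, Optional


class Product(NamedTuple):
    name: str
    # Hypothetical field: the primary product this item accessorizes (None for primary products).
    accessory_for: Optional[str] = None


def order_recommendations(products: list[Product], owned: Iterable[str] = ()) -> list[str]:
    """Illustrative ordering: primary products first, then only relevant accessories."""
    primaries = [p.name for p in products if p.accessory_for is None]
    # An accessory is relevant if its primary product is being shown or is already owned.
    anchors = set(primaries) | set(owned)
    accessories = [
        p.name for p in products
        if p.accessory_for is not None and p.accessory_for in anchors
    ]
    return primaries + accessories


items = [
    Product("iphone-case", accessory_for="iphone"),
    Product("earphones", accessory_for="iphone"),
    Product("iphone"),
    Product("laptop-sleeve", accessory_for="laptop"),
]
print(order_recommendations(items))  # ['iphone', 'iphone-case', 'earphones']
```

Passing `owned={"laptop"}` would also surface the laptop sleeve after the iPhone accessories, since the user already has the primary product it belongs to.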
And these sorts of fine-grained tweaks need to be made quickly for millions of users, so the system can’t be too computationally intensive. Tang and his colleagues appear to have come up with a scalable system that meets these requirements, although he said there’s still room for improvement.
Coming soon to a server near you?
More use cases for graphs could motivate more companies to try out the graph model, and that means more business opportunities. The GraphLab open-source graph project, which has roots at the University of Washington and gave the workshop its name, spun off a startup just a couple of months ago. Now another project from the university, Grappa, is spinning off a startup, too, said Mark Oskin, a professor there. Grappa aims to keep performance strong when graphs run in memory across many commodity servers, while making the most of network bandwidth.
GraphLab, for its part, announced the release of version 2.2 of its software, which makes it easier for developers to write machine-learning programs, said its founder and CEO, Carlos Guestrin.
While webscale companies are already reaping the benefits of storing data in graph models and the market shows room for growth, adoption across enterprises might take some years yet. Dr. Theodore Willke of Intel Labs is a believer in the graph (he has worked on the GraphBuilder system for building graphs out of data in Hadoop), but he thinks his peers pushing graph analytics are far ahead of the rest of the world when it comes to getting people on board. He, too, is “guilty of being in this rocketship going at, like, warp nine,” he said, referring to innovators who have huge clusters to work with and intense performance needs.
“Most of the industry is, like, miles behind you,” Willke said. Engineers at many companies are still getting comfortable with running MapReduce jobs, he said. The priority now is to articulate clear business uses that demonstrate the value of graphs. To help with that, Willke said he intends to focus his efforts on making graphs easy to use and to integrate with other computational models.
Feature image courtesy of Flickr user yaph.