We’re witnessing the rise of the graph in big data

GraphLab, a popular open source project dedicated to graph analysis and machine learning, is trying to capitalize on the excitement around graphs by spinning off a commercial entity, GraphLab Inc. GraphLab creator — and University of Washington machine learning professor — Carlos Guestrin will lead the new Seattle-based company, which has raised $6.75 million from Madrona Venture Group and NEA.

Graph analysis is among the hottest techniques around for making sense of large datasets, primarily by determining how tightly different data points are related or how similar they are. The term “graph” came into the broader lexicon along with social networks, which built social graphs to assess the relationships among their millions of users, but the technique has much broader uses.
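To make "how similar they are" concrete, here is a minimal sketch of one common graph measure, Jaccard similarity over shared connections. It is not GraphLab's API or any particular social network's method; the toy `graph` data and the `jaccard` and `rank_pairs` helpers are hypothetical names invented for illustration.

```python
from itertools import combinations

# Hypothetical toy social graph: each person maps to the set of people they are connected to.
graph = {
    "ana": {"ben", "cat", "dan"},
    "ben": {"ana", "cat"},
    "cat": {"ana", "ben", "dan"},
    "dan": {"ana", "cat", "eve"},
    "eve": {"dan"},
}


def jaccard(graph, a, b):
    """Fraction of a's and b's combined connections that both share (0.0 to 1.0)."""
    # Leave the two people themselves out so a direct link does not skew the score.
    neighbors_a = graph[a] - {b}
    neighbors_b = graph[b] - {a}
    union = neighbors_a | neighbors_b
    if not union:
        return 0.0
    return len(neighbors_a & neighbors_b) / len(union)


def rank_pairs(graph):
    """Score every pair of people and return the pairs, most similar first."""
    scores = {
        (a, b): jaccard(graph, a, b)
        for a, b in combinations(sorted(graph), 2)
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for pair, score in rank_pairs(graph):
        print(pair, round(score, 2))
```

Checking every pair by hand like this stops being practical once a graph has millions of nodes, which is roughly the problem that dedicated graph-analysis systems are built to handle.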

My LinkedIn social graph

Guestrin said GraphLab’s algorithms are used in a lot of recommender systems, but he also cites fraud detection in banking networks and intrusion detection in computer networks as potential applications. We’ve covered graphs as the analytical model of choice for everything from content recommendation to tracking lab work in genomics. Really, though — especially when combined with machine learning — graph analysis can be applied to anything where there’s too much data for a person to possibly analyze the relationships between every point.

One of Ayasdi’s graph-like data maps

Google also famously uses a graph-processing system called Pregel as part of PageRank. Although a number of graph databases and other projects have popped up in the past few years, Guestrin said GraphLab is actually a contemporary of Pregel. He and some colleagues at Carnegie Mellon built a small system for their lab about five years ago, then released it into the open-source world with few expectations that it would catch on. Now, he added, Pandora and WalmartLabs are among the project’s user base.

Among those other projects are Giraph (an open source, Hadoop-based Pregel clone that began at Yahoo and is now used heavily at Facebook) and the graph database Neo4j (which also has a commercial arm, called Neo Technology), as well as Twitter’s Cassovary and fellow University of Washington project Grappa. Guestrin said GraphLab can work with most of them, particularly those that, unlike GraphLab, are not designed to do machine learning at scale. Some efforts, he noted, focus on simply storing data in graph form (e.g., databases) or on providing simple graph analysis.

As for when we’ll actually see the results of the effort to commercialize GraphLab, Guestrin said it will be a while. Right now, he’s focused on the next open source release of GraphLab in July. However, the company will begin engaging with commercial users over the next several months to determine what types of features they would expect in commercial graph-analysis software.

The bigger question to come out of all this graph activity, though, is how big a market we’ll ultimately see for graph analysis or any other specific technique. As companies get more comfortable with big data from a technical standpoint, they’re also getting more interested in the different types of analysis it allows. This is evidenced by the quest to make Hadoop support myriad processing frameworks aside from MapReduce.

We already have a handful of commercial graph products on the market, including an industrial-grade one called YarcData from supercomputer maker Cray, but how many will there eventually be? And if graph analysis is all the rage right now, what comes next?

3 Responses to “We’re witnessing the rise of the graph in big data”

  1. Intent Data

    There are more and more applications released every day as this market increases. Understanding who the players are is becoming more of a challenge, particularly as small startups are probably the greatest innovators. We are covering the impact of the Social and Interest Graphs within the Big Data and Analytics sphere at:

  2. cjpearlman

    Graph database concepts were new to me before I came on board at but we’ve pioneered the use of a graph data structure on top of Cassandra to solve many of the perplexing customer challenges I saw at my previous companies in the MDM space. At Reltio, we’re rolling out Reltio big data Applications, Reltio SaaS MDM, and the Reltio Convergence Hub, all built on top of our graph data management structure. The benefits to all of these user end-points are manifold and truly represent the future trend.