<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>GigaOM &#187; graph databases</title>
	<atom:link href="http://gigaom.com/tag/graph-databases/feed/" rel="self" type="application/rss+xml" />
	<link>http://gigaom.com</link>
	<description></description>
	<lastBuildDate>Sun, 19 May 2013 01:39:29 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='gigaom.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://0.gravatar.com/blavatar/0db8f6557d022075dbbf010c54d46d93?s=96&#038;d=http%3A%2F%2Fs2.wp.com%2Fi%2Fbuttonw-com.png</url>
		<title>GigaOM &#187; graph databases</title>
		<link>http://gigaom.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://gigaom.com/osd.xml" title="GigaOM" />
	<atom:link rel='hub' href='http://gigaom.com/?pushpress=hub'/>
		<item>
		<title>Facebook builds a database benchmark for a graph-powered world</title>
		<link>http://gigaom.com/2013/04/01/facebook-builds-a-database-benchmark-for-a-graph-powered-world/</link>
		<comments>http://gigaom.com/2013/04/01/facebook-builds-a-database-benchmark-for-a-graph-powered-world/#comments</comments>
		<pubDate>Mon, 01 Apr 2013 22:28:39 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[Benchmarks]]></category>
		<category><![CDATA[Databases]]></category>
		<category><![CDATA[Facebook]]></category>
		<category><![CDATA[graph databases]]></category>
		<category><![CDATA[LinkBench]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[open source]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=626218</guid>
		<description><![CDATA[Facebook has built a new open source tool for benchmarking graph databases, called LinkBench. And although the chances are your infrastructure and workloads look nothing like Facebook's, the good news is LinkBench was built with configurability in mind.]]></description>
				<content:encoded><![CDATA[<p>If you&#8217;re doing any sort of social-media application, you might want to take note of what Facebook just built. The company has <a href="http://www.facebook.com/notes/facebook-engineering/linkbench-a-database-benchmark-for-the-social-graph/10151391496443920">created a benchmarking tool called LinkBench</a> that measures the performance of databases tasked with serving graph-structured data, which, presumably, is the lifeblood of every startup around that&#8217;s concerned with who&#8217;s connected to whom.</p>
<p>Of all LinkBench&#8217;s features &#8212; and you can read all about them in a Facebook Engineering note from Monday morning &#8212; probably the most important is <a href="https://github.com/facebook/linkbench">that it&#8217;s open source</a> and built to be extensible. One of the biggest problems with benchmarks overall is that they rarely align with actual production workloads inside the companies that are supposed to care about them. In this case, for example, a benchmark measuring the performance of <a href="http://gigaom.com/2011/12/06/facebook-shares-some-secrets-on-making-mysql-scale/">Facebook&#8217;s massive MySQL</a>+memcached+<a href="http://www.facebook.com/note.php?note_id=388112370932">Flashcache</a> database architecture against its enormous social graph and transaction activity would be all but worthless to anyone not planning to rebuild Facebook.</p>
<p><a href="http://gigaom2.files.wordpress.com/2013/04/linkbench-copy.jpg"><img  alt="linkbench copy" src="http://gigaom2.files.wordpress.com/2013/04/linkbench-copy.jpg?w=708&#038;h=610" width="708" height="610" class="aligncenter size-large wp-image-626252" /></a></p>
<p>I&#8217;ve written in the past that <a href="http://gigaom.com/2012/07/26/why-crowdsourced-computing-benchmarks-are-the-future/">perhaps crowdsourced benchmarks are the wave of the future</a>: essentially a compiled set of statistics and best practices as more companies test different database (or Hadoop) technologies on different hardware setups against different workloads and publish the results. Everything will of course vary by the exact details within any given environment, but it would be a good way to get a sense of how a particular stack might, or perhaps should, fare.</p>
<p>But an open source benchmark tuned for a specific use case &#8212; social graphs &#8212; by probably the world&#8217;s foremost expert on that use case is interesting, too. Anyone else trying to serve data from their own social graphs can benefit from some of LinkBench&#8217;s more-prominent features, such as its ability to generate &#8220;large synthetic social graphs,&#8221; while tuning it to the specifics of their own infrastructure. After all, your app might have different requirements around reading versus writing data, and <a href="http://gigaom.com/2011/07/21/is-stonebraker-right-why-sql-isnt-the-choice-du-jour-for-many-apps/">it&#8217;s very possible you&#8217;re not using MySQL</a>, either.</p>
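<p>To illustrate the two knobs mentioned here (a synthetic graph with a skewed, social-network-like degree distribution, and a tunable read/write mix), here is a minimal Python sketch. LinkBench itself is written in Java and has its own configuration files; this sketch does not use its API. The function names, the operation names <code>get_links</code> and <code>add_link</code>, the Pareto shape parameter and the 90 percent read share are all hypothetical choices for illustration.</p>

```python
import random

# Hypothetical stand-ins for LinkBench-style knobs; this is NOT LinkBench's API.


def synthetic_graph(num_nodes, max_degree, seed=42):
    """Build a synthetic social graph with a skewed out-degree distribution.

    Out-degrees come from a Pareto distribution, so most nodes get a few
    links and a handful get many (capped at max_degree).
    """
    rng = random.Random(seed)
    graph = {}
    for node in range(num_nodes):
        degree = min(max_degree, int(rng.paretovariate(1.5)))
        # Exclude the node itself so there are no self-links.
        candidates = [n for n in range(num_nodes) if n != node]
        graph[node] = rng.sample(candidates, min(degree, len(candidates)))
    return graph


def workload(graph, num_ops, read_fraction, seed=7):
    """Generate a list of hypothetical read/write operations.

    read_fraction sets the share of reads ("get_links"); the rest are
    writes ("add_link") between randomly chosen nodes.
    """
    rng = random.Random(seed)
    nodes = list(graph)
    ops = []
    for _ in range(num_ops):
        node = rng.choice(nodes)
        if read_fraction > rng.random():
            ops.append(("get_links", node))
        else:
            ops.append(("add_link", node, rng.choice(nodes)))
    return ops


graph = synthetic_graph(num_nodes=1000, max_degree=50)
ops = workload(graph, num_ops=10_000, read_fraction=0.9)
reads = sum(1 for op in ops if op[0] == "get_links")
print(f"{reads} reads, {len(ops) - reads} writes")
```

<p>A write-heavy app would simply lower <code>read_fraction</code>; in a real benchmark the generated operations would then be replayed against the database under test.</p>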
<p>Or maybe you are using MySQL and want to see how a newer database technology might handle your graph workload. That, by the way, is one of the reasons Facebook built LinkBench, according to Facebook&#8217;s post.</p>
<p>At any rate, the social web is all about graphs, and database performance really matters for anyone trying to build a service that stays online and provides a pleasant user experience. Say what you want about Facebook, but its services perform, so the bar is set high for anyone trying to dethrone it or at least to build something that can attract an equally large and devout following.</p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/04/01/facebook-builds-a-database-benchmark-for-a-graph-powered-world/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/04/graph-copy-e1364854754198.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/04/graph-copy-e1364854754198.jpg?w=150" medium="image">
			<media:title type="html">graph copy</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/04/linkbench-copy.jpg?w=708" medium="image">
			<media:title type="html">linkbench copy</media:title>
		</media:content>
	</item>
		<item>
		<title>How researchers are fighting lung cancer using PageRank</title>
		<link>http://gigaom.com/2013/03/26/how-researchers-are-fighting-lung-cancer-using-pagerank/</link>
		<comments>http://gigaom.com/2013/03/26/how-researchers-are-fighting-lung-cancer-using-pagerank/#comments</comments>
		<pubDate>Tue, 26 Mar 2013 18:45:27 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[algorithms]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[Cancer]]></category>
		<category><![CDATA[data science]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[graph databases]]></category>
		<category><![CDATA[graph processing]]></category>
		<category><![CDATA[mathematics]]></category>
		<category><![CDATA[medical research]]></category>
		<category><![CDATA[pagerank]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=624307</guid>
		<description><![CDATA[Medical researchers are using a mathematical process similar to Google PageRank in order to identify organs most likely to spread lung cancer throughout the human body.]]></description>
				<content:encoded><![CDATA[<p>Google&#8217;s PageRank algorithm has forever changed the way we access information by putting the best stuff first, and now researchers are <a href="http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0034637">using the same kind of mathematical model that powers Google search to fight the spread of lung cancer</a> within the human body. While there&#8217;s no &#8220;best&#8221; when it comes to cancer cells, the aim is to identify tumors more likely to metastasize and then hit them with targeted treatment before the cells have a chance to spread.</p>
<p>The researchers &#8212; who come from the University of Southern California, Scripps Clinic, the Scripps Research Institute, the University of California, San Diego Moores Cancer Center and Memorial Sloan-Kettering &#8212; combined autopsy data from 163 cancer cases (all from before the advent of radiation therapy, in order to analyze the natural spread) with applied mathematics to carry out their study. What they found, <a href="http://www.scripps.edu/news/press/2013/20130325lung_cancer.html">according to a press release about the research</a>, is that</p>
<blockquote id="quote-metastatic-lung-canc"><p>metastatic lung cancer does not progress in a single direction from primary tumor site to distant locations, which has been the traditional medical view. Instead &#8230; cancer cell movement around the body likely occurs in more than one direction at a time.</p></blockquote>
<div id="attachment_624447" class="wp-caption alignright" style="width: 310px"><a href="http://gigaom2.files.wordpress.com/2013/03/journal-pone-0034637-g001.png"><img  alt="How cancer cells spread. Source: PLOS One" src="http://gigaom2.files.wordpress.com/2013/03/journal-pone-0034637-g001.png?w=300&#038;h=297" width="300" height="297" class="size-medium wp-image-624447" /></a><p class="wp-caption-text">How cancer cells spread. Source: PLOS One</p></div>
<p>Moreover, they found certain organs tend to spread cancer cells more aggressively, while others tend to act as sponges for cancer cells. These sponge organs might still grow tumors; they just don&#8217;t disperse the cells.</p>
<h2 id="the-pagerank-analogy">The PageRank analogy</h2>
<p>The mathematics involved here &#8212; called Markov chain models &#8212; are <a href="http://en.wikipedia.org/wiki/PageRank">similar to what Google uses</a> to determine what web pages are the highest-quality for any given search query. Only whereas Google uses the number and quality of links to determine the probability of a web surfer landing on any given page, these researchers are trying to predict the PageRank of tumors, if you will. So, generally speaking, a kidney would likely have a higher PageRank than a liver because the kidney is more likely to spread cancer cells throughout the body (or, in web-search terms, generate a lot of links to itself).</p>
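<p>For readers who want to see the Markov-chain idea concretely, here is a minimal sketch of PageRank computed by power iteration. A random surfer follows one of a page&#8217;s outgoing links with probability equal to the damping factor (0.85 here, the value commonly cited from the original PageRank paper) and otherwise jumps to a random page; repeating that update until it settles gives each node&#8217;s long-run probability. The three-node graph is hypothetical, each node splits its rank evenly across its links, and this is not the researchers&#8217; model, which builds its transition probabilities from the autopsy data described above.</p>

```python
def pagerank(links, damping=0.85, iterations=100):
    """Compute PageRank by power iteration.

    links maps each node to the list of nodes it links to.
    This sketch assumes every node has at least one outgoing link.
    """
    nodes = sorted(links)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Teleport term: with probability (1 - damping) the surfer jumps anywhere.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node, targets in links.items():
            # Each node splits its current rank evenly among its outgoing links.
            share = damping * rank[node] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy graph: A and C both link to B; B links back to A.
toy_links = {"A": ["B"], "B": ["A"], "C": ["B"]}
scores = pagerank(toy_links)
for node, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{node}: {score:.3f}")
```

<p>Node B ranks highest because both other nodes link to it, while C, which nothing links to, keeps only the teleport share of (1 - 0.85) / 3 = 0.05.</p>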
<div id="attachment_624441" class="wp-caption aligncenter" style="width: 718px"><a href="http://gigaom2.files.wordpress.com/2013/03/journal-pone-0034637-g009.png"><img  alt="The network path of cancer cells from lung to liver. Source: PLOS One" src="http://gigaom2.files.wordpress.com/2013/03/journal-pone-0034637-g009.png?w=708&#038;h=596" width="708" height="596" class="size-large wp-image-624441" /></a><p class="wp-caption-text">The network path of cancer cells from lung to liver. Source: PLOS One</p></div>
<p>As data volumes proliferate and relationships between data points become more complex, Markov models are actually becoming pretty popular. Netflix <a href="http://gigaom.com/2012/06/14/netflix-analyzes-a-lot-of-data-about-your-viewing-habits/">uses them in order to predict the movies</a> users will want to watch next.</p>
<p>The things being ranked (states, web pages or whatever else) and the weighted connections between them are <a href="http://googleresearch.blogspot.com/2009/06/large-scale-graph-computing-at-google.html">often expressed as the nodes and edges of a graph</a>. Graphs, of course, have become part of the everyday web lexicon thanks to the various <a href="http://gigaom.com/2013/03/14/facebook-tweaks-its-algorithms-to-improve-graph-search-comment-search-coming/">social graphs</a> and <a href="http://gigaom.com/2012/03/15/the-personalized-web-is-just-an-interest-graph-away/">interest graphs</a> that analyze who we&#8217;re connected to (and how) and the types of topics we browse online.</p>
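<p>One simple way to see how a weighted graph becomes a Markov chain: store the edges as (source, target, weight) triples, then normalize each node&#8217;s outgoing weights so they sum to 1, which turns them into transition probabilities. The sketch below assumes that representation; the organ names and counts are made up for illustration and are not data from the study.</p>

```python
from collections import defaultdict


def transition_probabilities(weighted_edges):
    """Turn (source, target, weight) edges into Markov transition probabilities.

    Each node's outgoing weights are normalized to sum to 1, which is what
    lets the weighted graph be read as a Markov chain.
    """
    totals = defaultdict(float)
    for source, _target, weight in weighted_edges:
        totals[source] += weight
    probs = defaultdict(dict)
    for source, target, weight in weighted_edges:
        probs[source][target] = weight / totals[source]
    return dict(probs)


# Hypothetical edge weights (e.g. counts of observed transitions), not study data.
edges = [
    ("lung", "lymph_nodes", 30),
    ("lung", "liver", 10),
    ("lymph_nodes", "liver", 5),
    ("lymph_nodes", "bone", 15),
]
for source, targets in transition_probabilities(edges).items():
    print(source, targets)
```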
<h2 id="the-web-as-a-data-science-prov">The web as a data science proving ground</h2>
<p>So in the end, perhaps the most important contribution of the World Wide Web won&#8217;t be the revolution in how we access information, but the web&#8217;s function as a proving ground for advanced statistical methods that can then be applied to very large and complex data sets like those found in the medical world. Already, for example, another group of medical researchers has used a Markov variant to <a href="http://gigaom.com/2013/02/11/researchers-say-ai-prescribes-better-treatment-than-doctors/">create a model they think can prescribe better treatment plans</a> because it analyzes the costs and patient outcomes usually associated with a given treatment for a given symptom.</p>
<div id="attachment_624480" class="wp-caption alignleft" style="width: 307px"><a href="http://gigaom2.files.wordpress.com/2013/03/cholera-copy.jpg"><img  alt="Tracking a cholera outbreak across a river network. Source: Physical Review Letters" src="http://gigaom2.files.wordpress.com/2013/03/cholera-copy.jpg?w=708"   class="size-full wp-image-624480" /></a><p class="wp-caption-text">Tracking a cholera outbreak across a river network. Source: Physical Review Letters</p></div>
<p>Last year, a group of Swiss researchers developed an algorithm that, with access to only a relatively small amount of data, <a href="http://gigaom.com/2012/08/13/an-algorithm-for-tracking-viruses-and-twitter-rumors-to-their-source/">can track anything from Twitter rumors to disease outbreaks</a> back to their source. A company called Syapse <a href="http://gigaom.com/2013/01/22/biotech-startup-syapse-wants-to-be-salesforce-com-for-our-genomes/">uses a graph structure to chart the relationships</a> among words across different medical specialties.</p>
<p>One would also be remiss to ignore the computing and data-storage innovation spurred by the web that has <a href="http://gigaom.com/2012/11/27/why-data-is-the-key-to-better-medicine-and-maybe-a-cure-for-cancer/">improved our ability to handle massive amounts of genetic and other data</a>. As the lung cancer researchers <a href="http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0034637">explain in their paper</a>:</p>
<blockquote id="quote-one-of-the-strengths2"><p>One of the strengths of such a statistical approach is that we need not offer specific biomechanical, genetic, or biochemical reasons for the spread from one site to another, those reasons presumably will become available through more research on the interactions between CTCs and their microenvironment. We [have created] a quantitative and computational framework for the seed-and-soil hypothesis as an ensemble based first step, [that] then can be further refined primarily by using larger, better, and more targeted databases such as ones that focus on specific genotypes or phenotypes, or by more refined modeling of the correlations between the trapping of a CTC at a specific site, and the probability of secondary tumor growth at that location.</p></blockquote>
<p>Long story short: the more data we have, and the more easily we can analyze and map it, the better we can treat &#8212; and perhaps even cure &#8212; cancer and other complicated diseases.</p>
<p><em>Feature image is a network map of how lung cancer spreads between organs, where each numbered node correlates with a specific organ.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/03/26/how-researchers-are-fighting-lung-cancer-using-pagerank/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/03/journal-pone-0034637-g003.png?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/03/journal-pone-0034637-g003.png?w=150" medium="image">
			<media:title type="html">journal.pone.0034637.g003</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/03/journal-pone-0034637-g001.png?w=300" medium="image">
			<media:title type="html">How cancer cells spread. Source: PLOS One</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/03/journal-pone-0034637-g009.png?w=708" medium="image">
			<media:title type="html">The network path of cancer cells from lung to liver. Source: PLOS One</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/03/cholera-copy.jpg" medium="image">
			<media:title type="html">Tracking a cholera outbreak across a river network. Source: Physical Review Letters</media:title>
		</media:content>
	</item>
		<item>
		<title>Facebook&#8217;s Graph Search mastermind shares a few tech secrets</title>
		<link>http://gigaom.com/2013/02/15/facebooks-graph-search-mastermind-shares-a-few-more-secrets/</link>
		<comments>http://gigaom.com/2013/02/15/facebooks-graph-search-mastermind-shares-a-few-more-secrets/#comments</comments>
		<pubDate>Fri, 15 Feb 2013 20:32:08 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[big data]]></category>
		<category><![CDATA[Facebook]]></category>
		<category><![CDATA[graph databases]]></category>
		<category><![CDATA[graph processing]]></category>
		<category><![CDATA[Graph Search]]></category>
		<category><![CDATA[Social graph]]></category>
		<category><![CDATA[social-data]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=611223</guid>
		<description><![CDATA[Facebook Director of Engineering Lars Rasmussen held an Ask Me Anything session on Reddit on Thursday to talk about Graph Search. Here's what he had to say about the infrastructure underlying it.]]></description>
				<content:encoded><![CDATA[<p>A month after launching its vaunted Graph Search feature to much fanfare, Facebook is finally opening up a bit about how, exactly, it works. The product’s primary architect, Lars Rasmussen, <a href="http://www.reddit.com/r/IAmA/comments/18jb6d/i_am_the_pointyhaired_engineering_director_for/">took to Reddit yesterday in an Ask Me Anything session</a> during which he elaborated (<a href="http://gigaom.com/2013/01/15/a-really-tiny-explanation-of-how-facebooks-graph-search-works/">beyond what Om reported last month</a>) on how Graph Search is built.</p>
<p>Of course, this being a Reddit discussion, Rasmussen answers a bunch of questions about the history of Graph Search, its privacy issues, his role in building Google Maps (and Wave) and his walks with Mark Zuckerberg. But here are some of the more-informative excerpts about the architecture itself.</p>
<p><a href="http://gigaom2.files.wordpress.com/2013/02/gs4.jpg"><img alt="gs4" src="http://gigaom2.files.wordpress.com/2013/02/gs4.jpg?w=708&#038;h=393" width="708" height="393" class="aligncenter size-large wp-image-611229"></a></p>
<p><a href="http://gigaom2.files.wordpress.com/2013/02/gs2.jpg"><img alt="gs2" src="http://gigaom2.files.wordpress.com/2013/02/gs2.jpg?w=708&#038;h=320" width="708" height="320" class="aligncenter size-large wp-image-611233"></a></p>
<p><a href="http://gigaom2.files.wordpress.com/2013/02/gs1.jpg"><img alt="gs1" src="http://gigaom2.files.wordpress.com/2013/02/gs1.jpg?w=708&#038;h=213" width="708" height="213" class="aligncenter size-large wp-image-611232"></a></p>
<p><a href="http://gigaom2.files.wordpress.com/2013/02/gs3.jpg"><img alt="gs3" src="http://gigaom2.files.wordpress.com/2013/02/gs3.jpg?w=708&#038;h=269" width="708" height="269" class="aligncenter size-large wp-image-611235"></a></p>
<p>I should point out that we’ll be talking a bit about graph processing and graph databases at our <a href="http://event.gigaom.com/structuredata/?utm_source=data&amp;utm_medium=editorial&amp;utm_campaign=intext&amp;utm_term=611223+facebooks-graph-search-mastermind-shares-a-few-more-secrets&amp;utm_content=dharrisstructure">Structure: Data</a> conference next month, too. Graphs, as it turns out, are a great way of storing, processing and presenting <a href="http://gigaom.com/2013/01/16/has-ayasdi-turned-machine-learning-into-a-magic-bullet/">a lot of data that has nothing to do with social connections</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/02/15/facebooks-graph-search-mastermind-shares-a-few-more-secrets/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/02/fbgs1-e1360960162935.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/02/fbgs1-e1360960162935.jpg?w=150" medium="image">
			<media:title type="html">fbgs</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/02/gs4.jpg?w=708" medium="image">
			<media:title type="html">gs4</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/02/gs2.jpg?w=708" medium="image">
			<media:title type="html">gs2</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/02/gs1.jpg?w=708" medium="image">
			<media:title type="html">gs1</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/02/gs3.jpg?w=708" medium="image">
			<media:title type="html">gs3</media:title>
		</media:content>
	</item>
		<item>
		<title>Biotech startup Syapse wants to be Salesforce.com for our genomes</title>
		<link>http://gigaom.com/2013/01/22/biotech-startup-syapse-wants-to-be-salesforce-com-for-our-genomes/</link>
		<comments>http://gigaom.com/2013/01/22/biotech-startup-syapse-wants-to-be-salesforce-com-for-our-genomes/#comments</comments>
		<pubDate>Tue, 22 Jan 2013 15:45:32 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[big data]]></category>
		<category><![CDATA[graph processing]]></category>
		<category><![CDATA[biotechnology]]></category>
		<category><![CDATA[Genomics]]></category>
		<category><![CDATA[graph databases]]></category>
		<category><![CDATA[omics]]></category>
		<category><![CDATA[Syapse]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=603080</guid>
		<description><![CDATA[A startup called Syapse is trying to bring the world of "omics" -- the study of all our genomes, biomes, proteomes and other "omes" -- under control with a new data management platform based on some of the general techniques that also power Facebook's Graph Search.]]></description>
				<content:encoded><![CDATA[<p>Think about Facebook’s <a href="http://gigaom.com/2013/01/15/a-really-tiny-explanation-of-how-facebooks-graph-search-works/">new Graph Search feature</a> — only infinitely more complex — and you have a rough understanding of what Palo Alto, Calif.-based startup <a href="http://www.syapse.com/">Syapse</a> is trying to do. The company, which on Tuesday announced a $3 million Series A round led by The Social+Capital Partnership (it previously raised a $1.6 million seed round), is building a data-management platform designed to let researchers and physicians easily pore through mountains of complicated molecular data in order to better diagnose a whole range of potential illnesses.</p>
<p>But to understand how Syapse works, you have to understand the problem it’s trying to solve. A condensed version of the situation is this: Sequencing genomes, proteomes, biomes and other microscopic, but very important, biological players generates a lot of data. However, we’re not just talking about the <a href="http://gigaom.com/2012/01/23/as-genomics-pushes-big-data-limits-cloud-could-save-the-day/">terabytes of data that a fully sequenced genome</a> (or perhaps the <a href="http://www.newyorker.com/reporting/2012/10/22/121022fa_fact_specter">tens of thousands of sequenced gut bacteria</a>, which can change composition hourly) will produce, but also patient data (e.g., name, date of birth, smoker or non-smoker, etc.) and process data (i.e., everything that happens from the time a lab gets a sample to the time a doctor gets a report on his desk).</p>
<p><a href="http://gigaom2.files.wordpress.com/2013/01/shutterstock_67144993.jpg"><img alt="lab worker" src="http://gigaom2.files.wordpress.com/2013/01/shutterstock_67144993.jpg?w=300&#038;h=199" width="300" height="199" class="alignleft size-medium wp-image-603147"></a>The complexity and perpetually changing nature of both the field of <a href="http://en.wikipedia.org/wiki/Omics">“omics”</a> (as it’s called) and the data itself further complicate things. According to Syapse Co-founder and President Jonathan Hirsch, diagnostics labs and workers are always trying new and different processes to optimally extract, tag and analyze samples. Furthermore, expert knowledge of what any particular genetic or other signature means is always changing (for example, Hirsch said, we only really understand about 1 percent of the human genome), as are the <a href="http://www.openclinical.org/ontologies.html">ontologies</a> that lab workers, researchers and physician specialists use as their particular fields evolve.</p>
<p>“There is basically a whole set of measurements that go beyond just sequencing the genome,” he explained. Analyzing genomes, proteomes and anything else is “like a very, very complicated recipe” that involves much more than swabbing someone’s cheek and getting back a comprehensive, understandable report. Syapse doesn’t actually do any of the sequencing work (as <a href="http://gigaom.com/2011/10/12/dnanexus-cloudant-biotech-deals/">DNAnexus</a> or <a href="http://gigaom.com/2012/04/30/straight-outta-stanford-bina-wants-to-remake-genome-analysis/">Bina Technologies</a> do); it just captures the metadata from those lab processes and connects to the hefty sequence data via an API, so the platform has access to everything it needs.</p>
<h2 id="organizing-complex-data-requir">Organizing complex data requires a graph</h2>
<p>Using semantic-analysis and graph-processing techniques, Syapse thinks it can bring the <a href="http://www.nytimes.com/2012/06/19/science/studies-of-human-microbiome-yield-new-insights.html?pagewanted=all&amp;_r=1">world of “omics”</a> under control. Although it’s currently working with research centers that analyze the data in order to better hone their processes, Hirsch expects the company will eventually make most of its money from doctors and hospitals using Syapse to help better diagnose their patients. “[We're] trying to fill the gap and be the company that cracks the physician side of this,” he said.</p>
<p>This is where the Graph Search comparison comes into play. The Syapse platform is continuously updated with the latest ontologies from various fields and the changing meanings of the metadata associated with the various lab processes. All this information is stored based on its relationship to other pieces, and semantic analysis means the Syapse software knows that Term X in one field might actually mean Term Y in another.</p>
<p>Syapse has essentially created a “huge <a href="http://gigaom.com/2012/08/08/for-google-keeping-search-relevant-means-baking-big-data-into-everything/">knowledge graph</a>” of clinical, diagnosis and omics data, Hirsch explained, and doctors and researchers can mine it using whatever terms they use in their daily lives. For example, they can search for patients they’ve treated for breast cancer whose genes showed specific markers and whose samples were processed using particular lab techniques, in order to find connections among them.</p>
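<p>To make the idea of one knowledge graph queried in each user's own vocabulary more concrete, here is a minimal Python sketch. It is not Syapse's implementation: the triple layout, the <code>SYNONYMS</code> table, the <code>KnowledgeGraph</code> class and all of the patient data are hypothetical, chosen only to show how mapping field-specific terms onto shared concepts lets a query phrased in one field's terms find records entered in another's.</p>

```python
# Hypothetical sketch: a tiny knowledge graph of (subject, predicate, object)
# facts plus an ontology synonym table. Not Syapse's design; all data invented.
from collections import defaultdict

# Assumed synonym map: field-specific term -> shared canonical concept.
SYNONYMS = {
    "HER2": "ERBB2",
    "ERBB2": "ERBB2",
    "breast carcinoma": "breast_cancer",
    "breast cancer": "breast_cancer",
}


def canonical(term):
    """Map a user's term onto the shared concept, falling back to the term itself."""
    return SYNONYMS.get(term, term)


class KnowledgeGraph:
    def __init__(self):
        # subject -> predicate -> set of canonical objects
        self.edges = defaultdict(lambda: defaultdict(set))

    def add(self, subject, predicate, obj):
        self.edges[subject][predicate].add(canonical(obj))

    def find(self, **criteria):
        """Return subjects linked to every (predicate, term) pair, after normalizing terms."""
        wanted = {pred: canonical(term) for pred, term in criteria.items()}
        return sorted(
            subject
            for subject, preds in self.edges.items()
            if all(term in preds.get(pred, ()) for pred, term in wanted.items())
        )


kg = KnowledgeGraph()
kg.add("patient_1", "diagnosis", "breast carcinoma")  # pathology wording
kg.add("patient_1", "marker", "HER2")                 # common clinical name
kg.add("patient_1", "lab_method", "FISH")
kg.add("patient_2", "diagnosis", "breast cancer")
kg.add("patient_2", "marker", "BRCA1")

# A query phrased with the official gene symbol still finds patient_1.
print(kg.find(diagnosis="breast cancer", marker="ERBB2"))  # ['patient_1']
```

<p>Because the article stresses that ontologies keep changing, a real system would also need to version the synonym table; that is deliberately left out of this sketch.</p>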
<p>Syapse Co-founder and CEO Glenn Winokur, an admitted “IT guy” compared with his biotech-focused partners, likes to put the platform’s promise in the terms of business software. “Think of this entire workflow as similar to a sales or marketing workflow,” he said, adding that Syapse is trying to make mining omics data as simple for its users as Salesforce.com makes CRM for its users.</p>
<p>That’s probably a good analogy for selling the software to hospital administrators who might be more concerned with budgets than with big data technology. As we’ll discuss in more detail at our <a href="http://event.gigaom.com/structuredata/?utm_source=data&amp;utm_medium=editorial&amp;utm_campaign=intext&amp;utm_term=603080+biotech-startup-syapse-wants-to-be-salesforce-com-for-our-genomes&amp;utm_content=dharrisstructure">Structure: Data conference</a> on March 20-21, business people are increasingly concerned with using data to make better decisions, but they need applications that make it easier and faster to find stuff out than is possible with many open source packages targeting engineers and statisticians. If Syapse can deliver on this promise for making sense of our complex biological systems, it could make a big difference.</p>
<p><em>Feature image courtesy of <a href="http://www.shutterstock.com/gallery-168430p1.html">Shutterstock user kentoh</a>; lab image courtesy of <a href="http://www.shutterstock.com/gallery-332422p1.html">Shutterstock user VILevi</a>.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/01/22/biotech-startup-syapse-wants-to-be-salesforce-com-for-our-genomes/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/01/shutterstock_52453933.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/01/shutterstock_52453933.jpg?w=150" medium="image">
			<media:title type="html">genome</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/01/shutterstock_67144993.jpg?w=300" medium="image">
			<media:title type="html">lab worker</media:title>
		</media:content>
	</item>
		<item>
		<title>It pays to know you: Interest graph master Gravity gets $10.6M</title>
		<link>http://gigaom.com/2012/10/02/it-pays-to-know-you-interest-graph-master-gravity-gets-10-6m/</link>
		<comments>http://gigaom.com/2012/10/02/it-pays-to-know-you-interest-graph-master-gravity-gets-10-6m/#comments</comments>
		<pubDate>Tue, 02 Oct 2012 16:02:32 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[big data]]></category>
		<category><![CDATA[graph databases]]></category>
		<category><![CDATA[Gravity]]></category>
		<category><![CDATA[Interest Graph]]></category>
		<category><![CDATA[machine-learning]]></category>
		<category><![CDATA[semantic analysis]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=568889</guid>
		<description><![CDATA[Interest graph specialist Gravity has raised $10.6 million to expand its business of personalizing the web for consumers. Thanks to a semantic engine that associates the content site visitors read with related topics, Gravity says it can show readers just what they want to see.]]></description>
				<content:encoded><![CDATA[<p><a href="http://www.gravity.com/">Gravity</a>, the company whose interest graph technology powers delivery of personalized content for a number of prominent web publishers, has raised a $10.6 million Series B round. The new funding comes from GRP Partners, as well as existing investors Redpoint Ventures and August Capital. If personalization is the future of web content, there are worse bets to make than Gravity.</p>
<p>As I explained in March, Gravity <a href="http://gigaom.com/cloud/the-personalized-web-is-just-an-interest-graph-away/">has built a semantic-analysis engine</a> that tries to gauge a site visitor’s interests by looking at more than the articles that person reads. Thanks to an expansive database of topics and <a href="http://gigaom.com/2012/03/11/can-big-data-fix-a-broken-system-for-software-patents/">a hybrid man-machine machine learning system</a> that takes into account behavior as well as content, Gravity can determine other topics that might be of interest even if those connections aren’t visible to the naked eye. The result of this analysis is called an interest graph, which is like a social graph except that it maps interests rather than people.</p>
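<p>The notion of an interest graph can be illustrated with a small Python sketch: topics are linked by weighted relatedness edges, the topics of articles a reader has read seed the scores, and a couple of hops of propagation surface related topics the reader never read about directly. Gravity has not published its method here; the <code>RELATED</code> weights, the <code>decay</code> factor and the <code>infer_interests</code> function are assumptions for illustration only.</p>

```python
# Hypothetical interest-graph sketch; not Gravity's algorithm. Weights are invented.
from collections import Counter

# Assumed topic-relatedness edges with weights in [0, 1].
RELATED = {
    "hadoop": {"big data": 0.9, "nosql": 0.6},
    "nosql": {"databases": 0.8, "big data": 0.5},
    "big data": {"analytics": 0.7},
}


def infer_interests(articles_read, hops=2, decay=0.5):
    """Score topics for one reader.

    Each topic of each article read scores 1.0; every hop along RELATED
    passes on score * edge weight * decay, so more distant topics count less.
    """
    frontier = Counter()
    for topics in articles_read:
        for topic in topics:
            frontier[topic] += 1.0
    scores = Counter(frontier)
    for _ in range(hops):
        spread = Counter()
        for topic, score in frontier.items():
            for neighbor, weight in RELATED.get(topic, {}).items():
                spread[neighbor] += score * weight * decay
        scores.update(spread)  # Counter.update adds to the existing scores
        frontier = spread
    return scores.most_common()


history = [["hadoop"], ["hadoop", "nosql"]]
print([topic for topic, _ in infer_interests(history)])
# ['hadoop', 'nosql', 'big data', 'databases', 'analytics']
```

<p>"databases" and "analytics" were never read directly, yet they still receive (smaller) scores through their links to topics the reader did read.</p>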
<p><a href="http://gigaom2.files.wordpress.com/2012/03/gravity1.jpg"><img title="gravity" src="http://gigaom2.files.wordpress.com/2012/03/gravity1.jpg?w=604&#038;h=267" alt="" width="604" height="267" class="aligncenter size-large wp-image-499888"></a></p>
<p>Currently, Gravity claims its total body of graph data exceeds <a href="http://www.gravity.com/labs/livemetrics/">18 million megabytes</a>, or 18 terabytes. The company says the new money will help it expand operations in the United States and even deploy its own content-marketing platform.</p>
<p>Of course, interest graphs are useful for more than just automatically presenting visitors with the news content they’re interested in. Gravity also has a product for advertisers to better target potential customers, and an analytics service so publishers can get in-depth visualizations of who’s reading their content and what content works better than other content.</p>
<p>However, the obvious elephant in the room when talking about interest graphs is privacy and how to collect and analyze user data without crossing any ethical guidelines. This will become even more of an issue as web platforms try to share data across services in order to create a more unified browsing experience in which interest graphs follow users around the web to inform personalization algorithms at every step. And as we’ll discuss later this month at our <a href="http://event.gigaom.com/structureeurope/?utm_source=data&amp;utm_medium=editorial&amp;utm_campaign=intext&amp;utm_term=568889+it-pays-to-know-you-interest-graph-master-gravity-gets-10-6m&amp;utm_content=dharrisstructure">Structure: Europe</a> conference in Amsterdam, some governments take user privacy much more seriously than others, which can make businesses based on that data a little trickier to operate.</p>
<p>Here’s Gravity Co-Founder and CTO Jim Benedetto, along with privacy attorney Ashlie Beringer, discussing the issue with me at our Structure: Data conference last March.</p>
<p><iframe style="border: 0; outline: 0;" src="http://cdn.livestream.com/embed/gigaombigdata?layout=4&amp;clip=pla_8f4f26ca-053e-442f-bcc4-13d2ce2409e9&amp;height=340&amp;width=560&amp;autoplay=false" frameborder="0" scrolling="no" width="560" height="340"></iframe></p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2012/10/02/it-pays-to-know-you-interest-graph-master-gravity-gets-10-6m/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2012/08/canvas-copy-e1359742098722.jpeg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2012/08/canvas-copy-e1359742098722.jpeg?w=150" medium="image">
			<media:title type="html">canvas-copy</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2012/03/gravity1.jpg?w=604" medium="image">
			<media:title type="html">gravity</media:title>
		</media:content>
	</item>
		<item>
		<title>Why the days are numbered for Hadoop as we know it</title>
		<link>http://gigaom.com/2012/07/07/why-the-days-are-numbered-for-hadoop-as-we-know-it/</link>
		<comments>http://gigaom.com/2012/07/07/why-the-days-are-numbered-for-hadoop-as-we-know-it/#comments</comments>
		<pubDate>Sat, 07 Jul 2012 17:30:54 +0000</pubDate>
		<dc:creator>Mike Miller, Cloudant</dc:creator>
				<category><![CDATA[big data]]></category>
		<category><![CDATA[BigQuery]]></category>
		<category><![CDATA[Dremel]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[graph databases]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Hbase]]></category>
		<category><![CDATA[mapreduce]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[Pregel]]></category>
		<category><![CDATA[real-time processing]]></category>
		<category><![CDATA[Storm]]></category>
		<category><![CDATA[Web Infrastructure]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=540391</guid>
		<description><![CDATA[For better or worse, Hadoop has become synonymous with big data. In just a few years it has gone from a fringe technology to the de facto standard. But is the enterprise buying into a technology whose best day has already passed?]]></description>
				<content:encoded><![CDATA[<p><a href="http://gigaom2.files.wordpress.com/2012/07/elephant-walking-away.jpg"><img  title="elephant walking away" src="http://gigaom2.files.wordpress.com/2012/07/elephant-walking-away-e1341677481803.jpg?w=300&#038;h=200" alt="" width="300" height="200" class="alignright size-medium wp-image-540408" /></a>Hadoop is everywhere. For better or worse, it has become synonymous with big data. In just a few years it has gone from a fringe technology to the de facto standard. Want to be big data or enterprise analytics or BI-compliant? You’d better play well with Hadoop.</p>
<p>It&#8217;s therefore far from controversial to say that Hadoop is firmly planted in the enterprise as the big data standard and will likely remain firmly entrenched for at least another decade. But, <a href="http://gigaom.com/cloud/democratizing-big-data-is-hadoop-our-only-hope/">building on some previous discussion</a>, I’m going to go out on a limb and ask, “Is the enterprise buying into a technology whose best day has already passed?”</p>
<h2>First, there were Google File System and Google MapReduce</h2>
<p>To study this question we need to return to Hadoop’s inspiration – Google’s MapReduce. Confronted with a data explosion, Google engineers Jeff Dean and Sanjay Ghemawat architected (and published!) two seminal systems: the <a href="http://research.google.com/archive/gfs.html">Google File System</a> (GFS) and <a href="http://research.google.com/archive/mapreduce.html">Google MapReduce</a> (GMR). The former was a brilliantly pragmatic solution to exabyte-scale data management using commodity hardware. The latter was an equally brilliant <em>implementation </em>of a long-standing design pattern applied to massively parallel processing of said data on said commodity machines.</p>
<p>GMR’s brilliance was to make big data processing approachable to Google’s typical user/developer and to make it fast and fault tolerant. Simply put, it boiled data processing at scale down to the bare essentials and took care of everything else. GFS and GMR became the core of the processing engine used to crawl, analyze, and rank web pages into the giant inverted index that we all use daily at google.com. This was clearly a major advantage for Google.</p>
<p>Enter reverse engineering in the open source world, and, voila, <a href="http://hadoop.apache.org">Apache Hadoop</a> (composed of the Hadoop Distributed File System and Hadoop MapReduce) was born in the image of GFS and GMR. Yes, Hadoop is developing into an ecosystem of projects that touch nearly all parts of data management and processing. But, at its core, it is a MapReduce system. Your code is turned into map and reduce <em>jobs</em>, and Hadoop runs those <em>jobs</em> for you.</p>
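<p>For readers who have not written one, the map, shuffle and reduce steps that Hadoop runs as jobs can be simulated in a few lines of Python. This is a single-process sketch of the programming model only, using the classic word-count example; it is not Hadoop's Java API, and the function names are made up. On a real cluster the framework distributes the map and reduce calls across machines and performs the shuffle over the network.</p>

```python
# Single-process simulation of the MapReduce programming model
# (map, then shuffle/group by key, then reduce). Not Hadoop's API.
from collections import defaultdict
from itertools import chain


def map_phase(doc_id, text):
    """Mapper: emit (word, 1) for every word in one input record."""
    for word in text.lower().split():
        yield word, 1


def reduce_phase(word, counts):
    """Reducer: combine all values emitted for one key."""
    return word, sum(counts)


def run_job(documents):
    # Map: the mapper runs on each record independently (parallelizable).
    pairs = chain.from_iterable(map_phase(d, t) for d, t in documents.items())
    # Shuffle: group intermediate values by key; Hadoop does this for you.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    # Reduce: one call per distinct key.
    return dict(reduce_phase(k, v) for k, v in groups.items())


docs = {"d1": "big data big ideas", "d2": "Big data"}
print(run_job(docs))  # {'big': 3, 'data': 2, 'ideas': 1}
```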
<h2>Then Google evolved. Can Hadoop catch up?</h2>
<p>Most interesting to me, however, is that GMR no longer holds such prominence in the Google stack. Just as the enterprise is locking into MapReduce, Google seems to be moving past it. In fact, many of the technologies I’m going to discuss below aren’t even new; they date back to the second half of the last decade, mere years after the seminal GMR paper was in print.</p>
<p><a href="http://gigaom2.files.wordpress.com/2012/07/wheel.jpg"><img  title="wheel" src="http://gigaom2.files.wordpress.com/2012/07/wheel.jpg?w=708" alt=""   class="aligncenter size-full wp-image-540411" /></a></p>
<p>Here are technologies that I hope will ultimately seed the post-Hadoop era. While many Apache projects and commercial Hadoop distributions are actively trying to address some of the issues below via technologies and features such as <a href="http://hbase.apache.org/">HBase</a>, <a href="http://hive.apache.org/">Hive</a> and <a href="http://hadoop.apache.org/common/docs/r0.23.0/hadoop-yarn/hadoop-yarn-site/YARN.html">Next-Generation MapReduce (aka YARN)</a>, it is my opinion that it will require new, non-MapReduce-based architectures that leverage the Hadoop core (HDFS and Zookeeper) to truly compete with Google’s technology. (A more technical exposition with published benchmarks is available at <a href="http://www.slideshare.net/mlmilleratmit/gluecon-miller-horizon">http://www.slideshare.net/mlmilleratmit/gluecon-miller-horizon</a>.)</p>
<p><strong>Percolator for incremental indexing and analysis of frequently changing datasets</strong>. Hadoop is a big machine. Once you get it up to speed it’s great at crunching your data. Get the disks spinning forward as fast as you can. However, each time you want to analyze the data (say after adding, modifying or deleting data) you have to stream over the entire dataset. If your dataset is always growing, this means your analysis time also grows without bound.</p>
<p>So, how does Google manage to make its search results increasingly real-time? By displacing GMR in favor of an incremental processing engine called <a href="http://research.google.com/pubs/pub36726.html"><strong>Percolator</strong></a>. By dealing only with new, modified, or deleted documents and using secondary indices to efficiently catalog and query the resulting output, Google was able to dramatically decrease the time to value. As the authors of the Percolator paper write, “[C]onverting the indexing system to an incremental system … reduced the average document processing latency by a factor of 100.” This means that new content on the Web could show up in the index roughly 100 times sooner than it could under the MapReduce system!</p>
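<p>The difference between rescanning everything and touching only what changed can be sketched with a toy inverted index in Python. This illustrates the incremental idea only; Percolator itself is built on Bigtable with cross-row transactions and observers, none of which is modeled here, and the <code>IncrementalIndex</code> class is hypothetical.</p>

```python
# Toy contrast between batch re-indexing and incremental index maintenance.
# Illustrates the idea only; Percolator's transactions/observers are not modeled.
from collections import defaultdict


def build_index(docs):
    """Batch approach: rescan every document whenever anything changes."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in set(text.lower().split()):
            index[term].add(doc_id)
    return index


class IncrementalIndex:
    """Hypothetical incremental index: work is limited to the changed document."""

    def __init__(self):
        self.index = defaultdict(set)
        self.terms_by_doc = {}  # remember each doc's terms so updates are cheap

    def upsert(self, doc_id, text):
        self.delete(doc_id)  # drop stale postings if the doc was modified
        terms = set(text.lower().split())
        self.terms_by_doc[doc_id] = terms
        for term in terms:
            self.index[term].add(doc_id)

    def delete(self, doc_id):
        for term in self.terms_by_doc.pop(doc_id, set()):
            self.index[term].discard(doc_id)
            if not self.index[term]:
                del self.index[term]

    def lookup(self, term):
        return sorted(self.index.get(term, set()))


inc = IncrementalIndex()
inc.upsert("a", "hadoop batch jobs")
inc.upsert("b", "percolator incremental")
inc.upsert("a", "incremental updates")  # modify: only doc "a" is re-processed
inc.delete("b")
print(inc.lookup("incremental"))  # ['a']
print(inc.lookup("hadoop"))       # []
```

<p>The batch function must read the whole corpus on every run, while each <code>upsert</code> or <code>delete</code> costs work roughly proportional to the one document involved, which is why incremental processing keeps latency flat as a dataset keeps growing.</p>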
<p>Coming from work on the Large Hadron Collider (an ever-growing big data corpus), I find this topic near and dear to my heart. Some datasets simply never stop growing. It is why we baked a similar approach deep into the Cloudant data layer service, it is why trigger-based processing is now available in HBase, and it is a primary reason that <a href="http://gigaom.com/cloud/twitter-to-open-source-hadoop-like-tool/">Twitter Storm is gaining momentum</a> for real-time processing of stream data.</p>
<p><strong>Dremel for ad hoc analytics</strong>. Google and the Hadoop ecosystem worked very hard to make MapReduce an approachable tool for ad hoc analyses. From <a href="http://research.google.com/archive/sawzall.html">Sawzall</a> through <a href="http://pig.apache.org/">Pig</a> and Hive, many interface layers have been built. Yet, for all of the SQL-like familiarity, they ignore one fundamental reality – MapReduce (and thereby Hadoop) is purpose-built for organized data processing (<em>jobs</em>). It is baked from the core for workflows, not ad hoc exploration.</p>
<p>In stark contrast, many BI/analytics queries are fundamentally ad hoc, interactive, low-latency analyses. Not only is writing map and reduce workflows prohibitive for many analysts, but waiting minutes for jobs to start and hours for workflows to complete is not conducive to the interactive experience. Therefore, Google invented <a href="http://research.google.com/pubs/pub36632.html"><strong>Dremel</strong></a> (now <a href="http://gigaom.com/cloud/google-opens-up-its-biq-query-data-analytics-service-to-all/">exposed as the BigQuery product</a>) as a purpose-built tool to allow analysts to scan over petabytes of data in seconds to answer ad hoc queries and, presumably, power compelling visualizations.</p>
<div id="attachment_540412" class="wp-caption aligncenter" style="width: 614px"><a href="http://gigaom2.files.wordpress.com/2012/07/big_banner.jpg"><img  title="big_banner" src="http://gigaom2.files.wordpress.com/2012/07/big_banner.jpg?w=604&#038;h=230" alt="" width="604" height="230" class="size-large wp-image-540412" /></a><p class="wp-caption-text">Google BigQuery</p></div>
<p>Google&#8217;s Dremel paper says it is “capable of running aggregation queries over trillions of rows in seconds,” and the same paper notes that running identical queries in standard MapReduce is approximately 100 times slower than in Dremel. Most impressive, however, is real world data from production systems at Google, where the vast majority of Dremel queries complete in less than 10 seconds, a time well below the typical latencies of even beginning execution of a MapReduce workflow and its associated jobs.</p>
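<p>One ingredient behind Dremel's speed is column-oriented storage: an aggregation reads only the columns it needs instead of whole records. The Python sketch below shows just that idea on flat, invented records; Dremel's actual encoding of nested data (repetition and definition levels) and its multi-level serving tree are not modeled, and the helper names are hypothetical.</p>

```python
# Columnar-layout sketch (not Dremel's format): pivot rows into per-field
# lists, then aggregate by reading only the two columns a query touches.

rows = [
    {"country": "US", "browser": "firefox", "latency_ms": 120},
    {"country": "US", "browser": "chrome", "latency_ms": 80},
    {"country": "DE", "browser": "chrome", "latency_ms": 90},
]


def to_columns(records):
    """Store each field as its own list (one 'column' per field)."""
    columns = {}
    for record in records:
        for field, value in record.items():
            columns.setdefault(field, []).append(value)
    return columns


def avg_by(columns, key, value):
    """Roughly SELECT key, AVG(value) GROUP BY key; other columns are never read."""
    sums, counts = {}, {}
    for k, v in zip(columns[key], columns[value]):
        sums[k] = sums.get(k, 0) + v
        counts[k] = counts.get(k, 0) + 1
    return {k: sums[k] / counts[k] for k in sums}


cols = to_columns(rows)
print(avg_by(cols, "country", "latency_ms"))  # {'US': 100.0, 'DE': 90.0}
```

<p>At Dremel's scale, per-column files also compress well and scans are fanned out across many servers, which matters as much as the layout itself.</p>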
<p>Interestingly, I’m not aware of any compelling open source alternatives to Dremel at the time of this writing and consider this a fantastic BI/analytics opportunity.</p>
<p><strong>Pregel for analyzing graph data</strong>. Google MapReduce was purpose-built for crawling and analyzing the world’s largest graph data structure – the internet. However, certain core assumptions of MapReduce are at fundamental odds with analyzing networks of people, telecommunications equipment, documents and other graph data structures. For example, calculation of the single-source shortest path (SSSP) through a graph requires copying the graph forward to future MapReduce passes, an amazingly inefficient approach and simply untenable at scale.</p>
<p><a href="http://gigaom2.files.wordpress.com/2012/07/bigdata_goldenorb-graph-1.jpeg"><img  title="bigdata_goldenorb-graph (1)" src="http://gigaom2.files.wordpress.com/2012/07/bigdata_goldenorb-graph-1.jpeg?w=300&#038;h=156" alt="" width="300" height="156" class="alignleft size-medium wp-image-540413" /></a>Therefore, Google built <a href="http://googleresearch.blogspot.com/2009/06/large-scale-graph-computing-at-google.html"><strong>Pregel</strong></a>, a bulk synchronous parallel (BSP) system for petabyte-scale graph processing on distributed commodity machines. The results are impressive. In contrast to Hadoop, which often causes exponential data amplification in graph processing, Pregel is able to naturally and efficiently execute graph algorithms such as SSSP or PageRank in dramatically shorter time and with significantly less complicated code. Most stunning is the published data demonstrating processing on billions of nodes with trillions of edges in mere minutes, with a near linear scaling of execution time with graph size.</p>
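<p>The vertex-centric model can be sketched in Python for the SSSP example mentioned above. In each superstep, a vertex that receives a shorter distance updates itself and sends messages only along its own out-edges; the graph structure stays in place rather than being copied forward between passes as in an iterative MapReduce job. This single-process simulation only illustrates the bulk synchronous parallel idea; the function name and message layout are assumptions, not Pregel's C++ API or Giraph's Java API.</p>

```python
# Single-process simulation of Pregel-style BSP supersteps for SSSP.
# Names and message layout are illustrative, not Pregel's or Giraph's API.
import math


def pregel_sssp(edges, source):
    """edges: {vertex: [(neighbor, weight), ...]}; returns (distances, supersteps)."""
    vertices = set(edges) | {n for outs in edges.values() for n, _ in outs}
    dist = {v: math.inf for v in vertices}
    inbox = {source: [0]}  # superstep 0: only the source receives a message
    superstep = 0
    while inbox:  # halt once no vertex received a message (all have voted to halt)
        outbox = {}
        for vertex, messages in inbox.items():
            candidate = min(dist[vertex], min(messages))
            if candidate == dist[vertex]:
                continue  # no improvement: the vertex stays inactive
            dist[vertex] = candidate
            # Send messages only along this vertex's own out-edges.
            for neighbor, weight in edges.get(vertex, []):
                outbox.setdefault(neighbor, []).append(candidate + weight)
        inbox = outbox  # barrier: messages are delivered in the next superstep
        superstep += 1
    return dist, superstep


graph = {
    "A": [("B", 4), ("C", 1)],
    "C": [("B", 2), ("D", 5)],
    "B": [("D", 1)],
}
dist, steps = pregel_sssp(graph, "A")
print(sorted(dist.items()), steps)  # [('A', 0), ('B', 3), ('C', 1), ('D', 4)] 4
```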
<p>At the time of writing, the only viable option in the open source world is <a href="http://giraph.apache.org/">Giraph</a>, an early Apache incubator project that leverages HDFS and Zookeeper. There&#8217;s another project called <a href="http://goldenorbos.org/">Golden Orb</a> available on GitHub.</p>
<p>In summary, Hadoop is an incredible tool for large-scale data processing on clusters of commodity hardware. But if you’re trying to process dynamic data sets, run ad hoc analytics or analyze graph data structures, Google’s own actions clearly demonstrate better alternatives to the MapReduce paradigm. Percolator, Dremel and Pregel make an impressive trio and comprise the new canon of big data. I would be shocked if they don’t have a similar impact on IT as Google’s original big three of GFS, GMR, and BigTable have had.</p>
<p><em>Mike Miller (<a href="https://twitter.com/mlmilleratmit">@mlmilleratmit</a>) is chief scientist and co-founder at Cloudant, and Affiliate Professor of Particle Physics at University of Washington.</em></p>
<p><em>Feature image courtesy of <a href="http://www.shutterstock.com/gallery-375532p1.html">Shutterstock user Jason Prince</a>; evolution of the wheel image courtesy of <a href="http://www.shutterstock.com/gallery-66151p1.html">Shutterstock user James Steidl</a>.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2012/07/07/why-the-days-are-numbered-for-hadoop-as-we-know-it/feed/</wfw:commentRss>
		<slash:comments>30</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2012/07/elephant-walking-away-e1341677481803.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2012/07/elephant-walking-away-e1341677481803.jpg?w=150" medium="image">
			<media:title type="html">elephant walking away</media:title>
		</media:content>

		<media:content url="http://1.gravatar.com/avatar/4411542bbd7a2a9a2fc2a1b38809e45c?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">gigaguest</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2012/07/elephant-walking-away-e1341677481803.jpg?w=300" medium="image">
			<media:title type="html">elephant walking away</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2012/07/wheel.jpg" medium="image">
			<media:title type="html">wheel</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2012/07/big_banner.jpg?w=604" medium="image">
			<media:title type="html">big_banner</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2012/07/bigdata_goldenorb-graph-1.jpeg?w=300" medium="image">
			<media:title type="html">bigdata_goldenorb-graph (1)</media:title>
		</media:content>
	</item>
		<item>
		<title>Neo raises $10.6M for Neo4j as graph DBs take off</title>
		<link>http://gigaom.com/2011/09/21/neo-raises-10-6m-for-neo4j-as-graph-dbs-take-off/</link>
		<comments>http://gigaom.com/2011/09/21/neo-raises-10-6m-for-neo4j-as-graph-dbs-take-off/#comments</comments>
		<pubDate>Wed, 21 Sep 2011 07:06:05 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[big data]]></category>
		<category><![CDATA[Databases]]></category>
		<category><![CDATA[FlockDB]]></category>
		<category><![CDATA[graph databases]]></category>
		<category><![CDATA[InfiniteGraph]]></category>
		<category><![CDATA[Neo Technology]]></category>
		<category><![CDATA[Neo4j]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[Ravel Data]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=408895</guid>
		<description><![CDATA[Graph databases are a pretty specialized product, but as NoSQL keeps gaining mainstream acceptance, they seem to be catching on, and the latest evidence comes in the form of a $10.6 million funding round for Silicon Valley firm Neo Technology.]]></description>
				<content:encoded><![CDATA[<p><a href="http://gigaom2.files.wordpress.com/2011/09/neo.jpg"><img  title="neo" src="http://gigaom2.files.wordpress.com/2011/09/neo.jpg?w=708" alt=""   class="alignleft size-full wp-image-408908" /></a>Graph databases are among the more specialized databases around, but as NoSQL keeps <a href="http://gigaom.com/cloud/why-accentures-cto-made-the-move-to-nosql-startup-ceo/">gaining mainstream acceptance</a>, they seem to be finding a place in the greater IT consciousness. The latest evidence comes in the form of <a href="http://neotechnology.com">Neo Technology</a>&#8217;s $10.6 million funding round for its support of the Neo4j graph database.</p>
<p><a href="http://gigaom2.files.wordpress.com/2011/09/neo4j1.jpg"><img  title="neo4j1" src="http://gigaom2.files.wordpress.com/2011/09/neo4j1.jpg?w=279&#038;h=300" alt="" width="279" height="300" class="alignright size-medium wp-image-408909" /></a>Technically, the &#8220;NoSQL&#8221; label applies to a broad collection of databases that do a wide variety of things, with a single unifying characteristic: that they&#8217;re not SQL. Although document-oriented systems (such as MongoDB and CouchDB) and key-value stores (such as Cassandra) get much of the attention because they address common issues around scale, speed and flexibility for web companies and even large enterprises, graph databases are finding their own voice.</p>
<p>As I <a href="http://gigaom.com/cloud/ravel-open-sources-tool-for-analyzing-graph-data-like-google/">explained previously while covering GoldenOrb</a>, another graph database:</p>
<blockquote><p>Essentially, graph databases excel at finding relationships between disparate pieces of data, with one major use case being social graphs. They run analyses over terabytes of graph data while maintaining the relationships between the data, even as the data and the relationships constantly evolve.</p></blockquote>
<p>Aside from GoldenOrb, an open source product from Austin, Texas-based startup Ravel Data, we&#8217;ve also recently seen database veteran Objectivity <a href="http://gigaom.com/cloud/twitters-success-pulls-23-year-old-objectivity-into-nosql/">work its way into the NoSQL space</a> with its InfiniteGraph product. </p>
<p>Graph databases have actually been around for a while, but they&#8217;ve gained some popularity recently because of their use by large web properties such as Google and Twitter. Google built its own system, <a href="http://googleresearch.blogspot.com/2009/06/large-scale-graph-computing-at-google.html">called Pregel</a>, and Twitter built one <a href="http://gigaom.com/cloud/nosql-is-for-the-birds/">called FlockDB</a>.</p>
<p>For its part, Neo4j is an <a href="http://neo4j.org/">open source database</a> for which the Menlo Park, Calif.-based Neo Technology offers support, services and commercial licenses. Fidelity Growth Partners led this funding round for Neo &#8212; its first &#8212; along with seed investors Sunstone Capital and Conor Venture Partners.</p>
<p><em>Images courtesy of Neo4j.</em></p>
<br />  <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=gigaom.com&#038;blog=14960843&#038;post=408895&#038;subd=gigaom2&#038;ref=&#038;feed=1" width="1" height="1" /><p><a href="http://pubads.g.doubleclick.net/gampad/jump?iu=/1008864/GigaOM_RSS_300x250&#038;sz=300x250&#038;c=930657"><img src="http://pubads.g.doubleclick.net/gampad/ad?iu=/1008864/GigaOM_RSS_300x250&#038;sz=300x250&#038;c=930657" /></a></p><p><strong>Related research and analysis from GigaOM Pro:</strong><br />Subscriber content. <a href="http://pro.gigaom.com/?utm_source=cloud&utm_medium=editorial&utm_campaign=auto3&utm_term=408895+neo-raises-10-6m-for-neo4j-as-graph-dbs-take-off&utm_content=dharrisstructure">Sign up for a free trial</a>.</p><ul><li><a href="http://pro.gigaom.com/2012/04/infrastructure-q1-cloud-and-big-data-woo-the-enterprise/?utm_source=cloud&utm_medium=editorial&utm_campaign=auto3&utm_term=408895+neo-raises-10-6m-for-neo4j-as-graph-dbs-take-off&utm_content=dharrisstructure">Infrastructure Q1: Cloud and big data woo enterprises</a></li><li><a href="http://pro.gigaom.com/2012/03/a-near-term-outlook-for-big-data/?utm_source=cloud&utm_medium=editorial&utm_campaign=auto3&utm_term=408895+neo-raises-10-6m-for-neo4j-as-graph-dbs-take-off&utm_content=dharrisstructure">A near-term outlook for big data</a></li><li><a href="http://pro.gigaom.com/2011/03/defining-hadoop-the-players-technologies-and-challenges-of-2011/?utm_source=cloud&utm_medium=editorial&utm_campaign=auto3&utm_term=408895+neo-raises-10-6m-for-neo4j-as-graph-dbs-take-off&utm_content=dharrisstructure">Defining Hadoop: the Players, Technologies and Challenges of 2011</a></li></ul>]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2011/09/21/neo-raises-10-6m-for-neo4j-as-graph-dbs-take-off/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2011/09/neo4j2-e1316588497562.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2011/09/neo4j2-e1316588497562.jpg?w=150" medium="image">
			<media:title type="html">neo4j2</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2011/09/neo.jpg" medium="image">
			<media:title type="html">neo</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2011/09/neo4j1.jpg?w=279" medium="image">
			<media:title type="html">neo4j1</media:title>
		</media:content>
	</item>
		<item>
		<title>Ravel open-sources tool for analyzing graph data like Google</title>
		<link>http://gigaom.com/2011/06/27/ravel-open-sources-tool-for-analyzing-graph-data-like-google/</link>
		<comments>http://gigaom.com/2011/06/27/ravel-open-sources-tool-for-analyzing-graph-data-like-google/#comments</comments>
		<pubDate>Tue, 28 Jun 2011 04:01:54 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[@NYT]]></category>
		<category><![CDATA[analytics]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[GoldenOrb]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[graph databases]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[Pregel]]></category>
		<category><![CDATA[Ravel Data]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=368259</guid>
		<description><![CDATA[Ravel now offers an open-source graph database that looks to bring the benefits of Google's Pregel project to the masses. Graph databases don't get the attention of other big-data technologies such as Hadoop or NoSQL, but every Twitter user is familiar with what they can do.]]></description>
				<content:encoded><![CDATA[<p><a href="http://gigaom2.files.wordpress.com/2011/06/bigdata_goldenorb-graph.jpg"><img  title="BigData_GoldenOrb-graph" src="http://gigaom2.files.wordpress.com/2011/06/bigdata_goldenorb-graph.jpg?w=300&#038;h=156" alt="" width="300" height="156" class="alignleft size-medium wp-image-368371" /></a>Austin, Texas-based startup Ravel has released <a href="http://www.goldenorbos.org/">GoldenOrb</a>, an open-source graph database that looks to bring the benefits of Google&#8217;s Pregel project to the masses. Graph databases don&#8217;t get the attention of other big-data technologies such as Hadoop or NoSQL, but every Twitter user is familiar with the results of what graph databases can do.</p>
<p>Essentially, graph databases excel at finding relationships between disparate pieces of data, with one major use case being social graphs. They run analyses over terabytes of graph data while maintaining the relationships between the data, even as the data and the relationships constantly evolve.</p>
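<p>To make the &#8220;relationships&#8221; idea concrete, here is a minimal Python sketch. It is not code from GoldenOrb or any other product. It uses a tiny, hypothetical follower graph stored as an adjacency list and a breadth-first search that counts how many hops separate two people. The <code>FOLLOWS</code> data and the <code>degrees_of_separation</code> function are illustrative assumptions.</p>

```python
from collections import deque

# Hypothetical toy social graph: each person maps to the people they follow.
# Names and edges are illustrative only, not data from any real system.
FOLLOWS = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave", "erin"],
    "dave": ["frank"],
    "erin": [],
    "frank": [],
}


def degrees_of_separation(graph, start, target):
    """Return the number of hops from start to target, or None if unreachable.

    Breadth-first search visits people in order of distance, so the first
    time target is reached, a shortest chain of relationships has been found.
    """
    if start == target:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        person, hops = queue.popleft()
        for neighbor in graph.get(person, []):
            if neighbor == target:
                return hops + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, hops + 1))
    return None


print(degrees_of_separation(FOLLOWS, "alice", "frank"))  # 3
```

<p>A graph database stores those edges natively, so this kind of hop-by-hop traversal does not require repeatedly joining tables.</p>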
<p>Twitter actually <a href="http://gigaom.com/2010/04/12/twitter-open-sources-the-home-of-its-social-graph/">created its own graph database, called FlockDB</a>, to help the site determine who&#8217;s connected to whom in the Twittersphere. Google uses Pregel to power its PageRank feature, although as it <a href="http://googleresearch.blogspot.com/2009/06/large-scale-graph-computing-at-google.html">explained in a 2009 blog post introducing the technology</a>, there are many other possibilities:</p>
<blockquote><p>If you squint the right way, you will notice that graphs are everywhere. For example, social networks, popularized by Web 2.0, are graphs that describe relationships among people. Transportation routes create a graph of physical connections among geographical locations. Paths of disease outbreaks form a graph, as do games among soccer teams, computer network topologies, and citations among scientific papers. Perhaps the most pervasive graph is the web itself, where documents are vertices and links are edges. &#8230;</p>
<p>A relatively simple analysis of a standard map (a graph!) can provide the shortest route between two cities. But progressively more sophisticated analysis could be applied to richer information such as speed limits, expected traffic jams, roadworks and even weather conditions. In addition to the shortest route, measured as sheer distance, you could learn about the most scenic route, or the most fuel-efficient one, or the one which has the most rest areas. All these options, and more, can all be extracted from the graph and made useful — provided you have the right tools and inputs.</p></blockquote>
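<p>Finding the shortest route in that passage is the textbook job for Dijkstra&#8217;s algorithm. The minimal Python sketch below assumes a small, made-up road map in which the cities and kilometer figures are hypothetical. It illustrates the idea on one machine and is not Pregel&#8217;s implementation, which spreads this kind of computation across many machines.</p>

```python
import heapq
import math

# Hypothetical road map: city -> list of (neighboring city, distance in km).
# The cities and distances are made up for illustration.
ROADS = {
    "A": [("B", 7), ("C", 9), ("F", 14)],
    "B": [("A", 7), ("C", 10), ("D", 15)],
    "C": [("A", 9), ("B", 10), ("D", 11), ("F", 2)],
    "D": [("B", 15), ("C", 11), ("E", 6)],
    "E": [("D", 6), ("F", 9)],
    "F": [("A", 14), ("C", 2), ("E", 9)],
}


def shortest_route(graph, source, target):
    """Dijkstra's algorithm: return (total_distance, path), or (math.inf, [])."""
    best = {source: 0}
    previous = {}
    heap = [(0, source)]
    while heap:
        dist, city = heapq.heappop(heap)
        if city == target:
            break  # the first time target is popped, its distance is final
        if dist > best[city]:
            continue  # stale heap entry; a shorter route was already found
        for neighbor, weight in graph.get(city, []):
            candidate = dist + weight
            if candidate >= best.get(neighbor, math.inf):
                continue  # not an improvement
            best[neighbor] = candidate
            previous[neighbor] = city
            heapq.heappush(heap, (candidate, neighbor))
    if target not in best:
        return math.inf, []
    # Walk the predecessor links back from target to source.
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    return best[target], path[::-1]


print(shortest_route(ROADS, "A", "E"))  # (20, ['A', 'C', 'F', 'E'])
```

<p>If the kilometer weights are replaced with another cost, such as estimated fuel use, the same search finds the most fuel-efficient route that the quote mentions.</p>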
<p>In spreading the word about GoldenOrb, Ravel expands upon the use cases, citing marketing analysis, pharmaceutical research and, essentially, any situation in which it would be beneficial to &#8220;run traditional analytics on entire data sets instead of only small samples &#8230; .&#8221;</p>
<p>A couple of things make GoldenOrb particularly worth watching: 1) it&#8217;s both open source and a product, which distinguishes it from Twitter&#8217;s open-source FlockDB project, Google&#8217;s proprietary Pregel project and <a href="http://gigaom.com/cloud/twitters-success-pulls-23-year-old-objectivity-into-nosql/">Objectivity&#8217;s proprietary InfiniteGraph</a> product; and 2) it&#8217;s based on Hadoop. Having an actual product to work on instead of just code could attract a large community, especially from the growing ranks of Hadoop developers.</p>
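<p>GoldenOrb aims to bring Pregel&#8217;s model to Hadoop. As Google has described it, Pregel computes in a series of &#8220;supersteps.&#8221; In each superstep, every vertex reads the messages sent to it, updates its own value and sends messages along its edges. The Python sketch below simulates that model on a single machine to compute PageRank over a hypothetical three-page web graph. It only illustrates the model under those assumptions. The names <code>LINKS</code> and <code>pregel_pagerank</code> are invented for illustration and are not GoldenOrb&#8217;s or Pregel&#8217;s actual API.</p>

```python
from collections import defaultdict

# Hypothetical toy web graph: page -> pages it links to (illustrative only).
LINKS = {
    "home": ["about", "blog"],
    "about": ["home"],
    "blog": ["home", "about"],
}

DAMPING = 0.85  # conventional PageRank damping factor


def pregel_pagerank(links, supersteps=30):
    """Simulate Pregel-style supersteps for PageRank on one machine.

    In each superstep, every vertex (1) reads the messages sent to it in the
    previous superstep, (2) updates its own value and (3) sends a share of
    that value along each outgoing edge. A distributed system would run the
    vertices in parallel on many workers, with a barrier between supersteps.
    """
    n = len(links)
    rank = {vertex: 1.0 / n for vertex in links}
    inbox = defaultdict(float)
    for step in range(supersteps):
        outbox = defaultdict(float)
        for vertex, targets in links.items():
            if step > 0:
                # Update from the messages delivered after the last barrier.
                rank[vertex] = (1 - DAMPING) / n + DAMPING * inbox[vertex]
            if targets:  # this sketch simply drops rank from pages with no links
                share = rank[vertex] / len(targets)
                for target in targets:
                    outbox[target] += share
        inbox = outbox  # barrier: messages become visible next superstep
    return rank


ranks = pregel_pagerank(LINKS)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

<p>In this toy graph every page links somewhere, so the ranks keep summing to 1. &#8220;home&#8221; ends up ranked highest because both other pages link to it.</p>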
<p>Hadoop and NoSQL databases have both ridden the big data wave to form robust development communities, so why can&#8217;t graph databases be next? The GoldenOrb code is available at <a href="https://github.com/raveldata/goldenorb">https://github.com/raveldata/goldenorb</a>.</p>
<p>For more information about Ravel from the horse&#8217;s mouth, including plans to create an enterprise version of Apache&#8217;s Hadoop-based Mahout machine-learning platform, check out this video interview with Ravel president Zach Richardson:</p>
<div class="flex-video"><div id="ooyala-video_da76119e4dd8d75952ed338f969c6ce2" class="video-player ooyala-video" width="600" height="338"><p>
			<a href="http://gigaom.com/2011/06/27/ravel-open-sources-tool-for-analyzing-graph-data-like-google/"><img src="http://ak.c.ooyala.com/NpODVjMjrXM_8GK16XmZF6Pne9d8sKop/R9h3a3wTes9kt5iH5iMDoxOm9pO9a5tR" alt="Ooyala Video Thumbnail" /></a><br />
			<a href="http://gigaom.com/2011/06/27/ravel-open-sources-tool-for-analyzing-graph-data-like-google/">Watch this video for free</a> on <a href='http://gigaom.com/'>GigaOM</a>
		</p></div></div>
<p><em>Image courtesy of Ravel.</em></p>
<br />  <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=gigaom.com&#038;blog=14960843&#038;post=368259&#038;subd=gigaom2&#038;ref=&#038;feed=1" width="1" height="1" /><p><a href="http://pubads.g.doubleclick.net/gampad/jump?iu=/1008864/GigaOM_RSS_300x250&#038;sz=300x250&#038;c=852131"><img src="http://pubads.g.doubleclick.net/gampad/ad?iu=/1008864/GigaOM_RSS_300x250&#038;sz=300x250&#038;c=852131" /></a></p><p><strong>Related research and analysis from GigaOM Pro:</strong><br />Subscriber content. <a href="http://pro.gigaom.com/?utm_source=cloud&utm_medium=editorial&utm_campaign=auto3&utm_term=368259+ravel-open-sources-tool-for-analyzing-graph-data-like-google&utm_content=dharrisstructure">Sign up for a free trial</a>.</p><ul><li><a href="http://pro.gigaom.com/2012/04/infrastructure-q1-cloud-and-big-data-woo-the-enterprise/?utm_source=cloud&utm_medium=editorial&utm_campaign=auto3&utm_term=368259+ravel-open-sources-tool-for-analyzing-graph-data-like-google&utm_content=dharrisstructure">Infrastructure Q1: Cloud and big data woo enterprises</a></li><li><a href="http://pro.gigaom.com/2012/03/a-near-term-outlook-for-big-data/?utm_source=cloud&utm_medium=editorial&utm_campaign=auto3&utm_term=368259+ravel-open-sources-tool-for-analyzing-graph-data-like-google&utm_content=dharrisstructure">A near-term outlook for big data</a></li><li><a href="http://pro.gigaom.com/2011/03/defining-hadoop-the-players-technologies-and-challenges-of-2011/?utm_source=cloud&utm_medium=editorial&utm_campaign=auto3&utm_term=368259+ravel-open-sources-tool-for-analyzing-graph-data-like-google&utm_content=dharrisstructure">Defining Hadoop: the Players, Technologies and Challenges of 2011</a></li></ul>]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2011/06/27/ravel-open-sources-tool-for-analyzing-graph-data-like-google/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2011/06/bigdata_goldenorb-graph.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2011/06/bigdata_goldenorb-graph.jpg?w=150" medium="image">
			<media:title type="html">BigData_GoldenOrb-graph</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2011/06/bigdata_goldenorb-graph.jpg?w=300" medium="image">
			<media:title type="html">BigData_GoldenOrb-graph</media:title>
		</media:content>
	</item>
	</channel>
</rss>
