<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>GigaOM &#187; open source</title>
	<atom:link href="http://gigaom.com/tag/open-source/feed/" rel="self" type="application/rss+xml" />
	<link>http://gigaom.com</link>
	<description></description>
	<lastBuildDate>Fri, 24 May 2013 09:22:27 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='gigaom.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://0.gravatar.com/blavatar/0db8f6557d022075dbbf010c54d46d93?s=96&#038;d=http%3A%2F%2Fs2.wp.com%2Fi%2Fbuttonw-com.png</url>
		<title>GigaOM &#187; open source</title>
		<link>http://gigaom.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://gigaom.com/osd.xml" title="GigaOM" />
	<atom:link rel='hub' href='http://gigaom.com/?pushpress=hub'/>
		<item>
		<title>WibiData gets $15M to help it become the Hadoop application company</title>
		<link>http://gigaom.com/2013/05/23/wibidata-gets-15m-to-help-it-become-the-hadoop-application-company/</link>
		<comments>http://gigaom.com/2013/05/23/wibidata-gets-15m-to-help-it-become-the-hadoop-application-company/#comments</comments>
		<pubDate>Thu, 23 May 2013 11:31:17 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[big data]]></category>
		<category><![CDATA[Cloudera]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Hbase]]></category>
		<category><![CDATA[machine-learning]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[OPower]]></category>
		<category><![CDATA[predictive analytics]]></category>
		<category><![CDATA[WibiData]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=648663</guid>
		<description><![CDATA[Startup WibiData has raised another $15 million and wants to turn the lessons it has learned in the field into generic software that can let anyone build predictive applications on Hadoop.]]></description>
				<content:encoded><![CDATA[<p><a href="http://www.wibidata.com/">WibiData</a> &#8212; the big data startup from Cloudera co-founder Christophe Bisciglia and Aaron Kimball &#8212; doesn&#8217;t have <em>overly</em> big plans. It only wants to become one of the first companies, if not the first, selling off-the-shelf software that lets other companies build valuable, customer-facing applications on Hadoop. On Thursday, WibiData announced $15 million in Series B funding from Canaan Partners, as well as existing investors NEA and Google Chairman Eric Schmidt, to help make that goal a reality. </p>
<p>Kidding aside, that&#8217;s actually quite an ambitious goal in a Hadoop market that&#8217;s big and growing, but that&#8217;s characterized by expensive consulting arrangements and purpose-built applications. That&#8217;s especially true for companies that want to do something other than transform unstructured data into structured data (often called ETL) or run back-office analytics jobs. In fact, WibiData has spent the last 18 months doing just these types of deals, and Bisciglia says every single customer has already engaged with one of the big three Hadoop vendors (Cloudera, Hortonworks and MapR). </p>
<p>Home energy-management startup <a href="http://gigaom.com/2012/11/19/opower-the-big-data-energy-player-to-beat/">Opower</a> is a good example of this process. It&#8217;s actually one of Cloudera&#8217;s banner customers, but &#8220;when they wanted to take [their software-as-a-service tool] beyond batch analysis and ETL workloads,&#8221; Bisciglia said, Opower came to WibiData. So whereas the Opower service was originally focused on nightly data analysis comparing users&#8217; energy usage against that of other users, Opower is now working on dynamic recommendations for users and on letting them engage with the application in new ways.</p>
<div id="attachment_648685" class="wp-caption alignright" style="width: 310px"><a href="http://gigaom2.files.wordpress.com/2013/05/wibi-kiji.jpg"><img  alt="The WibiData architecture" src="http://gigaom2.files.wordpress.com/2013/05/wibi-kiji.jpg?w=300&#038;h=224" width="300" height="224" class="size-medium wp-image-648685" /></a><p class="wp-caption-text">The WibiData architecture</p></div>
<p>During these engagements, WibiData <a href="http://gigaom.com/2012/03/22/wibidata-structure-data-2012/">has been building up its core technology</a> for connecting those brawny back-office Hadoop environments to predictive customer-facing applications &#8211; a collection of HBase, data-formatting tools and machine learning algorithms that the company <a href="http://gigaom.com/2012/11/14/wibidata-open-sources-kiji-to-make-hbase-more-useful/">has been slowly open-sourcing under the Kiji banner</a>. It has also been learning the similarities among the applications it&#8217;s building for customers in the same field, figuring out what&#8217;s repeatable. What does any given company in the retail space, for example, need to get started on <a href="http://gigaom.com/2013/05/08/why-3-celebrity-data-scientists-are-willing-to-work-for-free-for-you/">its own recommendation engine</a>? </p>
<p>And now, Bisciglia says, WibiData is going to double down on building application software based on what it has learned. The first two industries it will target are likely to be financial services and retail, two areas where the company has seen a lot of traction. He envisions the finished product including some predefined schemas for formatting data and some pre-built predictive models, both broadly applicable across an industry rather than specific to a single user. </p>
<p>There will also be different interfaces that allow different types of users (e.g., data scientists, systems engineers and business users) to interact with the data in the ways they need to. </p>
<p>Time will tell if WibiData can actually accomplish its goal of turning Hadoop into a collection of somewhat specialized software packages, but someone has to try. Even industry heavyweights like Cloudera see the need, but their hands are full just getting Hadoop integrated into existing environments and getting those early uses up and running. As Cloudera CEO Mike Olson put it to anyone ambitious enough to tackle the Hadoop-application gap <a href="http://gigaom.com/2012/03/21/cloudera-structure-data-2012/">at Structure: Data in 2012</a>, &#8220;Call me, I’ll connect you with funding. The money is out there.&#8221; </p>
<p>If you want to hear more about the need for Hadoop applications, check out this panel from Structure: Data 2013, where I speak with WibiData&#8217;s Omer Trajman, Continuuity&#8217;s Jonathan Gray and Pivotal&#8217;s Muddu Sudhakar. <span class='embed-youtube' style='text-align:center; display: block;'><iframe class='youtube-player' type='text/html' width='604' height='370' src='http://www.youtube.com/embed/z7BhGEQX9BQ?version=3&#038;rel=1&#038;fs=1&#038;showsearch=0&#038;showinfo=1&#038;iv_load_policy=1&#038;wmode=transparent' frameborder='0'></iframe></span></p>
<p><strong>Related research and analysis from GigaOM Pro:</strong><br />Subscriber content. <a href="http://pro.gigaom.com/?utm_source=data&utm_medium=editorial&utm_campaign=auto3&utm_term=648663+wibidata-gets-15m-to-help-it-become-the-hadoop-application-company&utm_content=dharrisstructure">Sign up for a free trial</a>.</p><ul><li><a href="http://pro.gigaom.com/2012/03/a-near-term-outlook-for-big-data/?utm_source=data&utm_medium=editorial&utm_campaign=auto3&utm_term=648663+wibidata-gets-15m-to-help-it-become-the-hadoop-application-company&utm_content=dharrisstructure">A near-term outlook for big data</a></li><li><a href="http://pro.gigaom.com/2012/07/cloud-and-data-second-quarter-2012-analysis-and-outlook-2/?utm_source=data&utm_medium=editorial&utm_campaign=auto3&utm_term=648663+wibidata-gets-15m-to-help-it-become-the-hadoop-application-company&utm_content=dharrisstructure">Takeaways from the second quarter in cloud and data</a></li><li><a href="http://pro.gigaom.com/2011/12/why-the-big-data-startup-boom-will-likely-be-short-lived/?utm_source=data&utm_medium=editorial&utm_campaign=auto3&utm_term=648663+wibidata-gets-15m-to-help-it-become-the-hadoop-application-company&utm_content=dharrisstructure">Why the big data startup boom will likely be short-lived</a></li></ul>]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/05/23/wibidata-gets-15m-to-help-it-become-the-hadoop-application-company/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/05/wibi-founders.png?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/05/wibi-founders.png?w=150" medium="image">
			<media:title type="html">wibi founders</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/05/wibi-kiji.jpg?w=300" medium="image">
			<media:title type="html">The WibiData architecture</media:title>
		</media:content>
	</item>
		<item>
		<title>Concurrent is building a Hadoop assembly line in open source</title>
		<link>http://gigaom.com/2013/05/22/concurrent-is-building-a-hadoop-assembly-line-in-open-source/</link>
		<comments>http://gigaom.com/2013/05/22/concurrent-is-building-a-hadoop-assembly-line-in-open-source/#comments</comments>
		<pubDate>Wed, 22 May 2013 19:21:16 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[big data]]></category>
		<category><![CDATA[Cascading]]></category>
		<category><![CDATA[Concurrent]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Lingual]]></category>
		<category><![CDATA[machine-learning]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[Pattern]]></category>
		<category><![CDATA[predictive analytics]]></category>
		<category><![CDATA[SQL]]></category>
		<category><![CDATA[statistical analysis]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=648186</guid>
		<description><![CDATA[Cascading creator Concurrent has developed a new open source tool called Pattern for running machine learning models on Hadoop clusters. When combined with its SQL tool called Lingual, users can move data from one stage to another easily.]]></description>
				<content:encoded><![CDATA[<p>If you know Java, R or SAS, doing machine learning on Hadoop data just got a lot easier. <a href="http://www.concurrentinc.com/">Concurrent</a> <em>(see disclosure)</em>, the company behind the popular <a href="http://www.cascading.org/">Cascading</a> framework for writing big data jobs, has developed a new open source tool called <a href="http://www.cascading.org/pattern/">Pattern</a> that lets users export their models from statistical analysis applications and run them at scale on Hadoop data with little to no code change.</p>
<p>The reason for creating Pattern is pretty simple, according to Concurrent founder and CTO Chris Wensel: &#8220;Hadoop is never used alone.&#8221; It&#8217;s always part of a data environment that also includes databases, visualization tools, analytics software and/or statistical analysis tools that arguably do the really valuable work. Hadoop&#8217;s real value is as an integration platform that can feed data into these other systems and, ideally, put their outputs to work across much larger datasets.</p>
<p>Developers <em>can</em> use the Pattern Java API to create machine learning jobs, but they can also simply export a Predictive Model Markup Language (PMML) file from software like R, SAS and MicroStrategy; Pattern will then read the file and run the model as a Cascading workflow. Models are useless unless you can run them in production, Wensel said, and Pattern lets them run across more data, stored in Hadoop, than you could use when building them with those other tools.</p>
<p>However, Wensel noted, &#8220;The real takeaway isn&#8217;t Pattern itself.&#8221;</p>
<p>From his perspective, the real story is Pattern plus Cascading plus <a href="http://www.cascading.org/lingual/">Lingual</a>, the open source SQL-to-Hadoop tool that Concurrent recently developed and released. Lingual is the tie that binds everything together, creating a sort of assembly line for data as it works its way from generation to delivering some value. For example, someone might create a Cascading job that adds structure to incoming data, and then pull some of the data into R using Lingual. Once a model is created in R and exported to the Hadoop cluster using Pattern, Lingual can feed the MapReduce output file back to R so a data scientist can test the model&#8217;s accuracy.</p>
<p><a href="http://gigaom2.files.wordpress.com/2013/05/arch-diagram.png"><img  alt="arch-diagram" src="http://gigaom2.files.wordpress.com/2013/05/arch-diagram.png?w=708"   class="aligncenter size-full wp-image-648347" /></a></p>
<p>And actually, Wensel said, Lingual could have a positive effect on companies&#8217; bottom lines. Airbnb recently replaced a departed engineer with Lingual for monthly migrations of data from Hadoop into SQL environments. Climate Corporation, <a href="http://gigaom.com/2012/05/02/how-climate-corp-is-pitting-big-data-against-mother-nature/">a massive Hadoop and Cascading user</a>, could use Lingual to let its crop-and-weather insurance customers access their data from the company&#8217;s Hadoop store.</p>
<p>Lingual and Pattern should help Concurrent finally make some money, too. Both of them, as well as the Cascading framework that underpins them, will always be open source, Wensel said, but the company plans to create &#8220;a suite of products that will make your life much better if &#8230; you standardize on Cascading.&#8221;</p>
<p>For example, the company has the ability to monitor jobs at the application level rather than the cluster level, meaning it can tell you the details of that job that&#8217;s locking up all the resources and whether you really want to kill it (it might be an important report for the CFO &#8230;). &#8220;We can do some really interesting things,&#8221; Wensel said.</p>
<p><em><strong>Disclosure</strong>: Concurrent is backed by True Ventures, a venture capital firm that is an investor in the parent company of this blog, Giga Omni Media. Om Malik, the founder of Giga Omni Media, is also a venture partner at True.</em></p>
<p><em>This post was updated at 2:48pm PT to correct Chris Wensel&#8217;s title. He is CTO.</em></p>
<p><em>Feature image courtesy of <a href="http://www.shutterstock.com/gallery-908242p1.html">Shutterstock user PENGYOU91</a>.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/05/22/concurrent-is-building-a-hadoop-assembly-line-in-open-source/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/05/shutterstock_98915513.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/05/shutterstock_98915513.jpg?w=150" medium="image">
			<media:title type="html">assembly line</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/05/arch-diagram.png" medium="image">
			<media:title type="html">arch-diagram</media:title>
		</media:content>
	</item>
		<item>
		<title>We&#8217;re witnessing the rise of the graph in big data</title>
		<link>http://gigaom.com/2013/05/14/were-witnessing-the-rise-of-the-graph-in-big-data/</link>
		<comments>http://gigaom.com/2013/05/14/were-witnessing-the-rise-of-the-graph-in-big-data/#comments</comments>
		<pubDate>Tue, 14 May 2013 14:33:33 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[big data]]></category>
		<category><![CDATA[data science]]></category>
		<category><![CDATA[graph analysis]]></category>
		<category><![CDATA[graph database]]></category>
		<category><![CDATA[GraphLab]]></category>
		<category><![CDATA[machine-learning]]></category>
		<category><![CDATA[open source]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=645059</guid>
		<description><![CDATA[Graph databases and graph-processing applications have been popping up all over the place lately, and now they're starting to go commercial. On Tuesday, popular open source project GraphLab joined the ranks of graph startups.]]></description>
				<content:encoded><![CDATA[<p>GraphLab, a popular <a href="http://graphlab.org/">open source project</a> dedicated to graph analysis and machine learning, is trying to capitalize on the excitement around graphs by spinning off a commercial entity, <a href="http://graphlab.com/">GraphLab Inc.</a> GraphLab creator &#8212; and University of Washington machine learning professor &#8212; Carlos Guestrin will lead the new Seattle-based company, which has raised $6.75 million from Madrona Venture Group and NEA.</p>
<p>Graph analysis is among the hottest techniques around for making sense of large datasets, primarily by determining how tightly different data points are related or how similar they are. The term &#8220;graph&#8221; came into the broader lexicon along with social networks, which built social graphs to <a href="http://gigaom.com/2013/03/14/facebook-tweaks-its-algorithms-to-improve-graph-search-comment-search-coming/">assess the relationships among their millions of users</a>, but the technique has much broader uses.</p>
<div id="attachment_645089" class="wp-caption aligncenter" style="width: 677px"><a href="http://gigaom2.files.wordpress.com/2013/05/lnkdmap-1.jpg"><img  alt="My LinkedIn social graph" src="http://gigaom2.files.wordpress.com/2013/05/lnkdmap-1.jpg?w=708"   class="size-full wp-image-645089" /></a><p class="wp-caption-text">My LinkedIn social graph</p></div>
<p>Guestrin said GraphLab&#8217;s algorithms are used in a lot of recommender systems, but he also cites fraud detection in banking networks and intrusion detection in computer networks as potential applications. We&#8217;ve covered graphs as the analytical model of choice for everything <a href="http://gigaom.com/2013/04/22/how-hbase-converted-myspaces-mysql-champion-and-is-driving-hadoop-mainstream/">from content recommendation</a> to <a href="http://gigaom.com/2013/01/22/biotech-startup-syapse-wants-to-be-salesforce-com-for-our-genomes/">tracking lab work in genomics</a>. Really, though &#8212; especially when combined with machine learning &#8212; graph analysis <a href="http://gigaom.com/2013/01/16/has-ayasdi-turned-machine-learning-into-a-magic-bullet/">can be applied to anything</a> where there&#8217;s too much data for a person to possibly analyze the relationships among all the points.</p>
<div id="attachment_601469" class="wp-caption aligncenter" style="width: 718px"><a href="http://gigaom2.files.wordpress.com/2013/01/ayasdi-product-image-2-e1358295341371.jpg"><img  alt="One of Ayasdi's graph-like data maps" src="http://gigaom2.files.wordpress.com/2013/01/ayasdi-product-image-2-e1358295341371.jpg?w=708&#038;h=472" width="708" height="472" class="size-large wp-image-601469" /></a><p class="wp-caption-text">One of Ayasdi&#8217;s graph-like data maps</p></div>
<p>Google also famously built <a href="http://googleresearch.blogspot.com/2009/06/large-scale-graph-computing-at-google.html">a graph-processing system called Pregel</a>, which can run computations such as PageRank. Although a number of graph databases and other projects have popped up in the past few years, Guestrin said GraphLab is actually a contemporary of Pregel. He and some colleagues at Carnegie Mellon built a small system for their lab about five years ago, then released it into the open-source world with few expectations that it would catch on. Now, he added, Pandora and WalmartLabs are among the project&#8217;s users.</p>
<p>Among those other projects are graph-processing systems and databases such as <a href="http://giraph.apache.org/">Giraph</a> (an open source, Hadoop-based Pregel clone developed at Facebook) and <a href="http://www.neo4j.org/">Neo4j</a> (which also has a commercial arm, <a href="http://gigaom.com/2012/11/02/graph-startup-neo-raises-11m-as-specialized-databases-take-hold/">called Neo Technology</a>), as well as <a href="http://engineering.twitter.com/2012/03/cassovary-big-graph-processing-library.html">Twitter&#8217;s Cassovary</a> and fellow University of Washington project <a href="http://www.cs.washington.edu/node/4217/">Grappa</a>. Guestrin said GraphLab can work with most of them, particularly those that aren&#8217;t designed to do machine learning at scale the way GraphLab is. Some efforts, he noted, are focused on simply storing data in graph form (e.g., databases) or on providing simple graph analysis.</p>
<p>As for when we&#8217;ll actually see the results of the effort to commercialize GraphLab, Guestrin said it will be a while. Right now, he&#8217;s focused on the next open source release of GraphLab in July. However, the company will begin engaging with commercial users over the next several months to determine what types of features they would expect in commercial graph-analysis software.</p>
<p>The bigger question to come out of all this graph activity, though, is how big a market we&#8217;ll ultimately see for graph analysis or any other specific technique. As companies get more comfortable with big data from a technical standpoint, they&#8217;re also getting more interested in the different types of analysis it allows for. This is evidenced by the <a href="http://gigaom.com/2013/03/07/5-reasons-why-the-future-of-hadoop-is-real-time-relatively-speaking/">quest to make Hadoop support myriad processing frameworks</a> aside from MapReduce.</p>
<p>We already have a handful of commercial graph products on the market &#8212; including an industrial grade one called <a href="http://www.yarcdata.com/">YarcData</a> from supercomputer maker Cray &#8212; but how many will there eventually be? And if graph analysis is all the rage right now, what comes next?</p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/05/14/were-witnessing-the-rise-of-the-graph-in-big-data/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/05/graphics2-3_final_cartoon.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/05/graphics2-3_final_cartoon.jpg?w=150" medium="image">
			<media:title type="html">graphics2-3_final_cartoon</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/05/lnkdmap-1.jpg" medium="image">
			<media:title type="html">My LinkedIn social graph</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/01/ayasdi-product-image-2-e1358295341371.jpg?w=708" medium="image">
			<media:title type="html">One of Ayasdi&#039;s graph-like data maps</media:title>
		</media:content>
	</item>
		<item>
		<title>The promise of better data has MetLife investing $300M in new tech</title>
		<link>http://gigaom.com/2013/05/07/with-300m-earmarked-for-tech-innovation-metlife-wants-to-remake-insurance/</link>
		<comments>http://gigaom.com/2013/05/07/with-300m-earmarked-for-tech-innovation-metlife-wants-to-remake-insurance/#comments</comments>
		<pubDate>Tue, 07 May 2013 14:00:31 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[10Gen]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[database]]></category>
		<category><![CDATA[insurance industry]]></category>
		<category><![CDATA[MetLife]]></category>
		<category><![CDATA[MongoDB]]></category>
		<category><![CDATA[open source]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=642824</guid>
		<description><![CDATA[MetLife is building new products on new technologies thanks to a $300 million investment in new technology and new skills. One of the first products is a MongoDB-based app that puts all of a customer's information in one place.]]></description>
				<content:encoded><![CDATA[<p>The insurance industry hasn&#8217;t always been a beacon of technological innovation. Then again, its major providers haven&#8217;t always earmarked $300 million for investments in new technology and new talent like MetLife has. The strategy has already borne its first fruit in the form of a new database system and application that lets the company see everything it knows about a customer in a single place.</p>
<p>The new application, called The Wall, is essentially a way to make the customer service experience more palatable for consumers and to lower the burden of hiring new representatives. Because it&#8217;s designed to look and function like Facebook, MetLife CIO and SVP of Regional Application Development Gary Hoberman told me, The Wall means new hires don&#8217;t have to be trained on complex enterprise call center software. For customers calling MetLife to discuss a claim or their coverage, it means fewer annoying waits as an agent accesses data from any of dozens of different places.</p>
<p>&#8220;Instead of seeing what someone had for dinner, [The Wall is] all a customer&#8217;s transactions,&#8221; Hoberman said. Claims, records, status, possible cross-sell information (e.g., if someone lives in an apartment and might need renter&#8217;s insurance) &#8212; it&#8217;s all in there. Looking forward, he said, it might even contain other publicly available information from social media and certain mobile apps that would give the company even greater visibility into its customers&#8217; lives.</p>
<p><a href="http://gigaom2.files.wordpress.com/2013/05/metlife-screen-shot_active-contract.png"><img  alt="MetLife Screen Shot_Active Contract" src="http://gigaom2.files.wordpress.com/2013/05/metlife-screen-shot_active-contract.png?w=708&#038;h=720" width="708" height="720" class="aligncenter size-large wp-image-642871" /></a></p>
<h2 id="up-and-running-in-3-months-on-">Up and running in 3 months, on MongoDB</h2>
<p>From a business perspective, though, the most impressive part of The Wall is how quickly it was implemented and what a divergence from classic large-enterprise IT practices it represents. For Hoberman, who spent 16 years at Citi before joining MetLife in mid-2012, the process was eye-opening. If you told someone in the financial services industry that it would take just five days to get servers up and running for the prototype of such a big application, he said, &#8220;they&#8217;d look at you like you had two heads.&#8221;</p>
<p>But that&#8217;s exactly what MetLife did. In fact, it had the entire prototype built just two weeks after devising it and the production system up and running in just three months. It came together so fast because of MetLife&#8217;s new focus on cutting-edge IT and its clear mission to build a useful product rather than, as Hoberman put it, &#8220;doing big data for big data&#8217;s sake.&#8221; The tech team was willingly working nights and weekends, and the leadership team was directly involved, because everyone understood how fundamentally the application could change the business.</p>
<p>&#8220;In insurance,&#8221; Hoberman said, &#8220;&#8230; working in months, not years, is really a startup mentality.&#8221;</p>
<p>How big an undertaking was it? Built atop MongoDB, The Wall brings together data from more than 70 legacy systems and merges it into a single record. It runs across six servers in two data centers and presently stores about 24 terabytes of data. That includes MetLife&#8217;s entire U.S. customer base (some 45 million agreements in total), although the goal is to expand it to international customers and multiple languages, as well, and maybe even create a customer-facing version. It updates in near real time, just like the Facebook wall, as new customer data is entered.</p>
<p>Building a production database system on NoSQL technology isn&#8217;t commonplace in insurance or other large industries, but it was about the only way to pull this off. Going with the relational model, Hoberman explained, would have meant designing a common schema across such a wide range of products (insurance products and terms vary from state to state and country to country) that it would have been nearly impossible to actually achieve that coveted 360-degree customer view. MongoDB let Hoberman&#8217;s team build a light schema to give the app some order while still taking in all the data it had available.</p>
<h2 id="bringing-in-new-tech-and-new-b">Bringing in new tech, and new blood</h2>
<p>This is only part of what MetLife is doing with new information technologies, though, and only a fraction of what it wants to do. With The Wall specifically, Hoberman wants to build next-best-action models that will give agents guidance on how best to deal with customers. Elsewhere, the company has already used its new centralized MongoDB system to build models for predicting attrition, and it&#8217;s using Hadoop and HBase for some other workloads where they&#8217;re a better fit.</p>
<p>It&#8217;s all thanks to a company mandate to save $450 million from its bloated technology and operations budget and then invest two-thirds of it back into new technology. &#8220;We literally have a $300 million investment to decide what&#8217;s going to be the future of MetLife,&#8221; Hoberman said. It&#8217;s kind of like being in a startup, he added, only with the resources to make sure everything is done right (much <a href="http://gigaom.com/2012/09/16/how-disney-built-a-big-data-platform-on-a-startup-budget/">like with other large enterprises embracing open source</a>, Hoberman&#8217;s team prototyped The Wall using open source MongoDB but brought in <a href="http://gigaom.com/2013/04/09/mongodb-ftw-fast-growing-10gen-hires-first-cfo/">10gen</a> when it came time to build a production system).</p>
<p>It might be easy to mock that statement, except that Hoberman and his peers are putting their money where their mouths are by bringing in new talent, as well. MetLife is setting up a team in the Research Triangle region of North Carolina and hiring employees with expertise in areas such as social, mobile and big data. And Hoberman is far less concerned with specific technical skills than he is with motivation.</p>
<p>It&#8217;s all about &#8220;attitude and aptitude,&#8221; he said. &#8220;They can learn anything.&#8221;</p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/05/07/with-300m-earmarked-for-tech-innovation-metlife-wants-to-remake-insurance/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/05/metlife-screen-shot_active-contract1-e1367933585875.png?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/05/metlife-screen-shot_active-contract1-e1367933585875.png?w=150" medium="image">
			<media:title type="html">MetLife Screen Shot_Active Contract</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/05/metlife-screen-shot_active-contract.png?w=708" medium="image">
			<media:title type="html">MetLife Screen Shot_Active Contract</media:title>
		</media:content>
	</item>
		<item>
		<title>MapR releases M7, its commercial HBase distro</title>
		<link>http://gigaom.com/2013/05/01/mapr-releases-m7-its-commercial-hbase-distro/</link>
		<comments>http://gigaom.com/2013/05/01/mapr-releases-m7-its-commercial-hbase-distro/#comments</comments>
		<pubDate>Wed, 01 May 2013 23:21:07 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[big data]]></category>
		<category><![CDATA[Databases]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Hbase]]></category>
		<category><![CDATA[Mapr]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[open source]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=641425</guid>
		<description><![CDATA[MapR on Wednesday released M7, its commercial version of HBase and the first such product on the market, which the company claims is bigger, faster and better than the open source version.]]></description>
				<content:encoded><![CDATA[<p>MapR didn&#8217;t miss the memo: the key to success in the Hadoop space is building a data platform that can do many things. On Wednesday, the company released its take on HBase, <a href="http://www.mapr.com/products/mapr-editions/m7-edition">called M7</a>.</p>
<p>Last week, I <a href="http://gigaom.com/2013/04/22/how-hbase-converted-myspaces-mysql-champion-and-is-driving-hadoop-mainstream/">explained how HBase is fast becoming the star of the Hadoop ecosystem</a> because it allows users to build more real-time, almost transactional applications on top of Hadoop. True to form, as with its other products, MapR has taken HBase even further with M7 by promising greater availability (99.999 percent), instant recovery, faster operations and the ability to handle 1 trillion tables in a single cluster. In open source versions of HBase, MapR VP of Marketing Jack Norris told me, the accepted table limit per cluster is several hundred.</p>
<p><a href="http://gigaom2.files.wordpress.com/2013/05/m7.jpg"><img  alt="m7" src="http://gigaom2.files.wordpress.com/2013/05/m7.jpg?w=300&#038;h=265" width="300" height="265" class="alignright size-medium wp-image-641471" /></a>Additionally, M7 shares a single data layer with the Hadoop file system, meaning less performance overhead and, presumably, easier management.</p>
<p>As we&#8217;re seeing with other Hadoop vendors, including Cloudera (which <a href="http://gigaom.com/2013/04/30/with-impala-now-ga-clouderas-ceo-sizes-up-the-sql-on-hadoop-market/">released its Impala SQL query engine on Tuesday</a>), the Hadoop market is fast becoming one where each vendor tries to set itself apart by building the best platform with the broadest set of capabilities. In furtherance of that mission, MapR also announced on Wednesday full-text search for its Hadoop distribution through a partnership with Lucene specialist LucidWorks. It already has its own Hadoop distribution complete with proprietary code to bolster the file system and speed up MapReduce, as well as an <a href="http://gigaom.com/2012/08/17/for-fast-interactive-hadoop-queries-drill-may-be-the-answer/">open source SQL-on-Hadoop project called Drill</a> in the works.</p>
<p>MapR employees are probably sleeping a lot easier these days as a result of this platform push. Others in the Hadoop market used to talk about the fear of fragmentation and then point at MapR as the example of a company helping foment that outcome with its proprietary software. Now, however, even if everyone else is building open source products, they&#8217;re all still backing their own and largely dismissing the others.</p>
<p>I suspect the result is feature lock-in even if there&#8217;s no technological lock-in, kind of <a href="http://gigaom.com/2011/03/16/how-amazon-is-following-apples-lead-to-rule-cloud-computing/">like using Amazon Web Services for cloud computing</a> and then hoping to find its various services elsewhere. It might be easy enough to move your data, but very difficult or impossible to replicate those additional capabilities. If MapR can build a better version of HBase and companies are willing to pay for it, then so be it.</p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/05/01/mapr-releases-m7-its-commercial-hbase-distro/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2012/09/shutterstock_110961494.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2012/09/shutterstock_110961494.jpg?w=150" medium="image">
			<media:title type="html">Database rows</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/05/m7.jpg?w=300" medium="image">
			<media:title type="html">m7</media:title>
		</media:content>
	</item>
		<item>
		<title>With Impala now GA, Cloudera&#8217;s CEO sizes up the SQL-on-Hadoop market</title>
		<link>http://gigaom.com/2013/04/30/with-impala-now-ga-clouderas-ceo-sizes-up-the-sql-on-hadoop-market/</link>
		<comments>http://gigaom.com/2013/04/30/with-impala-now-ga-clouderas-ceo-sizes-up-the-sql-on-hadoop-market/#comments</comments>
		<pubDate>Tue, 30 Apr 2013 13:00:40 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[big data]]></category>
		<category><![CDATA[Cloudera]]></category>
		<category><![CDATA[data warehouse]]></category>
		<category><![CDATA[Greenplum]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[hive]]></category>
		<category><![CDATA[Hortonworks]]></category>
		<category><![CDATA[Impala]]></category>
		<category><![CDATA[Mapr]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[SQL]]></category>
		<category><![CDATA[SQL on Hadoop]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=640777</guid>
		<description><![CDATA[Cloudera's Impala engine for interactive SQL queries on Hadoop data is now generally available, and CEO Mike Olson gives his take on the competitive landscape.]]></description>
				<content:encoded><![CDATA[<p>There is no shortage of confidence in the Hadoop space, and market leader Cloudera bolstered its own on Tuesday with the general availability of its Impala SQL query engine for Hadoop. And if CEO Mike Olson&#8217;s comments are any indication, we&#8217;re in for a long ride of competitive jockeying and one-upmanship as Cloudera and its peers go all Microsoft or Google and create myriad new data-processing engines to turn their Hadoop distributions into bona fide platforms.</p>
<p>Launched as a private beta in May 2012 and <a href="http://gigaom.com/2012/10/24/cloudera-makes-sql-a-first-class-citizen-in-hadoop/">made public in October</a>, Impala is Cloudera&#8217;s attempt to address the growing demand for interactive SQL analytics on Hadoop data. It&#8217;s essentially a massively parallel database designed to share the same storage platform and metadata as Hadoop MapReduce, but it is a separate processing engine.</p>
<div id="attachment_640848" class="wp-caption alignleft" style="width: 310px"><a href="http://gigaom2.files.wordpress.com/2013/04/impala-arch-new.jpg"><img  alt="How Impala fits in" src="http://gigaom2.files.wordpress.com/2013/04/impala-arch-new.jpg?w=300&#038;h=257" width="300" height="257" class="size-medium wp-image-640848" /></a><p class="wp-caption-text">How Impala fits in</p></div>
<p>Impala actually uses the same &#8220;nearly ANSI&#8221; version of SQL as current standard-bearer Hive, but Hive (created by Facebook in 2009 as a data warehouse layer for Hadoop) doesn&#8217;t run nearly fast enough to sate many users&#8217; desire for interactive analytics. This is because Hive transforms SQL queries into MapReduce jobs, meaning every query is processed against the entire corpus of data in the Hadoop Distributed File System.</p>
<h2 id="sizing-up-the-competition">Sizing up the competition</h2>
<p>But Cloudera isn&#8217;t the first to have the idea, <a href="http://gigaom.com/2013/02/21/sql-is-whats-next-for-hadoop-heres-whos-doing-it/">nor is it alone in trying to sell interactive SQL on Hadoop</a>. The idea was <a href="http://gigaom.com/2011/10/21/hadapt-raises-9-5m-for-hadoop-data-warehouse/">first commercialized by Boston-based startup Hadapt</a> in 2011, and is now being pushed by numerous startups and larger Hadoop players. Among them: Pivotal (formerly EMC) Greenplum, MapR (with <a href="http://gigaom.com/2012/08/17/for-fast-interactive-hadoop-queries-drill-may-be-the-answer/">Drill</a>), Hortonworks (with <a href="http://hortonworks.com/blog/100x-faster-hive/">Stinger</a>), Drawn to Scale, Splice Machine, Jethro Data and Citus Data.</p>
<div id="attachment_640858" class="wp-caption aligncenter" style="width: 600px"><a href="http://gigaom2.files.wordpress.com/2013/04/had_graphic2-scaled.jpg"><img  alt="Hadapt's architecture" src="http://gigaom2.files.wordpress.com/2013/04/had_graphic2-scaled.jpg?w=708"   class="size-full wp-image-640858" /></a><p class="wp-caption-text">Hadapt&#8217;s architecture</p></div>
<p>But Cloudera is arguably the biggest name pushing SQL on Hadoop, and CEO Mike Olson thinks Impala stands out for several reasons, not the least of which is that it exists as a product. &#8220;Nobody else is shipping production-grade SQL query support on Hadoop,&#8221; he told me during a recent call. &#8220;At least not in open source.&#8221; He seems content to let the startups do their thing, focusing his attention instead on Cloudera&#8217;s big three Hadoop-distribution competitors: Pivotal, MapR and Hortonworks. Greenplum and Pivotal SVP Scott Yara <a href="http://gigaom.com/2013/02/25/emc-to-hadoop-competition-see-ya-wouldnt-wanna-be-ya/">was full of confidence (and R&amp;D budget)</a> when the company announced the Pivotal HD distribution and HAWQ technology in February, but Olson claims the approach requires a siloed DBMS within HDFS and is a &#8220;rearguard defensive strategy&#8221; to protect the company&#8217;s sunk costs in its database technology.</p>
<div id="attachment_615210" class="wp-caption aligncenter" style="width: 718px"><a href="http://gigaom2.files.wordpress.com/2013/02/hawq1.jpg"><img  alt="The Pivotal HD and Hawq architecture" src="http://gigaom2.files.wordpress.com/2013/02/hawq1.jpg?w=708&#038;h=387" width="708" height="387" class="size-large wp-image-615210" /></a><p class="wp-caption-text">The Pivotal HD and Hawq architecture</p></div>
<p>As for Hortonworks, Olson questions the wisdom of its Stinger initiative to boost Hive&#8217;s speed, noting that &#8220;Hive never got good while it was running standalone on MapReduce.&#8221; Hortonworks also <a href="http://gigaom.com/2013/04/15/teradata-to-connect-hadoop-and-data-warehouses-roll-out-new-appliance/">partners with vendors such as Teradata</a> to let their platforms access Hadoop data in its native format, but those approaches still require sending data over the network. &#8220;It&#8217;s not the way you would build it if you woke up in the 2000s and were building this anew,&#8221; Olson said.</p>
<div id="attachment_640854" class="wp-caption aligncenter" style="width: 718px"><a href="http://gigaom2.files.wordpress.com/2013/04/stingerroad.png"><img  alt="The Stinger roadmap" src="http://gigaom2.files.wordpress.com/2013/04/stingerroad.png?w=708&#038;h=558" width="708" height="558" class="size-large wp-image-640854" /></a><p class="wp-caption-text">The Stinger roadmap</p></div>
<p>Olson acknowledged that the MapR-led Apache Drill project is cut from the same cloth as Impala (that is, a Google Dremel clone designed specifically for Hadoop), but &#8220;the difference is we&#8217;re shipping code.&#8221; Being generally available and ready for production workloads means Cloudera can lock down users and market share before many even have a chance to experiment with Drill. He all but dismissed questions about Impala&#8217;s readiness, which were spurred by rumblings in the Hadoop space that Cloudera had rushed it into public beta in order to get on the scoreboard against more fully baked offerings.</p>
<p>&#8220;I don&#8217;t feel we&#8217;re under the gun competitively to pull it out of beta because no one else has product in the market,&#8221; Olson said. &#8220;I have no problems &#8230; calling this GA quality.&#8221; He did, however, acknowledge that Impala is shipping with a &#8220;minimum viable feature set&#8221; that the company plans to build on in the near future. Impala Senior Product Manager Justin Erickson noted a few issues of concern, including the number of concurrent users Impala can support, but said they have been addressed during the beta period.</p>
<h2 id="one-piece-of-a-larger-platform">One piece of a larger platform</h2>
<p>Really, though, the whole point of Impala and its competitors is to turn Hadoop from a tool for batch analytics and mass storage <a href="http://gigaom.com/2013/03/07/5-reasons-why-the-future-of-hadoop-is-real-time-relatively-speaking/">into a platform that can handle nearly all of companies&#8217; data-processing needs</a>. In that regard, it appears we&#8217;re just getting started. Cloudera, MapR, Pivotal Greenplum and Hortonworks are already pushing their own products and projects, and Olson said &#8220;it&#8217;s absolutely our intent&#8221; to enhance Cloudera&#8217;s platform with even more open-source products &#8212; perhaps even more database technologies <a href="http://gigaom.com/2013/04/22/how-hbase-converted-myspaces-mysql-champion-and-is-driving-hadoop-mainstream/">a la HBase</a> &#8212; that will let users do more stuff with more types of data. Over time, this strategy could result in Hadoop displacing the current breed of databases and data warehouses and becoming the single data store atop which users run whatever applications they so desire. For now, though, especially when it comes to Impala and the data warehouse incumbents, Olson is taking a measured approach. &#8220;The likelihood that we&#8217;re going to knock them off in the near term,&#8221; he said, &#8220;&#8230; it would be a tough fight to win.&#8221;</p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/04/30/with-impala-now-ga-clouderas-ceo-sizes-up-the-sql-on-hadoop-market/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/01/1z5o1503.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/01/1z5o1503.jpg?w=150" medium="image">
			<media:title type="html">Structure Data 2012: Michael Olson – CEO, Cloudera</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/04/impala-arch-new.jpg?w=300" medium="image">
			<media:title type="html">How Impala fits in</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/04/had_graphic2-scaled.jpg" medium="image">
			<media:title type="html">Hadapt&#039;s architecture</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/02/hawq1.jpg?w=708" medium="image">
			<media:title type="html">The Pivotal HD and Hawq architecture</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/04/stingerroad.png?w=708" medium="image">
			<media:title type="html">The Stinger roadmap</media:title>
		</media:content>
	</item>
		<item>
		<title>Welcome to Berkeley: Where Hadoop isn&#8217;t nearly fast enough</title>
		<link>http://gigaom.com/2013/04/17/welcome-to-berkeley-where-hadoop-isnt-nearly-fast-enough/</link>
		<comments>http://gigaom.com/2013/04/17/welcome-to-berkeley-where-hadoop-isnt-nearly-fast-enough/#comments</comments>
		<pubDate>Wed, 17 Apr 2013 23:19:13 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[AMPLab]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[in-memory]]></category>
		<category><![CDATA[Mesos]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[Shark]]></category>
		<category><![CDATA[Spark]]></category>
		<category><![CDATA[Tachyon]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=632061</guid>
		<description><![CDATA[Hadoop not fast enough for you? Then you might want to get to know AMPLab, a University of California, Berkeley team developing faster versions of many core Hadoop components.]]></description>
				<content:encoded><![CDATA[<p>Tucked within the computer science department at the University of California, Berkeley, there&#8217;s an institution called <a href="http://amplab.cs.berkeley.edu/">AMPLab</a> that&#8217;s making a name for itself by &#8212; among other things &#8212; essentially rebuilding the Hadoop platform, only faster.</p>
<div id="attachment_632077" class="wp-caption alignright" style="width: 283px"><a href="http://gigaom2.files.wordpress.com/2013/04/spark-lr.png"><img  alt="Results for linear regression test" src="http://gigaom2.files.wordpress.com/2013/04/spark-lr.png?w=708"   class="size-full wp-image-632077" /></a><p class="wp-caption-text">Results for linear regression test</p></div>
<p>AMPLab&#8217;s best-known product in the big data space, called <a href="http://spark-project.org/">Spark</a>, is an in-memory parallel processing framework that&#8217;s comparable to Hadoop MapReduce except, its creators claim, it is up to 100 times faster. Because it runs in-memory, Spark might be comparable with something like <a href="http://gigaom.com/2012/10/24/metamarkets-open-sources-druid-its-in-memory-database/">Druid</a> or SAP&#8217;s HANA system, too. Spark is the processing engine that powers <a href="http://gigaom.com/2012/12/05/clearstory-data-raises-9m-and-might-actually-make-data-your-friend/">ClearStory&#8217;s next-generation analytics and visualization service</a>.</p>
<p>Like Hive as a data warehouse for Hadoop? Then you&#8217;ll love <a href="http://shark.cs.berkeley.edu/">Shark</a>, which is short for &#8220;Hive on Spark.&#8221;</p>
<p>Even as Hadoop gets more flexible thanks to new features such as YARN, which would technically allow it to run an alternative framework like Spark, AMPLab has its own cluster-management project called <a href="https://amplab.cs.berkeley.edu/projects/mesos-dynamic-resource-sharing-for-clusters/">Mesos</a>. Twitter <a href="http://gigaom.com/2012/04/19/twitter-backs-fave-big-data-projects-with-apache-sponsorship/">is a big fan of Mesos</a>, which is <a href="http://incubator.apache.org/mesos/">also an Apache Incubator project</a>.</p>
<p>AMPLab&#8217;s latest target is the Hadoop Distributed File System, or HDFS. HDFS has long been criticized for availability and speed, so AMPLab created an alternative called Tachyon (<a href="http://highscalability.com/blog/2013/4/17/tachyon-fault-tolerant-distributed-file-system-with-300-time.html">hat tip to High Scalability</a> for calling my attention to it). According to the <a href="http://tachyon-project.org/">Tachyon homepage</a>, &#8220;it offers up to 300 times higher throughput than HDFS, by leveraging lineage information and using memory aggressively. Tachyon caches working set files in memory, and enables different jobs/queries and frameworks to access cached files at memory speed.&#8221;</p>
<p>AMPLab isn&#8217;t the first to question the cult of HDFS, though. There are <a href="http://gigaom.com/cloud/because-hadoop-isnt-perfect-8-ways-to-replace-hdfs/">numerous commercial options available</a>, and Quantcast <a href="http://gigaom.com/2012/09/27/quantcast-releases-bigger-faster-stronger-hadoop-file-system/">built its own open source file system</a> that it claims is faster and more efficient when running at massive scale.</p>
<p>But it&#8217;s probably unfair to call AMPLab&#8217;s projects competitors to Hadoop. They&#8217;re certainly alternatives, but they&#8217;re also complementary, as Twitter&#8217;s heavy use of Hadoop and Mesos demonstrates. And Spark, Shark, Mesos and Tachyon are all compatible with their peer projects from the Apache Hadoop project.</p>
<p>Really, AMPLab is doing what any research institution does by pushing the limits of the current commercially available software. If it happens to disrupt the status quo, then so be it. For users, though, it&#8217;s just providing some new options to play around with as they try to find the right tool for their particular jobs. Its partners and sponsors, including Google, Facebook, Microsoft and Amazon Web Services, certainly have an interest in finding the best possible technology, or creating it if necessary.</p>
<div id="attachment_632076" class="wp-caption alignright" style="width: 310px"><a href="http://gigaom2.files.wordpress.com/2013/04/arch_mlbase-300x297.jpg"><img  alt="The MLBase architecture." src="http://gigaom2.files.wordpress.com/2013/04/arch_mlbase-300x297.jpg?w=708"   class="size-full wp-image-632076" /></a><p class="wp-caption-text">The MLBase architecture.</p></div>
<p>Other related AMPLab projects include <a href="https://amplab.cs.berkeley.edu/projects/piql-scale-independent-query-processing/">PIQL</a>, a SQL-like query language that sits atop a key-value store; <a href="https://amplab.cs.berkeley.edu/projects/mlbase/">MLBase</a>, a system for doing machine learning on distributed systems; <a href="https://amplab.cs.berkeley.edu/projects/akaros-%c2%a0an-operating-system-for-many-core-architectures-and-large-scale-smp-systems/">Akaros</a>, an operating system for manycore and large SMP systems; and <a href="https://amplab.cs.berkeley.edu/projects/sparrow-low-latency-scheduling-for-interactive-cluster-services/">Sparrow</a>, a cluster-scheduling system designed for low-latency computing.</p>
<p><strong>Related research and analysis from GigaOM Pro:</strong><br />Subscriber content. <a href="http://pro.gigaom.com/?utm_source=data&utm_medium=editorial&utm_campaign=auto3&utm_term=632061+welcome-to-berkeley-where-hadoop-isnt-nearly-fast-enough&utm_content=dharrisstructure">Sign up for a free trial</a>.</p><ul><li><a href="http://pro.gigaom.com/2012/11/unlocking-big-datas-potential-with-search/?utm_source=data&utm_medium=editorial&utm_campaign=auto3&utm_term=632061+welcome-to-berkeley-where-hadoop-isnt-nearly-fast-enough&utm_content=dharrisstructure">How search can unlock the power of big data</a></li><li><a href="http://pro.gigaom.com/2012/05/the-importance-of-putting-the-u-and-i-in-visualization/?utm_source=data&utm_medium=editorial&utm_campaign=auto3&utm_term=632061+welcome-to-berkeley-where-hadoop-isnt-nearly-fast-enough&utm_content=dharrisstructure">The importance of putting the U and I in visualization</a></li><li><a href="http://pro.gigaom.com/2012/04/infrastructure-q1-cloud-and-big-data-woo-the-enterprise/?utm_source=data&utm_medium=editorial&utm_campaign=auto3&utm_term=632061+welcome-to-berkeley-where-hadoop-isnt-nearly-fast-enough&utm_content=dharrisstructure">Infrastructure Q1: Cloud and big data woo enterprises</a></li></ul>]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/04/17/welcome-to-berkeley-where-hadoop-isnt-nearly-fast-enough/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/04/amplab.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/04/amplab.jpg?w=150" medium="image">
			<media:title type="html">amplab</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/04/spark-lr.png" medium="image">
			<media:title type="html">Results for linear regression test</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/04/arch_mlbase-300x297.jpg" medium="image">
			<media:title type="html">The MLBase architecture.</media:title>
		</media:content>
	</item>
		<item>
		<title>Cloud and data first-quarter 2013: analysis and outlook</title>
		<link>http://pro.gigaom.com/report/cloud-and-data-first-quarter-2013-analysis-and-outlook/</link>
		<comments>http://pro.gigaom.com/report/cloud-and-data-first-quarter-2013-analysis-and-outlook/#comments</comments>
		<pubDate>Tue, 09 Apr 2013 06:55:36 +0000</pubDate>
		<dc:creator><a href="http://pro.gigaom.com/members/davidlinthicum/" rel="author">David S. Linthicum</a></dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Amazon]]></category>
		<category><![CDATA[Amazon cloud computing]]></category>
		<category><![CDATA[Amazon Web Services]]></category>
		<category><![CDATA[amazon-elastic-compute-cloud]]></category>
		<category><![CDATA[Amazon.com]]></category>
		<category><![CDATA[apache-hadoop]]></category>
		<category><![CDATA[apple inc.]]></category>
		<category><![CDATA[aws]]></category>
		<category><![CDATA[Azure Services Platform]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[central-intelligence-agency]]></category>
		<category><![CDATA[Centralized computing]]></category>
		<category><![CDATA[CIA]]></category>
		<category><![CDATA[Cisco]]></category>
		<category><![CDATA[Cisco Systems]]></category>
		<category><![CDATA[Client/Server]]></category>
		<category><![CDATA[Cloud]]></category>
		<category><![CDATA[Cloud Computing]]></category>
		<category><![CDATA[cloud computing services]]></category>
		<category><![CDATA[Cloud computing taxes]]></category>
		<category><![CDATA[Cloud Storage]]></category>
		<category><![CDATA[cloud storage services]]></category>
		<category><![CDATA[cloud technology]]></category>
		<category><![CDATA[cloud-applications]]></category>
		<category><![CDATA[cloud-based storage services]]></category>
		<category><![CDATA[cloud-infrastructure]]></category>
		<category><![CDATA[Cloudera]]></category>
		<category><![CDATA[CloudMe]]></category>
		<category><![CDATA[computing]]></category>
		<category><![CDATA[consumer-oriented cloud storage services]]></category>
		<category><![CDATA[data management]]></category>
		<category><![CDATA[data processing store]]></category>
		<category><![CDATA[Data Synchronization]]></category>
		<category><![CDATA[database management systems]]></category>
		<category><![CDATA[database technology]]></category>
		<category><![CDATA[DataDirect Networks]]></category>
		<category><![CDATA[Datameer]]></category>
		<category><![CDATA[Dropbox]]></category>
		<category><![CDATA[EMC]]></category>
		<category><![CDATA[file hosting]]></category>
		<category><![CDATA[File system-sharing services]]></category>
		<category><![CDATA[firewall]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[Greenplum]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Hewlett-Packard]]></category>
		<category><![CDATA[HP]]></category>
		<category><![CDATA[IBM]]></category>
		<category><![CDATA[icloud]]></category>
		<category><![CDATA[Idaho State Tax Commission]]></category>
		<category><![CDATA[Income taxes]]></category>
		<category><![CDATA[Intel]]></category>
		<category><![CDATA[iPad]]></category>
		<category><![CDATA[iPhone]]></category>
		<category><![CDATA[Joyent]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[Macquarie Capital]]></category>
		<category><![CDATA[Mapr]]></category>
		<category><![CDATA[massively parallel processing]]></category>
		<category><![CDATA[Microsoft]]></category>
		<category><![CDATA[microsoft-windows]]></category>
		<category><![CDATA[mobile device]]></category>
		<category><![CDATA[MongoDB]]></category>
		<category><![CDATA[Nimbula]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[ObjectRocket]]></category>
		<category><![CDATA[Online backup services]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[OpenStack]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[oracle-corporation]]></category>
		<category><![CDATA[oracle-database]]></category>
		<category><![CDATA[parallel processing]]></category>
		<category><![CDATA[private clouds]]></category>
		<category><![CDATA[Public Clouds]]></category>
		<category><![CDATA[Rackspace]]></category>
		<category><![CDATA[Relational database]]></category>
		<category><![CDATA[relational database management systems]]></category>
		<category><![CDATA[saleseforce-com]]></category>
		<category><![CDATA[Salesforce.com]]></category>
		<category><![CDATA[SAN]]></category>
		<category><![CDATA[smartphone]]></category>
		<category><![CDATA[smartphones]]></category>
		<category><![CDATA[software]]></category>
		<category><![CDATA[software delivery]]></category>
		<category><![CDATA[SQL]]></category>
		<category><![CDATA[Star Analytics]]></category>
		<category><![CDATA[storage-area-network]]></category>
		<category><![CDATA[Tablet computer]]></category>
		<category><![CDATA[Teradata]]></category>
		<category><![CDATA[U.S. government]]></category>

		<guid isPermaLink="false">http://pro.gigaom.com/?post_type=go-report&#038;p=173124/</guid>
		<description><![CDATA[Cloud computing is finally starting to add value to business, as those in charge of cloud within enterprises are moving from talking to doing. That much was very evident in the first quarter of 2013.]]></description>
				<content:encoded><![CDATA[<p>Cloud computing is finally starting to add value to business, as those in charge of cloud within enterprises are moving from talking to doing. That much was very evident in the first quarter of 2013.</p>
<p><strong>Related research and analysis from GigaOM Pro:</strong><br />Subscriber content. <a href="http://pro.gigaom.com/?utm_medium=editorial&utm_campaign=auto3&utm_term=648537+cloud-and-data-first-quarter-2013-analysis-and-outlook&utm_content=gigaedit">Sign up for a free trial</a>.</p><ul><li><a href="http://pro.gigaom.com/2011/04/infrastructure-q1-iaas-comes-down-to-earth-big-data-takes-flight/?utm_medium=editorial&utm_campaign=auto3&utm_term=648537+cloud-and-data-first-quarter-2013-analysis-and-outlook&utm_content=gigaedit">Infrastructure Q1: IaaS Comes Down to Earth; Big Data Takes Flight</a></li><li><a href="http://pro.gigaom.com/2012/04/infrastructure-q1-cloud-and-big-data-woo-the-enterprise/?utm_medium=editorial&utm_campaign=auto3&utm_term=648537+cloud-and-data-first-quarter-2013-analysis-and-outlook&utm_content=gigaedit">Infrastructure Q1: Cloud and big data woo enterprises</a></li><li><a href="http://pro.gigaom.com/2011/07/infrastructure-q2-big-data-and-paas-gain-more-momentum/?utm_medium=editorial&utm_campaign=auto3&utm_term=648537+cloud-and-data-first-quarter-2013-analysis-and-outlook&utm_content=gigaedit">Infrastructure Q2: Big data and PaaS gain more momentum</a></li></ul>]]></content:encoded>
			<wfw:commentRss>http://pro.gigaom.com/report/cloud-and-data-first-quarter-2013-analysis-and-outlook/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:thumbnail url="http://pro.gigaom.com/files/2009/04/gigaompromasterimagecloud.jpg?w=150" />
		<media:content url="http://pro.gigaom.com/files/2009/04/gigaompromasterimagecloud.jpg?w=150" medium="image">
			<media:title type="html">gigaompromasterimagecloud</media:title>
		</media:content>

		<media:content url="http://1.gravatar.com/avatar/4f3860069d181dbeeb398304f5940a9e?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">gigaedit</media:title>
		</media:content>
	</item>
		<item>
		<title>&#8216;Linux of online learning&#8217; gets stronger: edX and Stanford team up to build open source platform</title>
		<link>http://gigaom.com/2013/04/02/linux-of-online-learning-gets-stronger-edx-and-stanford-team-up-to-build-open-source-platform/</link>
		<comments>http://gigaom.com/2013/04/02/linux-of-online-learning-gets-stronger-edx-and-stanford-team-up-to-build-open-source-platform/#comments</comments>
		<pubDate>Wed, 03 Apr 2013 04:00:34 +0000</pubDate>
		<dc:creator>Ki Mae Heussner</dc:creator>
				<category><![CDATA[education technology]]></category>
		<category><![CDATA[online education]]></category>
		<category><![CDATA[online learning]]></category>
		<category><![CDATA[open source]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=626861</guid>
		<description><![CDATA[Despite its initial efforts at building its own open-source online learning platform, Stanford said it will fold that platform into the edX platform launched by Harvard and MIT.]]></description>
				<content:encoded><![CDATA[<p>In its mission to become the <a href="http://gigaom.com/2013/03/14/the-linux-of-online-learning-edx-takes-big-step-toward-open-source-goal/">&#8220;Linux of online learning,&#8221;</a> <a href="http://www.edx.org">edX</a> just got a powerful new partner. On Wednesday, the Harvard and MIT-backed non-profit is set to announce that it&#8217;s teaming up with Stanford to collaboratively develop the open-source edX platform.</p>
<p>Last fall, Stanford launched its own open-source online learning platform <a href="http://class.stanford.edu/">Class2Go</a>, which it released to the public in January. Developed by a team of Stanford engineers, the platform was designed to support the university’s online classes and research. In addition to being open, the platform was intended to be inter-operable with other services and portable (meaning that the course content isn’t tied to one platform). But as part of the new collaboration, Stanford will cease development on that platform and focus its efforts on edX.</p>
<p>&#8220;[We'll] fold in the key features of the Class2Go platform in the open-source edX and, together, we&#8217;ll be working on a single platform going ahead,&#8221; Anant Agarwal, president of edX, said on a call with reporters. &#8220;By putting all the wood behind one arrow, so to speak, we thought we could have a bigger impact.&#8221;</p>
<p>Since its launch, Class2Go has been adopted by other schools around the world. While the platform will remain available to other users, John Mitchell, Stanford&#8217;s vice provost for online learning, said the university will work with those schools to migrate to edX as it transitions its own courses.</p>
<p>The two organizations gave few details on how the collaboration would actually work. But they said that Class2Go’s analytics tools, which can track how long students watch a given video, which sections they repeat and other kinds of student activity on the site, are examples of the features that will be integrated with edX.</p>
<p>Despite Stanford’s collaboration on the edX platform, Mitchell said the university was not joining the “X University Consortium” of institutions that offer courses on the edX site, which is not entirely surprising given Stanford&#8217;s affiliation with for-profit rival Coursera. The startup was launched by two Stanford professors, and the university was one of its launch partners.</p>
<p>But even as Stanford and other top universities partner with for-profit online course providers such as Coursera and Udacity, the growing support for an open source platform shows that schools want to experiment with multiple approaches and to be able to control and customize their online courses and learning tools. The open-source approach means that developers anywhere can add new tools to the platform, that professors can create online experiences that best suit their needs and that schools can learn from one another&#8217;s innovations.</p>
<p>In addition to the Stanford partnership, edX also announced that on June 1, it will release the entire source code for the online learning platform. That development follows its announcement last month that it would <a href="http://gigaom.com/2013/03/14/the-linux-of-online-learning-edx-takes-big-step-toward-open-source-goal/">release its XBlock SDK</a>, the underlying architecture supporting edX course content.</p>
<p><strong>Related research and analysis from GigaOM Pro:</strong><br />Subscriber content. <a href="http://pro.gigaom.com/?utm_source=tech&utm_medium=editorial&utm_campaign=auto3&utm_term=626861+linux-of-online-learning-gets-stronger-edx-and-stanford-team-up-to-build-open-source-platform&utm_content=kimaeheussner">Sign up for a free trial</a>.</p><ul><li><a href="http://pro.gigaom.com/2012/01/12-tech-leaders-resolutions-for-2012/?utm_source=tech&utm_medium=editorial&utm_campaign=auto3&utm_term=626861+linux-of-online-learning-gets-stronger-edx-and-stanford-team-up-to-build-open-source-platform&utm_content=kimaeheussner">12 tech leaders’ resolutions for 2012</a></li><li><a href="http://pro.gigaom.com/2011/08/what-the-google-motorola-deal-means-for-android-microsoft-and-the-mobile-industry/?utm_source=tech&utm_medium=editorial&utm_campaign=auto3&utm_term=626861+linux-of-online-learning-gets-stronger-edx-and-stanford-team-up-to-build-open-source-platform&utm_content=kimaeheussner">What the Google-Motorola deal means for Android, Microsoft and the mobile industry</a></li><li><a href="http://pro.gigaom.com/2011/07/open-sourcing-the-food-industry-new-technology-for-a-new-food-system/?utm_source=tech&utm_medium=editorial&utm_campaign=auto3&utm_term=626861+linux-of-online-learning-gets-stronger-edx-and-stanford-team-up-to-build-open-source-platform&utm_content=kimaeheussner">Open-sourcing the food industry: new technology for a new food system</a></li></ul>]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/04/02/linux-of-online-learning-gets-stronger-edx-and-stanford-team-up-to-build-open-source-platform/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/02/edx.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/02/edx.jpg?w=150" medium="image">
			<media:title type="html">edx</media:title>
		</media:content>

		<media:content url="http://1.gravatar.com/avatar/7467db695203dccb9119d2430d0c5246?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">kimaeheussner</media:title>
		</media:content>
	</item>
		<item>
		<title>Facebook builds a database benchmark for a graph-powered world</title>
		<link>http://gigaom.com/2013/04/01/facebook-builds-a-database-benchmark-for-a-graph-powered-world/</link>
		<comments>http://gigaom.com/2013/04/01/facebook-builds-a-database-benchmark-for-a-graph-powered-world/#comments</comments>
		<pubDate>Mon, 01 Apr 2013 22:28:39 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[Benchmarks]]></category>
		<category><![CDATA[Databases]]></category>
		<category><![CDATA[Facebook]]></category>
		<category><![CDATA[graph databases]]></category>
		<category><![CDATA[LinkBench]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[open source]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=626218</guid>
		<description><![CDATA[Facebook has built a new open source tool for benchmarking graph databases, called LinkBench. And although the chances are your infrastructure and workloads look nothing like Facebook's, the good news is LinkBench was built with configurability in mind.]]></description>
				<content:encoded><![CDATA[<p>If you&#8217;re doing any sort of social-media application, you might want to take note of what Facebook just built. The company has <a href="http://www.facebook.com/notes/facebook-engineering/linkbench-a-database-benchmark-for-the-social-graph/10151391496443920">created a benchmarking tool called LinkBench</a> that measures the performance of databases tasked with serving graph-structured data, which, presumably, is the lifeblood of every startup around that&#8217;s concerned with who&#8217;s connected to whom.</p>
<p>Although, of all LinkBench&#8217;s features &#8212; and you can read all about them in a Facebook Engineer wall post from Monday morning &#8212; probably the biggest is <a href="https://github.com/facebook/linkbench">that it&#8217;s open source</a> and built to be extensible. One of the biggest problems with benchmarks overall is that they rarely align with actual production workloads inside the companies that are supposed to care about them. In this case, for example, a benchmark for measuring the performance of <a href="http://gigaom.com/2011/12/06/facebook-shares-some-secrets-on-making-mysql-scale/">Facebook&#8217;s massive MySQL</a>+memcached+<a href="http://www.facebook.com/note.php?note_id=388112370932">Flashcache</a> database architecture against its massive social graph and transaction activity would be all but worthless unless someone was just planning to rebuild Facebook.</p>
<p><a href="http://gigaom2.files.wordpress.com/2013/04/linkbench-copy.jpg"><img  alt="linkbench copy" src="http://gigaom2.files.wordpress.com/2013/04/linkbench-copy.jpg?w=708&#038;h=610" width="708" height="610" class="aligncenter size-large wp-image-626252" /></a></p>
<p>I&#8217;ve written in the past that <a href="http://gigaom.com/2012/07/26/why-crowdsourced-computing-benchmarks-are-the-future/">perhaps crowdsourced benchmarks are the wave of the future</a>: essentially a compiled set of statistics and best practices as more companies test different database (or Hadoop) technologies on different hardware setups against different workloads and publish the results. Everything will of course vary by the exact details within any given environment, but it would be a good way to get a sense of how a particular stack might, or perhaps should, fare.</p>
<p>But an open source benchmark tuned for a specific use case (social graphs) by probably the world&#8217;s foremost expert on that use case is interesting, too. Anyone else trying to serve data from their own social graphs can benefit from some of LinkBench&#8217;s more prominent features, such as its ability to generate &#8220;large synthetic social graphs,&#8221; while tuning it to the specifics of their own infrastructure. After all, your app might have different requirements around reading versus writing data, and <a href="http://gigaom.com/2011/07/21/is-stonebraker-right-why-sql-isnt-the-choice-du-jour-for-many-apps/">it&#8217;s very possible you&#8217;re not using MySQL</a>, either.</p>
<p>Or maybe you are using MySQL and want to see how a newer database technology might handle your graph workload. That, by the way, is one of the reasons Facebook built LinkBench, according to the company&#8217;s engineering post.</p>
<p>At any rate, the social web is all about graphs, and database performance really matters for anyone trying to build a service that stays online and provides a pleasant user experience. Say what you want about Facebook, but its services perform, so the bar is set high for anyone trying to dethrone it or at least to build something that can attract an equally large and devout following.</p>
<p><strong>Related research and analysis from GigaOM Pro:</strong><br />Subscriber content. <a href="http://pro.gigaom.com/?utm_source=data&utm_medium=editorial&utm_campaign=auto3&utm_term=626218+facebook-builds-a-database-benchmark-for-a-graph-powered-world&utm_content=dharrisstructure">Sign up for a free trial</a>.</p><ul><li><a href="http://pro.gigaom.com/2010/10/is-the-future-of-enterprise-completely-open-source/?utm_source=data&utm_medium=editorial&utm_campaign=auto3&utm_term=626218+facebook-builds-a-database-benchmark-for-a-graph-powered-world&utm_content=dharrisstructure">Is the Future of Enterprise Completely Open Source?</a></li><li><a href="http://pro.gigaom.com/2012/11/breaking-down-barriers-and-reducing-cycle-times-with-devops-and-continuous-delivery/?utm_source=data&utm_medium=editorial&utm_campaign=auto3&utm_term=626218+facebook-builds-a-database-benchmark-for-a-graph-powered-world&utm_content=dharrisstructure">How devops can reduce cycle times</a></li><li><a href="http://pro.gigaom.com/2011/12/migrating-media-applications-to-the-private-cloud-best-practices-for-businesses/?utm_source=data&utm_medium=editorial&utm_campaign=auto3&utm_term=626218+facebook-builds-a-database-benchmark-for-a-graph-powered-world&utm_content=dharrisstructure">Migrating media applications to the private cloud: best practices for businesses</a></li></ul>]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/04/01/facebook-builds-a-database-benchmark-for-a-graph-powered-world/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/04/graph-copy-e1364854754198.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/04/graph-copy-e1364854754198.jpg?w=150" medium="image">
			<media:title type="html">graph copy</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/04/linkbench-copy.jpg?w=708" medium="image">
			<media:title type="html">linkbench copy</media:title>
		</media:content>
	</item>
	</channel>
</rss>
