<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>GigaOM &#187; Cloudera</title>
	<atom:link href="http://gigaom.com/tag/cloudera/feed/" rel="self" type="application/rss+xml" />
	<link>http://gigaom.com</link>
	<description></description>
	<lastBuildDate>Sat, 25 May 2013 10:48:33 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='gigaom.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://0.gravatar.com/blavatar/0db8f6557d022075dbbf010c54d46d93?s=96&#038;d=http%3A%2F%2Fs2.wp.com%2Fi%2Fbuttonw-com.png</url>
		<title>GigaOM &#187; Cloudera</title>
		<link>http://gigaom.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://gigaom.com/osd.xml" title="GigaOM" />
	<atom:link rel='hub' href='http://gigaom.com/?pushpress=hub'/>
		<item>
		<title>WibiData gets $15M to help it become the Hadoop application company</title>
		<link>http://gigaom.com/2013/05/23/wibidata-gets-15m-to-help-it-become-the-hadoop-application-company/</link>
		<comments>http://gigaom.com/2013/05/23/wibidata-gets-15m-to-help-it-become-the-hadoop-application-company/#comments</comments>
		<pubDate>Thu, 23 May 2013 11:31:17 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[big data]]></category>
		<category><![CDATA[Cloudera]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Hbase]]></category>
		<category><![CDATA[machine-learning]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[OPower]]></category>
		<category><![CDATA[predictive analytics]]></category>
		<category><![CDATA[WibiData]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=648663</guid>
		<description><![CDATA[Startup WibiData has raised another $15 million and wants to turn the lessons it has learned in the field into generic software that can let anyone build predictive applications on Hadoop.]]></description>
				<content:encoded><![CDATA[<p><a href="http://www.wibidata.com/">WibiData</a> &#8212; the big data startup from Cloudera Co-founder Christophe Bisciglia and Aaron Kimball &#8212; doesn&#8217;t have <em>overly</em> big plans. It only wants to become one of the first, if not the first, company selling off-the-shelf software that lets other companies build valuable, customer-facing applications on Hadoop. On Thursday, WibiData announced $15 million in Series B funding from Canaan Partners, as well as existing investors NEA and Google Chairman Eric Schmidt, to help make the goal a reality. </p>
<p>Kidding aside, that&#8217;s actually quite an ambitious goal in a Hadoop market that&#8217;s big and growing but characterized by expensive consulting arrangements and purpose-built applications. That&#8217;s even more true for companies that want to do something other than transform unstructured data into structured data (often called ETL) or run back-office analytics jobs. In fact, WibiData has spent the last 18 months doing just this type of deal, and Bisciglia says every single customer has already engaged with one of the big three Hadoop vendors (Cloudera, Hortonworks and MapR). </p>
<p>Home energy-management startup <a href="http://gigaom.com/2012/11/19/opower-the-big-data-energy-player-to-beat/">Opower</a> is a good example of this process. It&#8217;s actually one of Cloudera&#8217;s banner customers, but &#8220;when they wanted to take [their software-as-a-service tool] beyond batch analysis and ETL workloads,&#8221; Bisciglia said, Opower came to WibiData. So whereas the Opower service was originally focused on nightly data analysis comparing users&#8217; energy usage against that of other users, it&#8217;s now working on dynamic recommendations for users and letting them engage with the application in new ways.</p>
<div id="attachment_648685" class="wp-caption alignright" style="width: 310px"><a href="http://gigaom2.files.wordpress.com/2013/05/wibi-kiji.jpg"><img  alt="The WibiData architecture" src="http://gigaom2.files.wordpress.com/2013/05/wibi-kiji.jpg?w=300&#038;h=224" width="300" height="224" class="size-medium wp-image-648685" /></a><p class="wp-caption-text">The WibiData architecture</p></div>
<p>During these engagements, WibiData <a href="http://gigaom.com/2012/03/22/wibidata-structure-data-2012/">has been building up its core technology</a> for connecting those brawny back-office Hadoop environments to predictive customer-facing applications &#8211; a collection of HBase, data-formatting tools and machine learning algorithms that the company <a href="http://gigaom.com/2012/11/14/wibidata-open-sources-kiji-to-make-hbase-more-useful/">has been slowly open-sourcing under the Kiji banner</a>. It has also been learning the similarities among the applications it&#8217;s building for customers in the same field, figuring out what&#8217;s repeatable. What does any given company in the retail space, for example, need to get started on <a href="http://gigaom.com/2013/05/08/why-3-celebrity-data-scientists-are-willing-to-work-for-free-for-you/">its own recommendation engine</a>? </p>
<p>And now, Bisciglia says, WibiData is going to double down on building application software based on what it has learned. The first two industries it targets will likely be financial services and retail, two areas where the company has seen a lot of traction. He envisions the finished product including some pre-defined schema for formatting data and some pre-built predictive models, both broadly applicable across that industry rather than specific to a single user. </p>
<p>There will also be different interfaces that allow different types of users (e.g., data scientists, systems engineers and business users) to interact with the data in the ways they need to. </p>
<p>Time will tell if WibiData can actually accomplish its goal of turning Hadoop into a collection of somewhat specialized software packages, but someone has to. Even industry heavyweights like Cloudera see the need, but their hands are full just getting Hadoop integrated into existing environments and getting those early uses up and running. As Cloudera CEO Mike Olson <a href="http://gigaom.com/2012/03/21/cloudera-structure-data-2012/">said at Structure: Data in 2012</a> to anyone ambitious enough to tackle the Hadoop-application gap, &#8220;Call me, I’ll connect you with funding. The money is out there.&#8221; </p>
<p>If you want to hear more about the need for Hadoop applications, check out this panel from Structure: Data 2013, where I speak with WibiData&#8217;s Omer Trajman, Continuuity&#8217;s Jonathan Gray and Pivotal&#8217;s Muddu Sudhakar. <span class='embed-youtube' style='text-align:center; display: block;'><iframe class='youtube-player' type='text/html' width='604' height='370' src='http://www.youtube.com/embed/z7BhGEQX9BQ?version=3&#038;rel=1&#038;fs=1&#038;showsearch=0&#038;showinfo=1&#038;iv_load_policy=1&#038;wmode=transparent' frameborder='0'></iframe></span></p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/05/23/wibidata-gets-15m-to-help-it-become-the-hadoop-application-company/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/05/wibi-founders.png?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/05/wibi-founders.png?w=150" medium="image">
			<media:title type="html">wibi founders</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/05/wibi-kiji.jpg?w=300" medium="image">
			<media:title type="html">The WibiData architecture</media:title>
		</media:content>
	</item>
		<item>
		<title>With Impala now GA, Cloudera&#8217;s CEO sizes up the SQL-on-Hadoop market</title>
		<link>http://gigaom.com/2013/04/30/with-impala-now-ga-clouderas-ceo-sizes-up-the-sql-on-hadoop-market/</link>
		<comments>http://gigaom.com/2013/04/30/with-impala-now-ga-clouderas-ceo-sizes-up-the-sql-on-hadoop-market/#comments</comments>
		<pubDate>Tue, 30 Apr 2013 13:00:40 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[big data]]></category>
		<category><![CDATA[Cloudera]]></category>
		<category><![CDATA[data warehouse]]></category>
		<category><![CDATA[Greenplum]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[hive]]></category>
		<category><![CDATA[Hortonworks]]></category>
		<category><![CDATA[Impala]]></category>
		<category><![CDATA[Mapr]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[SQL]]></category>
		<category><![CDATA[SQL on Hadoop]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=640777</guid>
		<description><![CDATA[Cloudera's Impala engine for interactive SQL queries on Hadoop data is now generally available, and CEO Mike Olson gives his lay of the competitive landscape.]]></description>
				<content:encoded><![CDATA[<p>There is no shortage of confidence in the Hadoop space, and market leader Cloudera bolstered its own on Tuesday with the general availability of its Impala SQL query engine for Hadoop. And if CEO Mike Olson&#8217;s comments are any indication, we&#8217;re in for a long ride of competitive jockeying and oneupmanship as Cloudera and its peers go all Microsoft or Google and create myriad new data-processing engines to turn their Hadoop distributions into bona fide platforms.</p>
<p>Launched as a private beta in May 2012 and <a href="http://gigaom.com/2012/10/24/cloudera-makes-sql-a-first-class-citizen-in-hadoop/">made public in October</a>, Impala is Cloudera&#8217;s attempt to address the growing demand for interactive SQL analytics on Hadoop data. It&#8217;s essentially a massively parallel database designed to share the same storage platform and metadata as Hadoop MapReduce, but it runs as its own separate processing engine.</p>
<div id="attachment_640848" class="wp-caption alignleft" style="width: 310px"><a href="http://gigaom2.files.wordpress.com/2013/04/impala-arch-new.jpg"><img  alt="How Impala fits in" src="http://gigaom2.files.wordpress.com/2013/04/impala-arch-new.jpg?w=300&#038;h=257" width="300" height="257" class="size-medium wp-image-640848" /></a><p class="wp-caption-text">How Impala fits in</p></div>
<p>Impala actually uses the same &#8220;nearly ANSI&#8221; version of SQL as does current standard bearer Hive, but that technology (created by Facebook in 2009 as a data warehouse layer for Hadoop) doesn&#8217;t run nearly fast enough to sate many users&#8217; desire for interactive analytics. This is because Hive transforms SQL queries into MapReduce jobs, meaning every one is processed against the entire corpus of data in the Hadoop Distributed File System.</p>
<h2 id="sizing-up-the-competition">Sizing up the competition</h2>
<p>Cloudera isn&#8217;t the first to have the idea, though, <a href="http://gigaom.com/2013/02/21/sql-is-whats-next-for-hadoop-heres-whos-doing-it/">nor is it alone in trying to sell interactive SQL on Hadoop</a>. The idea was <a href="http://gigaom.com/2011/10/21/hadapt-raises-9-5m-for-hadoop-data-warehouse/">first commercialized by Boston-based startup Hadapt</a> in 2011, and is now being pushed by numerous startups and larger Hadoop players. Among them: Pivotal (formerly EMC) Greenplum, MapR (with <a href="http://gigaom.com/2012/08/17/for-fast-interactive-hadoop-queries-drill-may-be-the-answer/">Drill</a>), Hortonworks (with <a href="http://hortonworks.com/blog/100x-faster-hive/">Stinger</a>), Drawn to Scale, Splice Machine, Jethro Data and Citus Data.</p>
<div id="attachment_640858" class="wp-caption aligncenter" style="width: 600px"><a href="http://gigaom2.files.wordpress.com/2013/04/had_graphic2-scaled.jpg"><img  alt="Hadapt's architecture" src="http://gigaom2.files.wordpress.com/2013/04/had_graphic2-scaled.jpg?w=708"   class="size-full wp-image-640858" /></a><p class="wp-caption-text">Hadapt&#8217;s architecture</p></div>
<p>But Cloudera is arguably the biggest name pushing SQL on Hadoop, and CEO Mike Olson thinks Impala stands out for several reasons &#8212; not the least of which is that it exists as a product. &#8220;Nobody else is shipping production-grade SQL query support on Hadoop,&#8221; he told me during a recent call. &#8220;At least not in open source.&#8221; He seems content to let the startups do their things, instead focusing his attention on Cloudera&#8217;s big three Hadoop-distribution competitors in Pivotal, MapR and Hortonworks. Greenplum and Pivotal SVP Scott Yara <a href="http://gigaom.com/2013/02/25/emc-to-hadoop-competition-see-ya-wouldnt-wanna-be-ya/">was full of confidence &#8212; and R&amp;D budget</a> &#8212; when the company announced the Pivotal HD distribution and HAWQ technology in February, but Olson claims the approach requires a siloed DBMS within HDFS and is a &#8220;rearguard defensive strategy&#8221; to protect the company&#8217;s sunk costs in its database technology.</p>
<div id="attachment_615210" class="wp-caption aligncenter" style="width: 718px"><a href="http://gigaom2.files.wordpress.com/2013/02/hawq1.jpg"><img  alt="The Pivotal HD and Hawq architecture" src="http://gigaom2.files.wordpress.com/2013/02/hawq1.jpg?w=708&#038;h=387" width="708" height="387" class="size-large wp-image-615210" /></a><p class="wp-caption-text">The Pivotal HD and Hawq architecture</p></div>
<p>As for Hortonworks, Olson questions the wisdom of its Stinger initiative to boost Hive&#8217;s speed, noting that &#8220;Hive never got good while it was running standalone on MapReduce.&#8221; Hortonworks also <a href="http://gigaom.com/2013/04/15/teradata-to-connect-hadoop-and-data-warehouses-roll-out-new-appliance/">partners with vendors such as Teradata</a> to let their platforms access Hadoop data in its native format, but those approaches still require sending data over the network. &#8220;It&#8217;s not the way you would build it if you woke up in the 2000s and were building this anew,&#8221; Olson said.</p>
<div id="attachment_640854" class="wp-caption aligncenter" style="width: 718px"><a href="http://gigaom2.files.wordpress.com/2013/04/stingerroad.png"><img  alt="The Stinger roadmap" src="http://gigaom2.files.wordpress.com/2013/04/stingerroad.png?w=708&#038;h=558" width="708" height="558" class="size-large wp-image-640854" /></a><p class="wp-caption-text">The Stinger roadmap</p></div>
<p>Olson acknowledged that the MapR-led Apache Drill project is cut from the same cloth as Impala (that is, being a Google Dremel clone designed specifically for Hadoop), but &#8220;the difference is we&#8217;re shipping code.&#8221; Being generally available and ready for production workloads means Cloudera can lock down users and market share before many even have a chance to experiment with Drill. He all but dismissed questions over the readiness of Impala, spurred by rumblings in the Hadoop space that Cloudera rushed it into public beta in order to get on the scoreboard against more fully baked offerings.</p>
<p>&#8220;I don&#8217;t feel we&#8217;re under the gun competitively to pull it out of beta because no one else has product in the market,&#8221; Olson said. &#8220;I have no problems &#8230; calling this GA quality.&#8221; He did, however, acknowledge that Impala is shipping with a &#8220;minimum viable feature set&#8221; that the company has plans to build on in the near future. Impala Senior Product Manager Justin Erickson noted a few issues of concern, including around the number of concurrent users Impala can support, but said they have been addressed during the beta period.</p>
<h2 id="one-piece-of-a-larger-platform">One piece of a larger platform</h2>
<p>Really, though, the whole point of Impala and its competitors is to turn Hadoop from a tool for batch analytics and mass storage <a href="http://gigaom.com/2013/03/07/5-reasons-why-the-future-of-hadoop-is-real-time-relatively-speaking/">into a platform that can handle nearly all of companies&#8217; data-processing needs</a>. In that regard, it appears we&#8217;re just getting started. Cloudera, MapR, Pivotal Greenplum and Hortonworks are already pushing their own products and projects, and Olson said &#8220;it&#8217;s absolutely our intent&#8221; to enhance Cloudera&#8217;s platform with even more open-source products &#8212; perhaps even more database technologies <a href="http://gigaom.com/2013/04/22/how-hbase-converted-myspaces-mysql-champion-and-is-driving-hadoop-mainstream/">a la HBase</a> &#8212; that will let users do more stuff with more types of data. Over time, this strategy could result in Hadoop displacing the current breed of databases and data warehouses and becoming the single data store atop of which users run whatever applications they so desire. For now, though, especially when it comes to Impala and the data warehouse incumbents, Olson is taking a measured approach. &#8220;The likelihood that we&#8217;re going to knock them off in the near term,&#8221; he said, &#8220;&#8230; it would be a tough fight to win.&#8221;</p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/04/30/with-impala-now-ga-clouderas-ceo-sizes-up-the-sql-on-hadoop-market/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/01/1z5o1503.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/01/1z5o1503.jpg?w=150" medium="image">
			<media:title type="html">Structure Data 2012: Michael Olson – CEO, Cloudera</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/04/impala-arch-new.jpg?w=300" medium="image">
			<media:title type="html">How Impala fits in</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/04/had_graphic2-scaled.jpg" medium="image">
			<media:title type="html">Hadapt&#039;s architecture</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/02/hawq1.jpg?w=708" medium="image">
			<media:title type="html">The Pivotal HD and Hawq architecture</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/04/stingerroad.png?w=708" medium="image">
			<media:title type="html">The Stinger roadmap</media:title>
		</media:content>
	</item>
		<item>
		<title>Cloud and data first-quarter 2013: analysis and outlook</title>
		<link>http://pro.gigaom.com/report/cloud-and-data-first-quarter-2013-analysis-and-outlook/</link>
		<comments>http://pro.gigaom.com/report/cloud-and-data-first-quarter-2013-analysis-and-outlook/#comments</comments>
		<pubDate>Tue, 09 Apr 2013 06:55:36 +0000</pubDate>
		<dc:creator>David S. Linthicum</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Amazon]]></category>
		<category><![CDATA[Amazon cloud computing]]></category>
		<category><![CDATA[Amazon Web Services]]></category>
		<category><![CDATA[amazon-elastic-compute-cloud]]></category>
		<category><![CDATA[Amazon.com]]></category>
		<category><![CDATA[apache-hadoop]]></category>
		<category><![CDATA[apple inc.]]></category>
		<category><![CDATA[aws]]></category>
		<category><![CDATA[Azure Services Platform]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[central-intelligence-agency]]></category>
		<category><![CDATA[Centralized computing]]></category>
		<category><![CDATA[CIA]]></category>
		<category><![CDATA[Cisco]]></category>
		<category><![CDATA[Cisco Systems]]></category>
		<category><![CDATA[Client/Server]]></category>
		<category><![CDATA[Cloud]]></category>
		<category><![CDATA[Cloud Computing]]></category>
		<category><![CDATA[cloud computing services]]></category>
		<category><![CDATA[Cloud computing taxes]]></category>
		<category><![CDATA[Cloud Storage]]></category>
		<category><![CDATA[cloud storage services]]></category>
		<category><![CDATA[cloud technology]]></category>
		<category><![CDATA[cloud-applications]]></category>
		<category><![CDATA[cloud-based storage services]]></category>
		<category><![CDATA[cloud-infrastructure]]></category>
		<category><![CDATA[Cloudera]]></category>
		<category><![CDATA[CloudMe]]></category>
		<category><![CDATA[computing]]></category>
		<category><![CDATA[consumer-oriented cloud storage services]]></category>
		<category><![CDATA[data management]]></category>
		<category><![CDATA[data processing store]]></category>
		<category><![CDATA[Data Synchronization]]></category>
		<category><![CDATA[database management systems]]></category>
		<category><![CDATA[database technology]]></category>
		<category><![CDATA[DataDirect Networks]]></category>
		<category><![CDATA[Datameer]]></category>
		<category><![CDATA[Dropbox]]></category>
		<category><![CDATA[EMC]]></category>
		<category><![CDATA[file hosting]]></category>
		<category><![CDATA[File system-sharing services]]></category>
		<category><![CDATA[firewall]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[Greenplum]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Hewlett-Packard]]></category>
		<category><![CDATA[HP]]></category>
		<category><![CDATA[IBM]]></category>
		<category><![CDATA[icloud]]></category>
		<category><![CDATA[Idaho State Tax Commission]]></category>
		<category><![CDATA[Income taxes]]></category>
		<category><![CDATA[Intel]]></category>
		<category><![CDATA[iPad]]></category>
		<category><![CDATA[iPhone]]></category>
		<category><![CDATA[Joyent]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[Macquarie Capital]]></category>
		<category><![CDATA[Mapr]]></category>
		<category><![CDATA[massively parallel processing]]></category>
		<category><![CDATA[Microsoft]]></category>
		<category><![CDATA[microsoft-windows]]></category>
		<category><![CDATA[mobile device]]></category>
		<category><![CDATA[MongoDB]]></category>
		<category><![CDATA[Nimbula]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[ObjectRocket]]></category>
		<category><![CDATA[Online backup services]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[OpenStack]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[oracle-corporation]]></category>
		<category><![CDATA[oracle-database]]></category>
		<category><![CDATA[parallel processing]]></category>
		<category><![CDATA[private clouds]]></category>
		<category><![CDATA[Public Clouds]]></category>
		<category><![CDATA[Rackspace]]></category>
		<category><![CDATA[Relational database]]></category>
		<category><![CDATA[relational database management systems]]></category>
		<category><![CDATA[saleseforce-com]]></category>
		<category><![CDATA[Salesforce.com]]></category>
		<category><![CDATA[SAN]]></category>
		<category><![CDATA[smartphone]]></category>
		<category><![CDATA[smartphones]]></category>
		<category><![CDATA[software]]></category>
		<category><![CDATA[software delivery]]></category>
		<category><![CDATA[SQL]]></category>
		<category><![CDATA[Star Analytics]]></category>
		<category><![CDATA[storage-area-network]]></category>
		<category><![CDATA[Tablet computer]]></category>
		<category><![CDATA[Teradata]]></category>
		<category><![CDATA[U.S. government]]></category>

		<guid isPermaLink="false">http://pro.gigaom.com/?post_type=go-report&#038;p=173124/</guid>
		<description><![CDATA[Cloud computing is finally starting to add value to business, as those in charge of cloud within enterprises are moving from talking to doing. That much was very evident in the first quarter of 2013.]]></description>
				<content:encoded><![CDATA[<p>Cloud computing is finally starting to add value to business, as those in charge of cloud within enterprises are moving from talking to doing. That much was very evident in the first quarter of 2013.</p>
]]></content:encoded>
			<wfw:commentRss>http://pro.gigaom.com/report/cloud-and-data-first-quarter-2013-analysis-and-outlook/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:thumbnail url="http://pro.gigaom.com/files/2009/04/gigaompromasterimagecloud.jpg?w=150" />
		<media:content url="http://pro.gigaom.com/files/2009/04/gigaompromasterimagecloud.jpg?w=150" medium="image">
			<media:title type="html">gigaompromasterimagecloud</media:title>
		</media:content>

		<media:content url="http://1.gravatar.com/avatar/4f3860069d181dbeeb398304f5940a9e?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">gigaedit</media:title>
		</media:content>
	</item>
		<item>
		<title>Ignition raises $150M fund, opens Silicon Valley office, to back enterprise IT</title>
		<link>http://gigaom.com/2013/04/03/ignition-raises-150m-fund-opens-silicon-valley-office-to-back-enterprise-it/</link>
		<comments>http://gigaom.com/2013/04/03/ignition-raises-150m-fund-opens-silicon-valley-office-to-back-enterprise-it/#comments</comments>
		<pubDate>Wed, 03 Apr 2013 13:00:38 +0000</pubDate>
		<dc:creator>Barb Darrow</dc:creator>
				<category><![CDATA[Cloudera]]></category>
		<category><![CDATA[enterprise IT]]></category>
		<category><![CDATA[Frank Artale]]></category>
		<category><![CDATA[Ignition Partners]]></category>
		<category><![CDATA[John Connors]]></category>
		<category><![CDATA[Nick Sturiale]]></category>
		<category><![CDATA[Paul Maritz]]></category>
		<category><![CDATA[splunk]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=626900</guid>
		<description><![CDATA[Enterprise IT is a segment that has been underserved, says Ignition Partners' Frank Artale, so Ignition launched a new fund to attack that opportunity.]]></description>
				<content:encoded><![CDATA[<p>More evidence that <a href="http://gigaom.com/2013/03/30/welcome-to-the-golden-age-of-enterprise-it-and-get-used-to-it-itll-be-here-for-a-while/">boring enterprise IT is not so boring anymore</a>: <a href="http://www.ignitionpartners.com/">Ignition Partners</a> has launched (and already closed) a new $150 million fund focused on technologies that businesses will buy and implement.</p>
<p><a href="http://gigaom.com/2013/01/18/wanted-an-amazon-enterprise-challenge/shutterstock_71910823/" rel="attachment wp-att-602411"><img alt="enterprise IT" src="http://gigaom2.files.wordpress.com/2013/01/shutterstock_71910823.jpg?w=300&#038;h=225" width="300" height="225" class="alignleft size-medium wp-image-602411"></a>The Bellevue, Wash.-based early-stage VC firm will also open an office in Palo Alto, Calif. to better attack these opportunities, said Frank Artale, general partner who will run this new fund, informally dubbed Ignition V. The company brought on Nick Sturiale, a new partner, to run that office.</p>
<p>“We think that businesses and people who work in businesses have been largely underserved for the past 15 years,”  Artale said in a recent interview.</p>
<p>The goal of the dual offices is to promote cross-pollination and collaboration. “We want to do real social networking here, not just Facebook stuff,” Artale added. “Palo Alto and the Bay Area are super important as great entrepreneurial engines. Cisco, Oracle and other companies down there spit out great entrepreneurs.”</p>
<h2 id="goal-apps-that-combine-consume">Goal: Apps that combine consumer ease of use with enterprise utility</h2>
<p>Ignition has some credibility in the enterprise. Several team members, including Artale, John Connors and Cameron Myhrvold, are former Microsoft executives. And previous investments include Cloudera, Splunk, Zenprise, DocuSign, Opscode, Parse and Bromium.</p>
<p>New enterprise applications have to work well and look good on laptops and PCs, but also on tablets and phones as the consumerization of IT trend continues, he said.</p>
<p>The new fund, which Artale described as “slightly oversubscribed,” took three months to raise. Investors include new and existing university endowments, pension funds and investment companies. Ignition V is smaller than the previous fund, which weighed in at $400 million, but it will also be more focused, eschewing investments in telecom and consumer internet companies, Artale said.</p>
<p>The notion that enterprise IT is back as a hot category is cropping up all over. New vendors — large and small — are building consumer-grade products but for business use. Pivotal Initiative chief <a href="http://gigaom.com/2013/03/19/the-world-is-ready-for-the-consumer-grade-enterprise/">Paul Maritz spoke in depth about this</a> at the recent Structure: Data conference in New York and the topic will doubtless crop up again at <a href="http://event.gigaom.com/structure/?utm_source=cloud&amp;utm_medium=editorial&amp;utm_campaign=intext&amp;utm_term=626900+ignition-raises-150m-fund-opens-silicon-valley-office-to-back-enterprise-it&amp;utm_content=gigabarb">Structure</a> in San Francisco in June.</p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/04/03/ignition-raises-150m-fund-opens-silicon-valley-office-to-back-enterprise-it/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/04/frank-artale-april-2013-gigaom1.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/04/frank-artale-april-2013-gigaom1.jpg?w=150" medium="image">
			<media:title type="html">Frank Artale</media:title>
		</media:content>

		<media:content url="http://1.gravatar.com/avatar/4af03439988d64f816da72496325cb73?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">gigabarb</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/01/shutterstock_71910823.jpg?w=300" medium="image">
			<media:title type="html">enterprise IT</media:title>
		</media:content>
	</item>
		<item>
		<title>Cloudera scores analytics-as-a-service deal with Germany&#8217;s T-Systems</title>
		<link>http://gigaom.com/2013/03/21/cloudera-scores-analytics-as-a-service-deal-with-germanys-t-systems/</link>
		<comments>http://gigaom.com/2013/03/21/cloudera-scores-analytics-as-a-service-deal-with-germanys-t-systems/#comments</comments>
		<pubDate>Thu, 21 Mar 2013 10:12:13 +0000</pubDate>
		<dc:creator>David Meyer</dc:creator>
				<category><![CDATA[big data]]></category>
		<category><![CDATA[Cloud]]></category>
		<category><![CDATA[Cloudera]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[t-systems]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=622707</guid>
		<description><![CDATA[The strategic partnership will see Cloudera's enterprise Hadoop distribution, along with its Impala real-time query engine, running on top of T-Systems' extensive cloud infrastructure in Europe and beyond.]]></description>
				<content:encoded><![CDATA[<p>Cloudera&#8217;s Hadoop implementation just got a big boost through a strategic partnership with Deutsche Telekom&#8217;s T-Systems, one of Europe&#8217;s biggest IT services companies. This would appear to be one of the first results of Cloudera&#8217;s <a href="http://gigaom.com/2012/12/06/cloudera-snares-big-65m-more-to-boost-international-enterprise-growth/">freshly-funded international enterprise push</a>.</p>
<p>The two companies are touting analytics-as-a-service using <a href="http://www.cloudera.com/content/cloudera/en/products/cloudera-enterprise-core/cloudera-enterprise-RTQ.html">Cloudera Enterprise RTQ</a>, featuring the <a href="http://gigaom.com/2012/10/24/cloudera-makes-sql-a-first-class-citizen-in-hadoop/">Impala SQL query engine</a>, on top of T-Systems&#8217; existing cloud computing infrastructure. The package is available immediately for T-Systems&#8217; European customers, while those outside Europe will get access in due course.</p>
<p>&#8220;Our customers don&#8217;t want to have to worry about the hardware and software for big data,&#8221; claimed T-Systems BI and big data chief Christian Wirth in a statement. &#8220;They don&#8217;t want technology, just a reliable service. We can offer precisely this &#8212; which is what makes our new offer with Cloudera so special.&#8221;</p>
<p>T-Systems counts big names such as Volkswagen and Royal Dutch Shell among its customers, so this is a significant deal for Cloudera. Cloudera&#8217;s is the go-to Hadoop distribution right now, but its position may not be unassailable: EMC&#8217;s Greenplum division recently <a href="http://gigaom.com/2013/02/25/emc-to-hadoop-competition-see-ya-wouldnt-wanna-be-ya/">revamped its distribution</a> by fusing it with its own analytics database, and even <a href="http://gigaom.com/2013/02/26/cloudera-who-intel-announces-its-own-hadoop-distribution/">Intel now has a distribution out there</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/03/21/cloudera-scores-analytics-as-a-service-deal-with-germanys-t-systems/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/01/1z5o1503.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/01/1z5o1503.jpg?w=150" medium="image">
			<media:title type="html">Structure Data 2012: Michael Olson – CEO, Cloudera</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/6599daccfd7e897e68744fe0065e5a2e?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">superglaze</media:title>
		</media:content>
	</item>
		<item>
		<title>Big data is still hard, but it gets better</title>
		<link>http://gigaom.com/2013/03/20/big-data-is-still-hard-but-it-gets-better/</link>
		<comments>http://gigaom.com/2013/03/20/big-data-is-still-hard-but-it-gets-better/#comments</comments>
		<pubDate>Wed, 20 Mar 2013 19:35:03 +0000</pubDate>
		<dc:creator>Stacey Higginbotham</dc:creator>
				<category><![CDATA[Cloudera]]></category>
		<category><![CDATA[data science]]></category>
		<category><![CDATA[dj patil]]></category>
		<category><![CDATA[Greylock]]></category>
		<category><![CDATA[Jeff Hammerbacher]]></category>
		<category><![CDATA[Quid]]></category>
		<category><![CDATA[Structure Data 2013]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=622484</guid>
		<description><![CDATA[When it comes to using big data, there are still bottlenecks. Many of these are around the tools that people use to try to make sense of massive amounts of information.]]></description>
				<content:encoded><![CDATA[<p>What’s standing between your staff and big data analysis? That was the existential question posed to DJ Patil and Jeff Hammerbacher at the <a href="http://event.gigaom.com/structuredata/?utm_source=data&amp;utm_medium=editorial&amp;utm_campaign=intext&amp;utm_term=622484+big-data-is-still-hard-but-it-gets-better&amp;utm_content=shigginbotham">GigaOM Structure:Data</a> event today in New York. The two had different takes on how easy it is to give people the power to use data, with Hammerbacher, who is the co-founder of Cloudera, saying that it’s pretty simple today.</p>
<p>He did say that many aspects of the input and ingress of data will end up being automated, much as systems administrators responsible for running the data center have seen many of their tasks automated.</p>
<p>Patil, who is now a data scientist in residence at Greylock Partners, was a bit more focused on end users. He described a visit he made earlier today to a nonprofit called DoSomething.org, and said that people there had plenty of curiosity and a desire to play with data and ask questions, but they didn’t always know what to ask to get the insights they seemed to want. “We need another layer to help those people figure out what they want to ask,” he said.</p>
<p>From Patil’s perspective, we need tools that will help us tell stories with data and let people play with it in ways that can help them come to new conclusions or see new relationships. “This is less of a machine learning problem than a ‘Can I try a bunch of things with the data?’ kind of problem,” said Patil.</p>
<p>And for those who are still intimidated by playing around with big data, Patil has this to say: “Most people doing sophisticated analysis they don’t really know what they are doing.”</p>
<p>Check out the rest of our Structure:Data 2013 coverage. A video of the session follows below:</p>
<p><iframe src="http://new.livestream.com/accounts/74987/events/1927733/videos/14313139/player?autoPlay=false&amp;height=360&amp;mute=false&amp;width=640" width="640" height="360" frameborder="0" scrolling="no"></iframe><br>
A transcription of the video follows on the next page.</p>
<p><a href="http://gigaom.com/2013/03/20/big-data-is-still-hard-but-it-gets-better/2/">Go to page 2 (of 2) on GigaOM.</a></p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/03/20/big-data-is-still-hard-but-it-gets-better/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/03/6qyvdm-kms8y7dikvq2vjo6702u-lksoid-tmvkdmau.jpeg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/03/6qyvdm-kms8y7dikvq2vjo6702u-lksoid-tmvkdmau.jpeg?w=150" medium="image">
			<media:title type="html">DJ Patil Greylock Ventures Jeff Hammerbacher Cloudera  Structure Data 2013</media:title>
		</media:content>

		<media:content url="http://1.gravatar.com/avatar/aee37121e18bf76bb9fee4494bab237a?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">shigginbotham</media:title>
		</media:content>
	</item>
		<item>
		<title>Sector RoadMap: SQL-on-Hadoop platforms in 2013</title>
		<link>http://pro.gigaom.com/report/sql-on-hadoop-roadmap-2013/</link>
		<comments>http://pro.gigaom.com/report/sql-on-hadoop-roadmap-2013/#comments</comments>
		<pubDate>Wed, 20 Mar 2013 12:00:16 +0000</pubDate>
		<dc:creator>Joseph Turian</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[apache]]></category>
		<category><![CDATA[apache-hive]]></category>
		<category><![CDATA[aster]]></category>
		<category><![CDATA[Aster Big Analytics Appliance]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[BigInsights]]></category>
		<category><![CDATA[Citus Data]]></category>
		<category><![CDATA[CitusDB]]></category>
		<category><![CDATA[Cloud Computing]]></category>
		<category><![CDATA[Cloudera]]></category>
		<category><![CDATA[Clustrix]]></category>
		<category><![CDATA[Concurrent]]></category>
		<category><![CDATA[Database theory]]></category>
		<category><![CDATA[Dremel]]></category>
		<category><![CDATA[Drill]]></category>
		<category><![CDATA[EMC]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[Greenplum]]></category>
		<category><![CDATA[Hadapt]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Hadoop Distributed File System]]></category>
		<category><![CDATA[HAWQ]]></category>
		<category><![CDATA[Hbase]]></category>
		<category><![CDATA[HCatalog]]></category>
		<category><![CDATA[HDFS]]></category>
		<category><![CDATA[hive]]></category>
		<category><![CDATA[Hortonworks]]></category>
		<category><![CDATA[IBM]]></category>
		<category><![CDATA[Impala]]></category>
		<category><![CDATA[JethroData]]></category>
		<category><![CDATA[karmasphere]]></category>
		<category><![CDATA[Lingual]]></category>
		<category><![CDATA[Mapr]]></category>
		<category><![CDATA[mapreduce]]></category>
		<category><![CDATA[MemSQL]]></category>
		<category><![CDATA[microstrategy]]></category>
		<category><![CDATA[MongoDB]]></category>
		<category><![CDATA[MPP]]></category>
		<category><![CDATA[NewSQL]]></category>
		<category><![CDATA[Optiq]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[parallel computing]]></category>
		<category><![CDATA[pig]]></category>
		<category><![CDATA[Platfora]]></category>
		<category><![CDATA[PostGIS]]></category>
		<category><![CDATA[PostgreSQL]]></category>
		<category><![CDATA[RainStor]]></category>
		<category><![CDATA[Salesforce.com]]></category>
		<category><![CDATA[SAP]]></category>
		<category><![CDATA[SAP HANA]]></category>
		<category><![CDATA[Splice Machine]]></category>
		<category><![CDATA[SQL]]></category>
		<category><![CDATA[SQL 92]]></category>
		<category><![CDATA[SQL-H]]></category>
		<category><![CDATA[SQLStream]]></category>
		<category><![CDATA[Stinger]]></category>
		<category><![CDATA[tableau]]></category>
		<category><![CDATA[Teradata]]></category>
		<category><![CDATA[Twitter]]></category>
		<category><![CDATA[VoltDB]]></category>
		<category><![CDATA[zookeeper]]></category>

		<guid isPermaLink="false">http://pro.gigaom.com/?post_type=go-report&#038;p=171512/</guid>
		<description><![CDATA[Today’s most successful companies are the ones with the ability to capture and analyze all the data available to them. Enter SQL-on-Hadoop solutions, which increase the accessibility of Hadoop and allow organizations to reuse their investment in learning SQL.]]></description>
				<content:encoded><![CDATA[<p>Today’s most successful companies are the ones with the ability to capture and analyze all the data available to them. Enter SQL-on-Hadoop solutions, which increase the accessibility of Hadoop and allow organizations to reuse their investment in learning SQL.</p>
]]></content:encoded>
			<wfw:commentRss>http://pro.gigaom.com/report/sql-on-hadoop-roadmap-2013/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:thumbnail url="https://gigaom-pro-files.s3.amazonaws.com/files/2012/04/elephant.jpg?w=150" />
		<media:content url="https://gigaom-pro-files.s3.amazonaws.com/files/2012/04/elephant.jpg?w=150" medium="image">
			<media:title type="html">elephant</media:title>
		</media:content>

		<media:content url="http://1.gravatar.com/avatar/4f3860069d181dbeeb398304f5940a9e?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">gigaedit</media:title>
		</media:content>
	</item>
		<item>
		<title>5 reasons why the future of Hadoop is real-time (relatively speaking)</title>
		<link>http://gigaom.com/2013/03/07/5-reasons-why-the-future-of-hadoop-is-real-time-relatively-speaking/</link>
		<comments>http://gigaom.com/2013/03/07/5-reasons-why-the-future-of-hadoop-is-real-time-relatively-speaking/#comments</comments>
		<pubDate>Thu, 07 Mar 2013 13:00:37 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[big data]]></category>
		<category><![CDATA[Cloudera]]></category>
		<category><![CDATA[data warehouse]]></category>
		<category><![CDATA[database]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Hbase]]></category>
		<category><![CDATA[real-time processing]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=616972</guid>
		<description><![CDATA[In Part III of our look at all things Hadoop, we examine the trends driving Hadoop's future. At the end of the day, everything is pushing Hadoop toward being just generally faster and easier to consume.]]></description>
				<content:encoded><![CDATA[<p>In some ways, Hadoop is like a fine wine: It gets better with age as rough edges (or flavor profiles) are smoothed out, and those who wait to consume it will probably have a better experience. The only problem with this is that Hadoop exists in a world that’s more about <a href="http://www.urbandictionary.com/define.php?term=md+20%2F20">MD 20/20</a> than it is about <a href="http://www.winespectator.com/display/show?id=47374">Relentless Napa Valley 2008</a>: Companies often want to drink their big data fast, get drunk on insights, and then have some more, maybe something even stronger. And with data, unlike technology and tannins, it turns out older isn’t always better.</p>
<p>That’s a crude analogy, of course, but it gets at the essence of what’s currently plaguing Hadoop adoption and what will propel it forward in the next couple of years. The work being done by companies like Cloudera and Hortonworks at the distribution level is great and important, as is MapReduce as a processing framework for certain types of batch workloads. But not every company can afford to be concerned about managing Hadoop on a day-to-day basis. And <a href="http://gigaom.com/2012/07/07/why-the-days-are-numbered-for-hadoop-as-we-know-it/">not every analytic job pairs well with MapReduce</a>.</p>
<p>In Part I of our four-part series on Hadoop, we <a href="http://gigaom.com/2013/03/04/the-history-of-hadoop-from-4-nodes-to-the-future-of-data/">looked at how the technology was born</a> and grew into the juggernaut it is today. In Part II, <a href="http://gigaom.com/2013/03/05/the-hadoop-ecosystem-the-welcome-elephant-in-the-room-infographic/">we laid out the map of the current products and projects</a> that comprise the Hadoop ecosystem. In this installment, we’ll take a closer look at some of them and how they’re positioning themselves to be important players down the road. Finally, <a href="http://gigaom.com/2013/03/08/hadoop-through-the-years-a-gigaom-retrospective/">Part IV will highlight some of the best Hadoop applications and seminal moments in Hadoop history</a>, as reported by GigaOM over the years.</p>
<p>If there’s one big Hadoop theme at our <a href="http://event.gigaom.com/structuredata/?utm_source=data&amp;utm_medium=editorial&amp;utm_campaign=intext&amp;utm_term=616972+5-reasons-why-the-future-of-hadoop-is-real-time-relatively-speaking&amp;utm_content=dharrisstructure">Structure: Data conference</a> March 20-21 in New York, it’s the new realization that people shouldn’t be asking “What’s next after Hadoop?” but rather “What will Hadoop become next?” Based on what’s transpiring today, the answer to that question is that Hadoop will become faster in all regards and more useful as a result.</p>
<h2 id="interactivity-big-data-style">Interactivity, big-data-style</h2>
<div id="attachment_612788" class="wp-caption alignright" style="width: 310px"><img alt="Source: Shutterstock user hauhu." src="http://gigaom2.files.wordpress.com/2013/02/shutterstock_37622056.jpg?w=300&#038;h=225" width="300" height="225" class="size-medium wp-image-612788"><p class="wp-caption-text">Source: Shutterstock user hauhu.</p></div>
<p>As I explained in some detail a couple weeks ago, <a href="http://gigaom.com/2013/02/21/sql-is-whats-next-for-hadoop-heres-whos-doing-it/">SQL is what’s next for Hadoop</a>, and that’s not because of familiarity alone or the types of queries permitted by SQL on relational data. It’s also because the types of massively parallel processing engines developed to analyze relational data over the years are very fast. That means analysts can ask questions and get answers at speeds much closer to the speed of their intuitions than is possible when querying entire data sets using standard MapReduce.</p>
<p>But just as SQL and its processing techniques bring something to Hadoop, Hadoop (the Hadoop Distributed File System, specifically) brings something to the table, too. Namely, it brings scale and flexibility that don’t exist in the traditional data warehouse world, where new hardware and licenses can be expensive, so only the “valuable” data makes its way inside and only after it has been fitted to a pre-defined structure. Hadoop, on the other hand, provides virtually unlimited scale and schema-free storage, so companies can store however much information they want in whatever format they want and worry later about what they’ll actually use it for. (Actually, though, most Hadoop jobs do require some sort of structure in order to run, and Hadoop co-creator Mike Cafarella is <a href="http://cloudera.github.com/RecordBreaker/">working on a project called RecordBreaker</a> that aims to automate this process for certain data types.)</p>
<p>How hot is the SQL-on-Hadoop space? I profiled the companies and projects working on it on Feb. 21, and since then EMC Greenplum <a href="http://gigaom.com/2013/02/25/emc-to-hadoop-competition-see-ya-wouldnt-wanna-be-ya/">announced a completely rewritten Hadoop distribution</a> that fuses its analytic database to Hadoop, and an entirely new player called <a href="http://jethrodata.com/">JethroData</a> emerged along with $4.5 million in funding. Even if there’s a major shakeout, there will be a few lucky companies left standing to capitalize on a shift to Hadoop as <em>the</em> center of data gravity that EMC Greenplum’s Scott Yara (albeit a biased source) thinks will be the data equivalent of the mainframe’s demise.</p>
<h2 id="this-is-your-database-this-is-">This is your database. This is your database on HDFS</h2>
<p>The SQL versus NoSQL debate appears to be dying down as companies and developers begin to realize there’s definitely a place for both in most environments, but a new debate — with Hadoop at the center — might be about to start up. At its core is <a href="http://datagravity.org/">the concept of data gravity</a> and the large, attractive (in a gravitational sense) entity that is HDFS. Here’s the underlying question that might be posed: If I’m already storing my unstructured data in HDFS and am expected to replace my data warehouse with it, too, why would I also run a handful of other databases that require a separate data store?</p>
<p>This is in part why <a href="http://hbase.apache.org/">HBase</a> has attracted such a strong following despite its relative technical and commercial immaturity compared with the comparable NoSQL database <a href="http://cassandra.apache.org/">Cassandra</a>. For applications that would benefit from a relational database, startups such as <a href="http://gigaom.com/2012/07/24/how-one-startup-wants-to-inject-hadoop-into-your-sql/">Drawn to Scale</a> and <a href="http://gigaom.com/2012/10/17/batten-down-the-analysts-its-a-big-data-bi-storm/">Splice Machine</a> have turned HBase into a transactional SQL system. WibiData, the <a href="http://gigaom.com/2012/02/07/hadoop-startup-wibidata-raises-5m-to-power-web-analytics/">new startup from Cloudera co-founder Christophe Bisciglia and Aaron Kimball</a>, is <a href="http://gigaom.com/2012/11/14/wibidata-open-sources-kiji-to-make-hbase-more-useful/">pushing an open source framework called Kiji</a> to make it easier to develop applications that use HBase.</p>
<p>“If you talk to anyone from Cloudera or any of the platform vendors, I think they will tell you that a large percentage of their customers use HBase,” Bisciglia said. “It’s something that I only expect to see increasing.”</p>
<p>MapR seems to think so, too: the Hadoop-distribution vendor is getting ahead of the game by <a href="http://www.mapr.com/products/mapr-editions/m7-edition">selling an enterprise-grade version of HBase called M7</a>. Should hot startups such as <a href="http://gigaom.com/2012/04/13/meet-tempodb-a-database-startup-with-an-eye-for-time/">TempoDB</a> and <a href="http://gigaom.com/2013/01/16/has-ayasdi-turned-machine-learning-into-a-magic-bullet/">Ayasdi</a> decide to take their HBase-reliant cloud services into the data center, they’ll tap into Hadoop clusters, too.</p>
<p>And the National Security Agency built <a href="http://accumulo.apache.org/">Apache Accumulo</a>, a key-value database similar to HBase but designed for fine-grained security and massive scale. It’s now <a href="http://sqrrl.com/">being sold commercially by a startup called Sqrrl</a>. There’s even a graph-processing project called <a href="http://incubator.apache.org/giraph/">Giraph</a> that relies on HBase or Accumulo as the database layer.</p>
<h2 id="whatever-real-time-means-to-yo">Whatever “real-time” means to you</h2>
<p>Real-time is one of those terms that means different things to different people and different applications. The interactivity that SQL-on-Hadoop technologies promise is one definition, as is the type of stream processing <a href="http://gigaom.com/2011/08/04/twitter-to-open-source-hadoop-like-tool/">enabled by technologies like Storm</a>. When it comes to the latter, there’s a lot of excitement around YARN as the innovation that will make it happen.</p>
<p><a href="http://hortonworks.com/blog/introducing-apache-hadoop-yarn/">YARN, aka MapReduce 2.0</a>, is a resource scheduler and distributed application framework that allows Hadoop users to run processing paradigms other than MapReduce. This could mean many things, from traditional parallel-processing methods such as MPI to graph processing to newly developed stream-processing engines such as Storm and <a href="http://incubator.apache.org/s4/">S4</a>. Considering how many years <em>Hadoop</em> meant <em>HDFS and MapReduce</em>, this type of flexibility is certainly a big deal.</p>
<p><img alt="figure1" src="http://gigaom2.files.wordpress.com/2013/03/figure1.gif?w=300&#038;h=216" width="300" height="216" class="size-medium wp-image-617741 alignleft">Stream processing, of course, is the antithesis of batch processing, for which Hadoop is known, and which is inherently too slow for workloads such as serving real-time ads or monitoring sensor data. And even if Storm and other stream-processing platforms somehow don’t make their way onto Hadoop clusters, <a href="http://gigaom.com/2013/02/14/hstreaming-ready-to-show-the-world-its-real-time-hadoop/">a startup called HStreaming has made it its mission</a> to deliver stream processing to Hadoop, and <a href="http://www.continuuity.com/technology">it’s on other companies’ radars, as well</a>.</p>
<p>For what it’s worth, though, <a href="http://verticloud.com/">VertiCloud</a> Founder and CEO and former Yahoo CTO Raymie Stata thinks we should do away with terms such as batch, real-time and interactive altogether. Instead, he prefers the terms synchronous and asynchronous to describe the human experience with the data rather than the speed of processing it. Synchronous computing happens at the speed of human activity, generally speaking, while asynchronous computing is largely decoupled from the idea of someone sitting in front of a computer screen awaiting a result.</p>
<p>The change in terms is associated with a change in how you manage SLAs for applications. Uploading photos to Flickr: synchronous. Running a MapReduce job: most likely asynchronous. Ironically, according to Stata, stream processing data with Storm is often asynchronous, too. That’s because there’s probably not someone on the other end waiting for a page to update or a query to return. And unless you’re doing something where guaranteed real-time latency is <em>necessary</em>, the occasional difference between milliseconds and 1 second probably isn’t critical.</p>
<iframe width="100%" height="166" scrolling="no" frameborder="no" src="http://w.soundcloud.com/player?url=http%3A%2F%2Fapi.soundcloud.com%2Ftracks%2F80972108%253Fsecret_token%253Ds-1QBTa"></iframe>
<h2 id="time-to-insight-starts-at-the-">Time to insight starts at the planning phase</h2>
<p>Even when MapReduce is the answer, though, not everyone is game for a long Hadoop deployment process coupled with a consulting deal to identify uses and build applications or workflows. Sometimes, you just want to buy some software and get going.</p>
<p>Already, companies such as Wibidata and Continuuity are trying to make it easier for companies to build Hadoop applications specific to their own needs, and Wibidata’s Bisciglia said his company is doing less and less customization the more it deals with customers in the same vertical markets. “I think it’s still a couple years out before you can buy a generic application that runs on Hadoop,” he told me, but he does see opportunity for billion-dollar businesses at this level, possibly selling the Hadoop equivalent of an ERP or CRM application.</p>
<div id="attachment_603561" class="wp-caption alignright" style="width: 310px"><img alt="Structure Data 2012: Michael Olson – CEO, Cloudera" src="http://gigaom2.files.wordpress.com/2013/01/1z5o1503.jpg?w=300&#038;h=200" width="300" height="200" class="size-medium wp-image-603561"><p class="wp-caption-text">Cloudera CEO Mike Olson at Structure: Data 2012<br>(c) 2012 Pinar Ozger pinar@pinarozger.com</p></div>
<p>And Cloudera CEO Mike Olson <a href="http://gigaom.com/2012/03/21/cloudera-structure-data-2012/">told the audience at our Structure: Data conference last year</a> that he’ll connect startups trying to build Hadoop-based applications with funding opportunities. In fact, Cloudera backer Accel Partners <a href="http://gigaom.com/2011/11/08/accel-forms-100m-fund-to-feed-big-data-apps/">launched a Big Data Fund in 2011</a> with the sole purpose of funding application-level big data startups.</p>
<p>But maybe Cloudera, like database vendor Oracle before it, will just get into the application space itself. According to Hadoop creator and Cloudera chief architect Doug Cutting:</p>
<blockquote id="quote-i-wouldnt-be-surpris"><p>“I wouldn’t be surprised if you see vendors, like Cloudera, starting to creep up the stack and sell some applications. You’ve seen that before from Red Hat, from Oracle. You could argue that the relational database is a platform for Oracle and they’ve sold a lot of applications on top. So I think that happens as the market matures. When it’s young, we don’t want to stomp on potential collaborators at this point, we want to open that up to other people to really enhance the platform.”</p></blockquote>
<p>Cloud computing is proving to be a big help in getting Hadoop projects off the ground, too. Even low-level services such as Amazon Elastic MapReduce can <a href="http://gigaom.com/2012/02/22/how-infochimps-wants-to-become-heroku-for-hadoop/">ease the burden of managing a physical Hadoop cluster</a>, and there are already a handful of cloud services <a href="http://gigaom.com/2012/04/05/kontagent-turns-data-mining-into-saas-for-mobile-apps/">exposing Hadoop as a SaaS application</a> for business intelligence and analytics. The easier it gets to store, process and analyze data in the cloud, the more appealing Hadoop looks to potential users who can’t be bothered to invest in yet another IT project.</p>
<h2 id="google-and-microsoft-a-guiding">Google (and Microsoft): A guiding light</h2>
<p>Lest we forget, Hadoop is based on a set of Google technologies, and it seems likely its future will also be influenced by what Google is doing. Already, <a href="http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/Federation.html">improvements to HDFS</a> seem to mirror <a href="http://www.theregister.co.uk/2009/08/12/google_file_system_part_deux/">changes to the Google File System a few years back</a>, and YARN will enable some new types of non-MapReduce processing similar to what <a href="http://research.google.com/pubs/pub36726.html">Google’s new Percolator framework</a> does. (Google claims Percolator lets it “process the same number of documents per day, while reducing the average age of documents in Google search results by 50%.”) The MapR-led Apache Drill project <a href="http://gigaom.com/2012/08/17/for-fast-interactive-hadoop-queries-drill-may-be-the-answer/">is a Hadoop-based version of Google’s Dremel tool</a>; Giraph was likely inspired by Google’s <a href="http://googleresearch.blogspot.com/2009/06/large-scale-graph-computing-at-google.html">Pregel graph-processing technology</a>.</p>
<p>Cutting is particularly excited about Google Spanner, a database system that <a href="http://gigaom.com/2012/09/17/googles-spanner-a-database-that-knows-what-time-it-is/">spans data geographies while still maintaining transactional consistency</a>. “It’s a matter of time before somebody implements that in the Hadoop ecosystem,” he said. “That’s a huge change.”</p>
<p>It’s possible Microsoft could be an inspiration to the Hadoop community, too, especially if it begins to surface pieces of its Bing search infrastructure as products, as a couple of company executives have told me it will. Bing <a href="http://research.microsoft.com/en-us/events/fs2011/helland_cosmos_big_data_and_big_challenges.pdf">runs on a combination of tools called Cosmos, Tiger and Scope</a>, and it’s part of the Online Services division run by former Yahoo VP and Hadoop backer Qi Lu. Lu said that Microsoft (like Google) is looking beyond just search (Hadoop’s original function) and into building an information fabric that changes how data is indexed, searched for and presented.</p>
<p>However it evolves, though, it’s becoming pretty obvious that Hadoop is no longer just a technology for doing cheap storage and some MapReduce processing. “I think there’s still some doubt in people’s minds about whether Hadoop is a flash in the pan … and I think they’re missing the point,” Cutting said. “I think that’s going to be proven to people in the next year.”</p>
<p><strong>Related research and analysis from GigaOM Pro:</strong><br />Subscriber content. <a href="http://pro.gigaom.com/?utm_source=data&utm_medium=editorial&utm_campaign=auto3&utm_term=616972+5-reasons-why-the-future-of-hadoop-is-real-time-relatively-speaking&utm_content=dharrisstructure">Sign up for a free trial</a>.</p><ul><li><a href="http://pro.gigaom.com/2012/03/a-near-term-outlook-for-big-data/?utm_source=data&utm_medium=editorial&utm_campaign=auto3&utm_term=616972+5-reasons-why-the-future-of-hadoop-is-real-time-relatively-speaking&utm_content=dharrisstructure">A near-term outlook for big data</a></li><li><a href="http://pro.gigaom.com/report/the-new-economics-of-enterprise-data-warehousing/?utm_source=data&utm_medium=editorial&utm_campaign=auto3&utm_term=616972+5-reasons-why-the-future-of-hadoop-is-real-time-relatively-speaking&utm_content=dharrisstructure">How data warehousing is now a cost-effective solution for businesses</a></li><li><a href="http://pro.gigaom.com/report/sql-on-hadoop-roadmap-2013/?utm_source=data&utm_medium=editorial&utm_campaign=auto3&utm_term=616972+5-reasons-why-the-future-of-hadoop-is-real-time-relatively-speaking&utm_content=dharrisstructure">Sector RoadMap: SQL-on-Hadoop platforms in 2013</a></li></ul>]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/03/07/5-reasons-why-the-future-of-hadoop-is-real-time-relatively-speaking/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/03/gigaom-hadoop-icon-final.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/03/gigaom-hadoop-icon-final.jpg?w=150" medium="image">
			<media:title type="html">gigaom hadoop icon final</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/02/shutterstock_37622056.jpg?w=300" medium="image">
			<media:title type="html">Source: Shutterstock user hauhu.</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/03/figure1.gif?w=300" medium="image">
			<media:title type="html">figure1</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/01/1z5o1503.jpg?w=300" medium="image">
			<media:title type="html">Structure Data 2012: Michael Olson – CEO, Cloudera</media:title>
		</media:content>
	</item>
		<item>
		<title>The history of Hadoop: From 4 nodes to the future of data</title>
		<link>http://gigaom.com/2013/03/04/the-history-of-hadoop-from-4-nodes-to-the-future-of-data/</link>
		<comments>http://gigaom.com/2013/03/04/the-history-of-hadoop-from-4-nodes-to-the-future-of-data/#comments</comments>
		<pubDate>Mon, 04 Mar 2013 13:00:43 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[big data]]></category>
		<category><![CDATA[Cloudera]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[mapreduce]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[VertiCloud]]></category>
		<category><![CDATA[WibiData]]></category>
		<category><![CDATA[Yahoo]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=613362</guid>
		<description><![CDATA[In the first of our four-part multi-media series on Hadoop, the people who helped build Hadoop talk about its birth, its promise and the challenges in moving it from webscale to just large-scale.]]></description>
				<content:encoded><![CDATA[<p>Depending on how one defines its birth, <a href="http://hadoop.apache.org/">Hadoop</a> is now 10 years old. In that decade, Hadoop has gone from being the hopeful answer to Yahoo’s search-engine woes to a general-purpose computing platform that’s poised to be the foundation for the next generation of data-based applications.</p>
<p>On its own, Hadoop is a software market that IDC <a href="http://gigaom.com/2012/05/07/all-aboard-the-hadoop-money-train/">predicts will be worth $813 million</a> in 2016 (although that number is likely very low), but it’s also driving a big data market the research firm <a href="http://gigaom.com/2013/01/08/idc-says-big-data-will-be-24b-market-in-2016-i-say-its-bigger/">predicts will hit more than $23 billion</a> by 2016. Since Cloudera launched in 2008, Hadoop has spawned dozens of startups and <a href="http://gigaom.com/2012/11/09/a-few-stats-rumors-and-stories-on-on-hadoops-rapid-growth/">spurred hundreds of millions in venture capital investment</a>.</p>
<p>In this four-part series, we’ll explain everything anyone concerned with information technology needs to know about Hadoop. Part I is the history of Hadoop from the people who willed it into existence and took it mainstream. Part II is more graphic: <a href="http://gigaom.com/2013/03/05/the-hadoop-ecosystem-the-welcome-elephant-in-the-room-infographic/">a map of the now-large and complex ecosystem</a> of companies selling Hadoop products. <a href="http://gigaom.com/2013/03/07/5-reasons-why-the-future-of-hadoop-is-real-time-relatively-speaking/">Part III is a look into the future of Hadoop</a> that should serve as an opening salvo for much of the discussion <a href="http://event.gigaom.com/structuredata/?utm_source=data&amp;utm_medium=editorial&amp;utm_campaign=intext&amp;utm_term=613362+the-history-of-hadoop-from-4-nodes-to-the-future-of-data&amp;utm_content=dharrisstructure">at our Structure: Data conference</a> March 20-21 in New York. Finally, <a href="http://gigaom.com/2013/03/08/hadoop-through-the-years-a-gigaom-retrospective/">part IV will highlight some of the best Hadoop applications and seminal moments in Hadoop history</a>, as reported by GigaOM over the years.</p>
<iframe width="100%" height="166" scrolling="no" frameborder="no" src="http://w.soundcloud.com/player?url=http%3A%2F%2Fapi.soundcloud.com%2Ftracks%2F80972101%253Fsecret_token%253Ds-RbbVK"></iframe>
<h2 id="wanted-a-better-search-engine">Wanted: A better search engine</h2>
<p>Almost everywhere you go online now, Hadoop is there in some capacity. <a href="http://gigaom.com/2012/06/13/how-facebook-keeps-100-petabytes-of-hadoop-data-online/">Facebook</a>, <a href="http://gigaom.com/2012/01/31/under-the-covers-of-ebays-big-data-operation/">eBay</a>, <a href="http://gigaom.com/2011/11/02/how-etsy-handcrafted-a-big-data-strategy/">Etsy</a>, <a href="http://gigaom.com/2012/12/02/pinterest-flipboard-and-yelp-tell-how-to-save-big-bucks-in-the-cloud/">Yelp</a>, <a href="http://gigaom.com/2012/03/07/how-twitter-is-doing-its-part-to-democratize-big-data/">Twitter</a>, <a href="http://gigaom.com/2012/09/17/5-ideas-to-help-everyone-make-the-most-of-big-data/">Salesforce.com</a>: you name a popular web site or service, and the chances are it’s using Hadoop to analyze the mountains of data it’s generating about user behavior and even its own operations. Even in the physical world, forward-thinking companies in fields ranging from <a href="http://gigaom.com/2012/09/16/how-disney-built-a-big-data-platform-on-a-startup-budget/">entertainment</a> to <a href="http://gigaom.com/2012/10/11/the-rent-is-too-damn-high-but-big-data-means-the-power-bill-isnt/">energy management</a> to <a href="http://gigaom.com/2012/04/17/satellite-imagery-and-hadoop-mean-70m-for-skybox/">satellite imagery</a> are using Hadoop to analyze the unique types of data they’re collecting and generating.</p>
<p>Everyone involved with information technology at least knows what it is. Hadoop even serves as the foundation for new-school <a href="http://incubator.apache.org/giraph/">graph</a> and <a href="http://hbase.apache.org/">NoSQL databases</a>, as well as <a href="http://gigaom.com/2012/07/24/how-one-startup-wants-to-inject-hadoop-into-your-sql/">bigger, badder versions of relational databases</a> that have been around for decades.</p>
<p>But it wasn’t always this way, and today’s uses are a long way off from the original vision of what Hadoop could be.</p>
<div id="attachment_616209" class="wp-caption alignleft" style="width: 210px"><img alt="Doug Cutting" src="http://gigaom2.files.wordpress.com/2013/03/cutting.jpg?w=708"   class="size-full wp-image-616209"><p class="wp-caption-text">Doug Cutting</p></div>
<p>When the seeds of Hadoop were first planted in 2002, the world just wanted a better open-source search engine. So then-Internet Archive search director Doug Cutting and University of Washington graduate student Mike Cafarella set out to build it. They called their project <a href="http://nutch.apache.org/">Nutch</a> and it was designed with that era’s web in mind.</p>
<p>Looking back on it today, early iterations of Nutch were kind of laughable. About a year into their work on it, Cutting and Cafarella thought things were going pretty well because Nutch was already able to crawl and index hundreds of millions of pages. “At the time, when we started, we were sort of thinking that a web search engine was around a billion pages,” Cutting explained to me, “so we were getting up there.”</p>
<p>There are now about 700 million web sites and, <a href="http://articles.cnn.com/2011-09-12/tech/web.index_1_internet-neurons-human-brain?_s=PM%3ATECH">according to Wired’s Kevin Kelly</a>, well over a trillion web pages.</p>
<p>But getting Nutch to work wasn’t easy. It could only run across a handful of machines, and someone had to watch it around the clock to make sure it didn’t fall down.</p>
<div id="attachment_616210" class="wp-caption alignright" style="width: 251px"><img alt="Mike Cafarella" src="http://gigaom2.files.wordpress.com/2013/03/cafarella241.jpg?w=708"   class="size-full wp-image-616210"><p class="wp-caption-text">Mike Cafarella</p></div>
<p>“I remember working on it for several months, being quite proud of what we had been doing, and then the Google File System paper came out and I realized ‘Oh, that’s a much better way of doing it. We should do it that way,’” reminisced Cafarella. “Then, by the time we had a first working version, the MapReduce paper came out and that seemed like a pretty good idea, too.”</p>
<p>Google released the <a href="http://research.google.com/archive/gfs.html">Google File System paper</a> in October 2003 and the <a href="http://research.google.com/archive/mapreduce.html">MapReduce paper</a> in December 2004. The latter would prove especially revelatory to the two engineers building Nutch.</p>
<p>“What they spent a lot of time doing was generalizing this into a framework that automated all these steps that we were doing manually,” Cutting explained.</p>
<iframe width="100%" height="166" scrolling="no" frameborder="no" src="http://w.soundcloud.com/player?url=http%3A%2F%2Fapi.soundcloud.com%2Ftracks%2F80972106%253Fsecret_token%253Ds-gmRg8"></iframe>
<p>Raymie Stata, founder and CEO of Hadoop startup <a href="http://verticloud.com/">VertiCloud</a> (and former Yahoo CTO), calls MapReduce “a fantastic kind of abstraction” over the distributed computing methods and algorithms most search companies were already using:</p>
<blockquote id="quote-everyone-had-somethi"><p>“Everyone had something that pretty much was like MapReduce because we were all solving the same problems. We were trying to handle literally billions of web pages on machines that are probably, if you go back and check, epsilon more powerful than today’s cell phones. … So there was no option but to latch hundreds to thousands of machines together to build the index. So it was out of desperation that MapReduce was invented.”</p></blockquote>
<div id="attachment_616201" class="wp-caption aligncenter" style="width: 718px"><img alt="MapReduce diagram, from the Google paper" src="http://gigaom2.files.wordpress.com/2013/03/index-auto-0008-0001.gif?w=708&#038;h=489" width="708" height="489" class="size-large wp-image-616201"><p class="wp-caption-text">Parallel processing in MapReduce, from the Google paper</p></div>
<p>Over the course of a few months, Cutting and Cafarella built up the underlying file systems and processing framework that would become Hadoop (in Java, notably, whereas Google’s MapReduce used C++) and ported Nutch on top of it. Now, instead of having one guy watch a handful of machines all day long, Cutting explained, they could just set it running on between 20 and 40 machines that he and Cafarella were able to scrape together from their employers.</p>
<iframe width="100%" height="166" scrolling="no" frameborder="no" src="http://w.soundcloud.com/player?url=http%3A%2F%2Fapi.soundcloud.com%2Ftracks%2F80972114%253Fsecret_token%253Ds-yCIvx"></iframe>
<h2 id="bringing-hadoop-to-life-but-no">Bringing Hadoop to life (but not in search)</h2>
<p>Anyone vaguely familiar with the history of Hadoop can guess what happens next: In 2006, Cutting went to work with Yahoo, which was equally impressed by the Google File System and MapReduce papers and wanted to build open source technologies based on them. They spun out the storage and processing parts of Nutch to form Hadoop (named after Cutting’s son’s stuffed elephant) as an open-source Apache Software Foundation project and the Nutch web crawler remained its own separate project.</p>
<p>“This seemed like a perfect fit because I was looking for more people to work on it, and people who had thousands of computers to run it on,” Cutting said.</p>
<p>Cafarella, now <a href="http://web.eecs.umich.edu/~michjc/bio.html">an associate professor at the University of Michigan</a>, opted to forgo a career in corporate IT and focus on his education. He’s happy as a professor — and currently working on a Hadoop-complementary project called <a href="http://cloudera.github.com/RecordBreaker/">RecordBreaker</a> — but, he joked, “My dad calls me the Pete Best of the big data world.”</p>
<p>Ironically, though, the 2006-era Hadoop was nowhere near ready to handle production search workloads at webscale — the very task it was created to do. “The thing you gotta remember,” explained Hortonworks Co-founder and CEO Eric Baldeschwieler (who was previously VP of Hadoop software development at Yahoo), “is at the time we started adopting it, the aspiration was definitely to rebuild Yahoo’s web search infrastructure, but Hadoop only really worked on 5 to 20 nodes at that point, and it wasn’t very performant, either.”</p>
<div id="attachment_616234" class="wp-caption aligncenter" style="width: 718px"><a href="http://www.flickr.com/photos/yodelanecdotal/4746014041/sizes/l/in/photostream/"><img alt="Baldeschwieler at Hadoop Summit 2010. Source: Yodel Anecdotal" src="http://gigaom2.files.wordpress.com/2013/03/4746014041_7a80b97c2e_b.jpg?w=708&#038;h=472" width="708" height="472" class="size-large wp-image-616234"></a><p class="wp-caption-text">Baldeschwieler at Hadoop Summit 2010. Source: Yodel Anecdotal</p></div>
<p>Stata recalls a “slow march” of horizontal scalability, growing Hadoop’s capabilities from the single digits of nodes into the tens of nodes and ultimately into the thousands. “It was just an ongoing slog … every factor of 2 or 1.5 even was serious engineering work,” he said. But Yahoo was determined to scale Hadoop as far as it needed to go, and it continued investing heavy resources into the project.</p>
<p>It actually took years for Yahoo to move its web index onto Hadoop, but in the meantime the company made what would be a fortuitous decision to set up what it called a “research grid” for the company’s data scientists, to use today’s parlance. It started with dozens of nodes and ultimately grew to hundreds as they added more and more data and Hadoop’s technology matured. What began life as a proof of concept fast became a whole lot more.</p>
<p>“This very quickly kind of exploded and became our core mission,” Baldeschwieler said, “because what happened is the data scientists not only got interesting research results — what we had anticipated — but they also prototyped new applications and demonstrated that those applications could substantially improve Yahoo’s search relevance or Yahoo’s advertising revenue.”</p>
<p>Shortly thereafter, Yahoo began rolling out Hadoop to power analytics for various production applications. Eventually, Stata explained, Hadoop had proven so effective that Yahoo merged its search and advertising into one unit so that Yahoo’s bread-and-butter sponsored search business could benefit from the new technology.</p>
<div id="attachment_616207" class="wp-caption aligncenter" style="width: 718px"><a href="http://www.flickr.com/photos/joeywan/2467450286/"><img alt="Cutting (center) flanked by Baldeschwieler and Om Malik at GigaOM's Hadoop Meetup in 2008." src="http://gigaom2.files.wordpress.com/2013/03/2467450286_db547ef9ef_b.jpg?w=708&#038;h=365" width="708" height="365" class="size-large wp-image-616207"></a><p class="wp-caption-text">Cutting (center) flanked by Baldeschwieler and Om Malik at GigaOM’s Hadoop Meetup in 2008.</p></div>
<p>And <a href="http://gigaom.com/2010/06/29/yahoo-secures-and-tames-hadoop-with-new-tools/">that’s exactly what happened</a>, because although data scientists didn’t need things like service-level agreements, business leaders did. So, Stata said, Yahoo implemented some scheduling changes within Hadoop. And although data scientists didn’t need security, Securities and Exchange Commission requirements mandated a certain level of security when Yahoo moved its sponsored search data onto it.</p>
<p>“That drove a certain level of maturity,” Stata said. “… We ran all the money in Yahoo through it, eventually.”</p>
<p>The transformation into Hadoop being “behind every click” (or every batch process, technically) at Yahoo was pretty much complete by 2008, Baldeschwieler said. That meant doing everything from these line-of-business applications to spam filtering to personalized display decisions on the Yahoo front page. By the time Yahoo spun out Hortonworks into a separate, Hadoop-focused software company in 2011, Yahoo’s Hadoop infrastructure consisted of 42,000 nodes and hundreds of petabytes of storage.</p>
<iframe width="100%" height="166" scrolling="no" frameborder="no" src="http://w.soundcloud.com/player?url=http%3A%2F%2Fapi.soundcloud.com%2Ftracks%2F80972099%253Fsecret_token%253Ds-g7Wo5"></iframe>
<h2 id="from-the-classroom">From the classroom …</h2>
<p>However, although Yahoo was responsible for the vast majority of development during its formative years, Hadoop didn’t exist in a bubble inside Yahoo’s headquarters. It was a full-on Apache project that attracted users and contributors from around the world. Guys like Tom White, a Welshman who actually wrote O’Reilly Media’s book <i>Hadoop: The Definitive Guide</i> despite being what Cutting describes as a guy who just liked software and played with Hadoop at night.</p>
<p>Up in Seattle in 2006, a young Google engineer named Christophe Bisciglia was using his 20 percent time to teach a computer science course at the University of Washington. Google wanted to hire new employees with experience working on webscale data, but its MapReduce code was proprietary, so it bought a rack of servers and used Hadoop as a proxy.</p>
<p><a href="http://gigaom.com/2013/03/04/the-history-of-hadoop-from-4-nodes-to-the-future-of-data/2/">Go to page 2 (of 2) on GigaOM.</a></p>
<p><strong>Related research and analysis from GigaOM Pro:</strong><br />Subscriber content. <a href="http://pro.gigaom.com/?utm_source=data&utm_medium=editorial&utm_campaign=auto3&utm_term=613362+the-history-of-hadoop-from-4-nodes-to-the-future-of-data&utm_content=dharrisstructure">Sign up for a free trial</a>.</p><ul><li><a href="http://pro.gigaom.com/2012/03/a-near-term-outlook-for-big-data/?utm_source=data&utm_medium=editorial&utm_campaign=auto3&utm_term=613362+the-history-of-hadoop-from-4-nodes-to-the-future-of-data&utm_content=dharrisstructure">A near-term outlook for big data</a></li><li><a href="http://pro.gigaom.com/2011/03/defining-hadoop-the-players-technologies-and-challenges-of-2011/?utm_source=data&utm_medium=editorial&utm_campaign=auto3&utm_term=613362+the-history-of-hadoop-from-4-nodes-to-the-future-of-data&utm_content=dharrisstructure">Defining Hadoop: the Players, Technologies and Challenges of 2011</a></li><li><a href="http://pro.gigaom.com/2012/11/real-%c2%adtime-query-for-hadoop-democratizes-access-to-big-data-analytics/?utm_source=data&utm_medium=editorial&utm_campaign=auto3&utm_term=613362+the-history-of-hadoop-from-4-nodes-to-the-future-of-data&utm_content=dharrisstructure">Real-time query for Hadoop democratizes access to big data analytics</a></li></ul>]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/03/04/the-history-of-hadoop-from-4-nodes-to-the-future-of-data/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/03/gigaom-hadoop-icon-final.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/03/gigaom-hadoop-icon-final.jpg?w=150" medium="image">
			<media:title type="html">gigaom hadoop icon final</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/03/cutting.jpg" medium="image">
			<media:title type="html">Doug Cutting</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/03/cafarella241.jpg" medium="image">
			<media:title type="html">Mike Cafarella</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/03/index-auto-0008-0001.gif?w=708" medium="image">
			<media:title type="html">MapReduce diagram, from the Google paper</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/03/4746014041_7a80b97c2e_b.jpg?w=708" medium="image">
			<media:title type="html">Baldeschwieler at Hadoop Summit 2010. Source: Yodel Anecdotal</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/03/2467450286_db547ef9ef_b.jpg?w=708" medium="image">
			<media:title type="html">Cutting (center) flanked by Baldeschwieler and Om Malik at GigaOM&#039;s Hadoop Meetup in 2008.</media:title>
		</media:content>
	</item>
		<item>
		<title>Cloudera who? Intel announces its own Hadoop distribution</title>
		<link>http://gigaom.com/2013/02/26/cloudera-who-intel-announces-its-own-hadoop-distribution/</link>
		<comments>http://gigaom.com/2013/02/26/cloudera-who-intel-announces-its-own-hadoop-distribution/#comments</comments>
		<pubDate>Tue, 26 Feb 2013 18:26:31 +0000</pubDate>
		<dc:creator>Stacey Higginbotham</dc:creator>
				<category><![CDATA[Cisco]]></category>
		<category><![CDATA[Cloudera]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Hortonworks]]></category>
		<category><![CDATA[Intel]]></category>
		<category><![CDATA[Mapr]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=614504</guid>
		<description><![CDATA[Intel's getting into the open source software business with its own version of Hadoop. It joins a host of startups as well as EMC Greenplum in building a distribution for big data.]]></description>
				<content:encoded><![CDATA[<p>Intel on Tuesday said it was getting into the software business with its own Hadoop distribution. The move is a potential blow for startups such as Cloudera, Hortonworks and MapR that are offering their own distributions of Hadoop, but it’s also an admission by the chip vendor that the opportunity in big data isn’t only to be found in selling hardware.</p>
<p>At a conference held in San Francisco, Boyd Davis, VP and general manager of Intel’s Datacenter Software Division, explained Intel’s history with Hadoop, which stretches back to 2009, and stressed that Intel is going to share some aspects of its Hadoop distribution, but not all. Intel has already released a distribution of Hadoop in China, and today it’s bringing it to the United States. Intel’s distribution uses Hadoop 2.0 and YARN, a cutting-edge version of the platform compared with what most Hadoop users have deployed thus far.</p>
<h2 id="why-intel-wants-to-push-its-ow">Why Intel wants to push its own version of Hadoop</h2>
<p><a href="http://gigaom2.files.wordpress.com/2013/02/intelhadoophistory.jpg"><img alt="intelhadoophistory" src="http://gigaom2.files.wordpress.com/2013/02/intelhadoophistory.jpg?w=708&#038;h=400" width="708" height="400" class="aligncenter size-full wp-image-614518"></a></p>
<p>Davis introduced partners such as Cisco, which has tuned the Intel Hadoop distribution for its own servers. Intel also hosted a panel that included executives from SAP, Red Hat and Savvis to discuss the challenges of big data and the promise of Hadoop.</p>
<p>Davis was up front about Intel’s rationale for releasing its own distribution, namely that it was worried about the fragmentation and possible uncertainty associated with current Hadoop distributions. That could be read as a dig against the many startups already offering Hadoop distributions, all of which are slightly different (of course, Intel’s will be slightly different, too). Like existing players such as Cloudera and MapR, Intel will open source certain aspects of its distribution, but will also keep some software to itself.</p>
<h2 id="inside-the-data-center-its-no-">Inside the data center, it’s no longer just web servers that matter</h2>
<p>For example, Davis stressed that Intel will not share its management and monitoring software, which could be highly valuable for enterprise customers. The Intel software could coordinate with Intel’s data center management software and make managing a variety of workloads easier. And hidden in that coordination might be one of Intel’s aims in pushing its own version of Hadoop: countering the threat of ARM chips used in Hadoop clusters.</p>
<p>Dell, Calxeda and others are evaluating the use of lower-performance, <a href="http://gigaom.com/2012/10/24/dell-wants-to-tune-big-data-apps-for-arm-servers/">lower-power chips in Hadoop clusters</a>, a market <a href="http://gigaom.com/2011/06/13/big-data-on-micro-servers-you-bet/">Intel would hate to cede in the data center</a> as data grows and analytics becomes more important. To that end, Intel has also optimized its Hadoop distribution for solid-state drives, something that other Hadoop companies haven’t done so far.</p>
<p>When asked about Atom and the use of lower-performance processors for Hadoop, Davis noted that while people are using lower-end processors for Hadoop, those deployments tend to have slower networking. Davis says that when you combine high-end processors with 10 gigabit Ethernet and Hadoop, customers get the performance that they want.</p>
<p><a href="http://gigaom2.files.wordpress.com/2013/02/intelhadoop.jpg"><img alt="intelhadoop" src="http://gigaom2.files.wordpress.com/2013/02/intelhadoop.jpg?w=708&#038;h=397" width="708" height="397" class="aligncenter size-full wp-image-614552"></a></p>
<p>So while Intel may tout stability and consistency as the reason for its decision to become a major player in the software market for big data, it’s also driven by the changes in the data center that threaten the grip Intel has on the hardware inside it. The cloud and big data have changed the workloads and hardware requirements for the data center, and Intel is playing the long game in trying to release software that can be tuned to its chips.</p>
<h2 id="the-hadoop-drama-isnt-over-yet">The Hadoop drama isn’t over yet</h2>
<p>Intel isn’t the only big vendor touting its own homegrown version of Hadoop. On Monday, <a href="http://gigaom.com/2013/02/25/emc-to-hadoop-competition-see-ya-wouldnt-wanna-be-ya/">EMC’s Greenplum division announced an entirely revamped version</a> of its Hadoop distribution that’s merged with its flagship analytic SQL database. These big companies have big existing businesses to protect and lots of resources to put into doing it. As my colleague Derrick Harris wrote on the EMC news:</p>
<blockquote id="quote-looking-past-his-com"><p>Looking past his competitive boasting, though, it’s easy to see [Greenplum's Scott] Yara’s greater point when you ask him what all this Hadoop talk means for the data warehouse business on which Greenplum was built. He points to the mainframe business that fell from its high perch decades ago but still drives billions a year in revenue. A single MPP database system is still faster on certain workloads than SQL on Hadoop, but that gap will close over time and “I do think the center of gravity will move toward HDFS,” he said.</p></blockquote>
<p>Hadoop is a juggernaut when it comes to big data. Intel is a juggernaut when it comes to data center infrastructure. Intel’s decision to enter the open source software market is a big one for the chip company, for the Hadoop ecosystem and for the myriad startups playing in this space. It’s a topic we’ll explore more during our <a href="http://event.gigaom.com/structuredata/?utm_source=data&amp;utm_medium=editorial&amp;utm_campaign=intext&amp;utm_term=614504+cloudera-who-intel-announces-its-own-hadoop-distribution&amp;utm_content=shigginbotham">Structure Data conference in New York on March 20 and 21</a>.</p>
<p><strong>Related research and analysis from GigaOM Pro:</strong><br />Subscriber content. <a href="http://pro.gigaom.com/?utm_source=data&utm_medium=editorial&utm_campaign=auto3&utm_term=614504+cloudera-who-intel-announces-its-own-hadoop-distribution&utm_content=shigginbotham">Sign up for a free trial</a>.</p><ul><li><a href="http://pro.gigaom.com/2012/07/cloud-and-data-second-quarter-2012-analysis-and-outlook-2/?utm_source=data&utm_medium=editorial&utm_campaign=auto3&utm_term=614504+cloudera-who-intel-announces-its-own-hadoop-distribution&utm_content=shigginbotham">Takeaways from the second quarter in cloud and data</a></li><li><a href="http://pro.gigaom.com/2011/04/infrastructure-q1-iaas-comes-down-to-earth-big-data-takes-flight/?utm_source=data&utm_medium=editorial&utm_campaign=auto3&utm_term=614504+cloudera-who-intel-announces-its-own-hadoop-distribution&utm_content=shigginbotham">Infrastructure Q1: IaaS Comes Down to Earth; Big Data Takes Flight</a></li><li><a href="http://pro.gigaom.com/2012/03/why-service-providers-matter-for-the-future-of-big-data/?utm_source=data&utm_medium=editorial&utm_campaign=auto3&utm_term=614504+cloudera-who-intel-announces-its-own-hadoop-distribution&utm_content=shigginbotham">Why service providers matter for the future of big data</a></li></ul>]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/02/26/cloudera-who-intel-announces-its-own-hadoop-distribution/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/02/hadoop1-210x140.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/02/hadoop1-210x140.jpg?w=150" medium="image">
			<media:title type="html">hadoop1-210x140</media:title>
		</media:content>

		<media:content url="http://1.gravatar.com/avatar/aee37121e18bf76bb9fee4494bab237a?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">shigginbotham</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/02/intelhadoophistory.jpg" medium="image">
			<media:title type="html">intelhadoophistory</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/02/intelhadoop.jpg" medium="image">
			<media:title type="html">intelhadoop</media:title>
		</media:content>
	</item>
	</channel>
</rss>
