<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>GigaOM &#187; Hortonworks</title>
	<atom:link href="http://gigaom.com/tag/hortonworks/feed/" rel="self" type="application/rss+xml" />
	<link>http://gigaom.com</link>
	<description></description>
	<lastBuildDate>Mon, 20 May 2013 13:19:55 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='gigaom.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://0.gravatar.com/blavatar/0db8f6557d022075dbbf010c54d46d93?s=96&#038;d=http%3A%2F%2Fs2.wp.com%2Fi%2Fbuttonw-com.png</url>
		<title>GigaOM &#187; Hortonworks</title>
		<link>http://gigaom.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://gigaom.com/osd.xml" title="GigaOM" />
	<atom:link rel='hub' href='http://gigaom.com/?pushpress=hub'/>
		<item>
		<title>With Impala now GA, Cloudera&#8217;s CEO sizes up the SQL-on-Hadoop market</title>
		<link>http://gigaom.com/2013/04/30/with-impala-now-ga-clouderas-ceo-sizes-up-the-sql-on-hadoop-market/</link>
		<comments>http://gigaom.com/2013/04/30/with-impala-now-ga-clouderas-ceo-sizes-up-the-sql-on-hadoop-market/#comments</comments>
		<pubDate>Tue, 30 Apr 2013 13:00:40 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[big data]]></category>
		<category><![CDATA[Cloudera]]></category>
		<category><![CDATA[data warehouse]]></category>
		<category><![CDATA[Greenplum]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[hive]]></category>
		<category><![CDATA[Hortonworks]]></category>
		<category><![CDATA[Impala]]></category>
		<category><![CDATA[Mapr]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[SQL]]></category>
		<category><![CDATA[SQL on Hadoop]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=640777</guid>
		<description><![CDATA[Cloudera's Impala engine for interactive SQL queries on Hadoop data is now generally available, and CEO Mike Olson gives his lay of the competitive landscape.]]></description>
				<content:encoded><![CDATA[<p>There is no shortage of confidence in the Hadoop space, and market leader Cloudera bolstered its own on Tuesday with the general availability of its Impala SQL query engine for Hadoop. And if CEO Mike Olson&#8217;s comments are any indication, we&#8217;re in for a long ride of competitive jockeying and one-upmanship as Cloudera and its peers go all Microsoft or Google and create myriad new data-processing engines to turn their Hadoop distributions into bona fide platforms.</p>
<p>Launched as a private beta in May 2012 and <a href="http://gigaom.com/2012/10/24/cloudera-makes-sql-a-first-class-citizen-in-hadoop/">made public in October</a>, Impala is Cloudera&#8217;s attempt to address the growing demand for interactive SQL analytics on Hadoop data. It&#8217;s essentially a massively parallel database designed to share the same storage platform and metadata as Hadoop MapReduce, only it is its own separate processing engine.</p>
<div id="attachment_640848" class="wp-caption alignleft" style="width: 310px"><a href="http://gigaom2.files.wordpress.com/2013/04/impala-arch-new.jpg"><img  alt="How Impala fits in" src="http://gigaom2.files.wordpress.com/2013/04/impala-arch-new.jpg?w=300&#038;h=257" width="300" height="257" class="size-medium wp-image-640848" /></a><p class="wp-caption-text">How Impala fits in</p></div>
<p>Impala actually uses the same &#8220;nearly ANSI&#8221; version of SQL as does current standard bearer Hive, but that technology (created by Facebook in 2009 as a data warehouse layer for Hadoop) doesn&#8217;t run nearly fast enough to sate many users&#8217; desire for interactive analytics. This is because Hive transforms SQL queries into MapReduce jobs, meaning every one is processed against the entire corpus of data in the Hadoop Distributed File System.</p>
<h2 id="sizing-up-the-competition">Sizing up the competition</h2>
<p>Only Cloudera isn&#8217;t the first to have the idea, <a href="http://gigaom.com/2013/02/21/sql-is-whats-next-for-hadoop-heres-whos-doing-it/">nor is it alone in trying to sell interactive SQL on Hadoop</a>. The idea was <a href="http://gigaom.com/2011/10/21/hadapt-raises-9-5m-for-hadoop-data-warehouse/">first commercialized by Boston-based startup Hadapt</a> in 2011, and is now being pushed by numerous startups and larger Hadoop players. Among them: Pivotal (formerly EMC) Greenplum, MapR (with <a href="http://gigaom.com/2012/08/17/for-fast-interactive-hadoop-queries-drill-may-be-the-answer/">Drill</a>), Hortonworks (with <a href="http://hortonworks.com/blog/100x-faster-hive/">Stinger</a>), Drawn to Scale, Splice Machine, Jethro Data and Citus Data.</p>
<div id="attachment_640858" class="wp-caption aligncenter" style="width: 600px"><a href="http://gigaom2.files.wordpress.com/2013/04/had_graphic2-scaled.jpg"><img  alt="Hadapt's architecture" src="http://gigaom2.files.wordpress.com/2013/04/had_graphic2-scaled.jpg?w=708"   class="size-full wp-image-640858" /></a><p class="wp-caption-text">Hadapt&#8217;s architecture</p></div>
<p>But Cloudera is arguably the biggest name pushing SQL on Hadoop, and CEO Mike Olson thinks Impala stands out for several reasons, not the least of which is that it exists as a product. &#8220;Nobody else is shipping production-grade SQL query support on Hadoop,&#8221; he told me during a recent call. &#8220;At least not in open source.&#8221; He seems content to let the startups do their thing, instead focusing his attention on Cloudera&#8217;s big three Hadoop-distribution competitors: Pivotal, MapR and Hortonworks. Greenplum and Pivotal SVP Scott Yara <a href="http://gigaom.com/2013/02/25/emc-to-hadoop-competition-see-ya-wouldnt-wanna-be-ya/">was full of confidence (and R&amp;D budget)</a> when the company announced the Pivotal HD distribution and HAWQ technology in February, but Olson claims the approach requires a siloed DBMS within HDFS and is a &#8220;rearguard defensive strategy&#8221; to protect the company&#8217;s sunk costs in its database technology.</p>
<div id="attachment_615210" class="wp-caption aligncenter" style="width: 718px"><a href="http://gigaom2.files.wordpress.com/2013/02/hawq1.jpg"><img  alt="The Pivotal HD and Hawq architecture" src="http://gigaom2.files.wordpress.com/2013/02/hawq1.jpg?w=708&#038;h=387" width="708" height="387" class="size-large wp-image-615210" /></a><p class="wp-caption-text">The Pivotal HD and Hawq architecture</p></div>
<p>As for Hortonworks, Olson questions the wisdom of its Stinger initiative to boost Hive&#8217;s speed, noting that &#8220;Hive never got good while it was running standalone on MapReduce.&#8221; Hortonworks also <a href="http://gigaom.com/2013/04/15/teradata-to-connect-hadoop-and-data-warehouses-roll-out-new-appliance/">partners with vendors such as Teradata</a> to let their platforms access Hadoop data in its native format, but those approaches still require sending data over the network. &#8220;It&#8217;s not the way you would build it if you woke up in the 2000s and were building this anew,&#8221; Olson said.</p>
<div id="attachment_640854" class="wp-caption aligncenter" style="width: 718px"><a href="http://gigaom2.files.wordpress.com/2013/04/stingerroad.png"><img  alt="The Stinger roadmap" src="http://gigaom2.files.wordpress.com/2013/04/stingerroad.png?w=708&#038;h=558" width="708" height="558" class="size-large wp-image-640854" /></a><p class="wp-caption-text">The Stinger roadmap</p></div>
<p>Olson acknowledged that the MapR-led Apache Drill project is cut from the same cloth as Impala (that is, being a Google Dremel clone designed specifically for Hadoop), but &#8220;the difference is we&#8217;re shipping code.&#8221; Being generally available and ready for production workloads means Cloudera can lock down users and market share before many even have a chance to experiment with Drill. He all but dismissed questions over the readiness of Impala, spurred by rumblings in the Hadoop space that Cloudera rushed it into public beta in order to get on the scoreboard against more fully baked offerings.</p>
<p>&#8220;I don&#8217;t feel we&#8217;re under the gun competitively to pull it out of beta because no one else has product in the market,&#8221; Olson said. &#8220;I have no problems &#8230; calling this GA quality.&#8221; He did, however, acknowledge that Impala is shipping with a &#8220;minimum viable feature set&#8221; that the company has plans to build on in the near future. Impala Senior Product Manager Justin Erickson noted a few issues of concern, including around the number of concurrent users Impala can support, but said they have been addressed during the beta period.</p>
<h2 id="one-piece-of-a-larger-platform">One piece of a larger platform</h2>
<p>Really, though, the whole point of Impala and its competitors is to turn Hadoop from a tool for batch analytics and mass storage <a href="http://gigaom.com/2013/03/07/5-reasons-why-the-future-of-hadoop-is-real-time-relatively-speaking/">into a platform that can handle nearly all of companies&#8217; data-processing needs</a>. In that regard, it appears we&#8217;re just getting started. Cloudera, MapR, Pivotal Greenplum and Hortonworks are already pushing their own products and projects, and Olson said &#8220;it&#8217;s absolutely our intent&#8221; to enhance Cloudera&#8217;s platform with even more open-source products (perhaps even more database technologies <a href="http://gigaom.com/2013/04/22/how-hbase-converted-myspaces-mysql-champion-and-is-driving-hadoop-mainstream/">a la HBase</a>) that will let users do more with more types of data. Over time, this strategy could result in Hadoop displacing the current breed of databases and data warehouses and becoming the single data store atop which users run whatever applications they desire. For now, though, especially when it comes to Impala and the data warehouse incumbents, Olson is taking a measured approach. &#8220;The likelihood that we&#8217;re going to knock them off in the near term,&#8221; he said, &#8220;&#8230; it would be a tough fight to win.&#8221;</p>
<br />  <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=gigaom.com&#038;blog=14960843&#038;post=640777&#038;subd=gigaom2&#038;ref=&#038;feed=1" width="1" height="1" /><p><a href="http://pubads.g.doubleclick.net/gampad/jump?iu=/1008864/GigaOM_RSS_300x250&#038;sz=300x250&#038;c=924008"><img src="http://pubads.g.doubleclick.net/gampad/ad?iu=/1008864/GigaOM_RSS_300x250&#038;sz=300x250&#038;c=924008" /></a></p><p><strong>Related research and analysis from GigaOM Pro:</strong><br />Subscriber content. <a href="http://pro.gigaom.com/?utm_source=data&utm_medium=editorial&utm_campaign=auto3&utm_term=640777+with-impala-now-ga-clouderas-ceo-sizes-up-the-sql-on-hadoop-market&utm_content=dharrisstructure">Sign up for a free trial</a>.</p><ul><li><a href="http://pro.gigaom.com/2012/03/a-near-term-outlook-for-big-data/?utm_source=data&utm_medium=editorial&utm_campaign=auto3&utm_term=640777+with-impala-now-ga-clouderas-ceo-sizes-up-the-sql-on-hadoop-market&utm_content=dharrisstructure">A near-term outlook for big data</a></li><li><a href="http://pro.gigaom.com/2012/04/infrastructure-q1-cloud-and-big-data-woo-the-enterprise/?utm_source=data&utm_medium=editorial&utm_campaign=auto3&utm_term=640777+with-impala-now-ga-clouderas-ceo-sizes-up-the-sql-on-hadoop-market&utm_content=dharrisstructure">Infrastructure Q1: Cloud and big data woo enterprises</a></li><li><a href="http://pro.gigaom.com/2012/04/sector-roadmap-hadoop-platforms-2012/?utm_source=data&utm_medium=editorial&utm_campaign=auto3&utm_term=640777+with-impala-now-ga-clouderas-ceo-sizes-up-the-sql-on-hadoop-market&utm_content=dharrisstructure">2012: The Hadoop infrastructure market booms</a></li></ul>]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/04/30/with-impala-now-ga-clouderas-ceo-sizes-up-the-sql-on-hadoop-market/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/01/1z5o1503.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/01/1z5o1503.jpg?w=150" medium="image">
			<media:title type="html">Structure Data 2012: Michael Olson – CEO, Cloudera</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/04/impala-arch-new.jpg?w=300" medium="image">
			<media:title type="html">How Impala fits in</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/04/had_graphic2-scaled.jpg" medium="image">
			<media:title type="html">Hadapt&#039;s architecture</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/02/hawq1.jpg?w=708" medium="image">
			<media:title type="html">The Pivotal HD and Hawq architecture</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/04/stingerroad.png?w=708" medium="image">
			<media:title type="html">The Stinger roadmap</media:title>
		</media:content>
	</item>
		<item>
		<title>Cloudera who? Intel announces its own Hadoop distribution</title>
		<link>http://gigaom.com/2013/02/26/cloudera-who-intel-announces-its-own-hadoop-distribution/</link>
		<comments>http://gigaom.com/2013/02/26/cloudera-who-intel-announces-its-own-hadoop-distribution/#comments</comments>
		<pubDate>Tue, 26 Feb 2013 18:26:31 +0000</pubDate>
		<dc:creator>Stacey Higginbotham</dc:creator>
				<category><![CDATA[Cisco]]></category>
		<category><![CDATA[Cloudera]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Hortonworks]]></category>
		<category><![CDATA[Intel]]></category>
		<category><![CDATA[Mapr]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=614504</guid>
		<description><![CDATA[Intel's getting into the open source software business with its own version of Hadoop. It joins a host of startups as well as EMC Greenplum in building a distribution for big data.]]></description>
				<content:encoded><![CDATA[<p>Intel on Tuesday said it was getting into the software business with its own Hadoop distribution. The move is a potential blow for startups such as Cloudera, Hortonworks and MapR that are offering their own distributions of Hadoop, but it’s also an admission by the chip vendor that the opportunity in big data isn’t only to be found in selling hardware.</p>
<p>At a conference held in San Francisco, Boyd Davis, VP and general manager of Intel’s Datacenter Software Division, explained Intel’s history with Hadoop, which stretches back to 2009, and stressed that Intel is going to share some aspects of its Hadoop distribution, but not all. Intel has already released a distribution of Hadoop in China, and today it’s bringing it to the United States. Intel’s distribution uses Hadoop 2.0 and YARN, a cutting-edge version of the platform compared with what most Hadoop users have deployed thus far.</p>
<h2 id="why-intel-wants-to-push-its-ow">Why Intel wants to push its own version of Hadoop</h2>
<p><a href="http://gigaom2.files.wordpress.com/2013/02/intelhadoophistory.jpg"><img alt="intelhadoophistory" src="http://gigaom2.files.wordpress.com/2013/02/intelhadoophistory.jpg?w=708&#038;h=400" width="708" height="400" class="aligncenter size-full wp-image-614518"></a></p>
<p>Davis introduced partners such as Cisco, which has tuned the Intel Hadoop distribution for its own servers. Intel also hosted a panel that included executives from SAP, Red Hat and Savvis to discuss the challenges of big data and the promise of Hadoop.</p>
<p>Davis was up front about Intel’s rationale for releasing its own distribution, namely that it was worried about the fragmentation and possible uncertainty associated with current Hadoop distributions. That could be read as a dig against the many startups already offering Hadoop distributions, all of which are slightly different (of course, Intel’s will be slightly different, too). Like all of the existing players such as Cloudera and MapR, Intel will open source certain aspects of its distribution, but will also keep software to itself.</p>
<h2 id="inside-the-data-center-its-no-">Inside the data center, it’s no longer just web servers that matter</h2>
<p>For example, Davis stressed that Intel will not share its management and monitoring software, which could be highly valuable for enterprise customers. The Intel software could coordinate with Intel’s data center management software and make managing a variety of workloads easier. And hidden in that coordination might be one of Intel’s aims in pushing its own version of Hadoop: countering the threat of ARM chips used in Hadoop clusters.</p>
<p>Dell, Calxeda and others are evaluating the use of lower-performance, <a href="http://gigaom.com/2012/10/24/dell-wants-to-tune-big-data-apps-for-arm-servers/">lower-power chips in Hadoop clusters</a>, a market <a href="http://gigaom.com/2011/06/13/big-data-on-micro-servers-you-bet/">Intel would hate to cede in the data center</a> as data grows and analytics becomes more important. To that end, Intel has also optimized its Hadoop distribution for solid-state drives, something that other Hadoop companies haven’t done so far.</p>
<p>When asked about Atom and the use of lower-performance processors for Hadoop, Davis noted that while people are using lower-end processors for Hadoop, those deployments tend to have slower networking. Davis says that when you combine high-end processors with 10 gigabit Ethernet and Hadoop, customers get the performance that they want.</p>
<p><a href="http://gigaom2.files.wordpress.com/2013/02/intelhadoop.jpg"><img alt="intelhadoop" src="http://gigaom2.files.wordpress.com/2013/02/intelhadoop.jpg?w=708&#038;h=397" width="708" height="397" class="aligncenter size-full wp-image-614552"></a></p>
<p>So while Intel may tout stability and consistency as the reason for its decision to become a major player in the software market for big data, it’s also driven by the changes in the data center that threaten the grip Intel has on the hardware inside the data center. The cloud and big data have changed the workloads and hardware requirements for the data center, and Intel is playing the long game in trying to release software that can be tuned to its chips.</p>
<h2 id="the-hadoop-drama-isnt-over-yet">The Hadoop drama isn’t over yet</h2>
<p>Intel isn’t the only big vendor touting its own homegrown version of Hadoop. On Monday, <a href="http://gigaom.com/2013/02/25/emc-to-hadoop-competition-see-ya-wouldnt-wanna-be-ya/">EMC’s Greenplum division announced an entirely revamped version</a> of its Hadoop distribution that’s merged with its flagship analytic SQL database. These big companies have big existing businesses to protect and lots of resources to put into doing it. As my colleague Derrick Harris wrote on the EMC news:</p>
<blockquote id="quote-looking-past-his-com"><p>Looking past his competitive boasting, though, it’s easy to see [Greenplum's Scott] Yara’s greater point when you ask him what all this Hadoop talks means for the data warehouse business on which Greenplum was built. He points to the mainframe business that fell from its high perch decades ago but still drives billions a year in revenue. A single MPP database system is still faster on certain workloads than SQL on Hadoop, but that gap will close over time and “I do think the center of gravity will move toward HDFS,” he said.</p></blockquote>
<p>Hadoop is a juggernaut when it comes to big data. Intel is a juggernaut when it comes to data center infrastructure. Its decision to enter into the open source software market is a big one for the chip company, for the Hadoop ecosystem and for the myriad startups playing in this space. It’s a topic we’ll explore more during our <a href="http://event.gigaom.com/structuredata/?utm_source=data&amp;utm_medium=editorial&amp;utm_campaign=intext&amp;utm_term=614504+cloudera-who-intel-announces-its-own-hadoop-distribution&amp;utm_content=shigginbotham">Structure Data conference in New York on March 20 and 21</a>.</p>
<br />  <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=gigaom.com&#038;blog=14960843&#038;post=614504&#038;subd=gigaom2&#038;ref=&#038;feed=1" width="1" height="1" /><p><a href="http://pubads.g.doubleclick.net/gampad/jump?iu=/1008864/GigaOM_RSS_300x250&#038;sz=300x250&#038;c=223852"><img src="http://pubads.g.doubleclick.net/gampad/ad?iu=/1008864/GigaOM_RSS_300x250&#038;sz=300x250&#038;c=223852" /></a></p><p><strong>Related research and analysis from GigaOM Pro:</strong><br />Subscriber content. <a href="http://pro.gigaom.com/?utm_source=data&utm_medium=editorial&utm_campaign=auto3&utm_term=614504+cloudera-who-intel-announces-its-own-hadoop-distribution&utm_content=shigginbotham">Sign up for a free trial</a>.</p><ul><li><a href="http://pro.gigaom.com/2012/07/cloud-and-data-second-quarter-2012-analysis-and-outlook-2/?utm_source=data&utm_medium=editorial&utm_campaign=auto3&utm_term=614504+cloudera-who-intel-announces-its-own-hadoop-distribution&utm_content=shigginbotham">Takeaways from the second quarter in cloud and data</a></li><li><a href="http://pro.gigaom.com/2011/04/infrastructure-q1-iaas-comes-down-to-earth-big-data-takes-flight/?utm_source=data&utm_medium=editorial&utm_campaign=auto3&utm_term=614504+cloudera-who-intel-announces-its-own-hadoop-distribution&utm_content=shigginbotham">Infrastructure Q1: IaaS Comes Down to Earth; Big Data Takes Flight</a></li><li><a href="http://pro.gigaom.com/2012/03/why-service-providers-matter-for-the-future-of-big-data/?utm_source=data&utm_medium=editorial&utm_campaign=auto3&utm_term=614504+cloudera-who-intel-announces-its-own-hadoop-distribution&utm_content=shigginbotham">Why service providers matter for the future of big data</a></li></ul>]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/02/26/cloudera-who-intel-announces-its-own-hadoop-distribution/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/02/hadoop1-210x140.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/02/hadoop1-210x140.jpg?w=150" medium="image">
			<media:title type="html">hadoop1-210x140</media:title>
		</media:content>

		<media:content url="http://1.gravatar.com/avatar/aee37121e18bf76bb9fee4494bab237a?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">shigginbotham</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/02/intelhadoophistory.jpg" medium="image">
			<media:title type="html">intelhadoophistory</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/02/intelhadoop.jpg" medium="image">
			<media:title type="html">intelhadoop</media:title>
		</media:content>
	</item>
		<item>
		<title>Hortonworks and Microsoft bring open-source Hadoop to Windows</title>
		<link>http://gigaom.com/2013/02/25/hortonworks-and-microsoft-bring-open-source-hadoop-to-windows/</link>
		<comments>http://gigaom.com/2013/02/25/hortonworks-and-microsoft-bring-open-source-hadoop-to-windows/#comments</comments>
		<pubDate>Mon, 25 Feb 2013 13:00:47 +0000</pubDate>
		<dc:creator>Barb Darrow</dc:creator>
				<category><![CDATA[Excel]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Hortonworks]]></category>
		<category><![CDATA[Microsoft]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=613657</guid>
		<description><![CDATA[Hortonworks Data Platform for Windows, now in beta, brings Hadoop to Excel and SQL Server (and vice versa.)]]></description>
				<content:encoded><![CDATA[<p>There’s probably no better way to open up big data to the masses than making it accessible and manipulatable (if that’s a word) via Microsoft Excel. And that ability gets closer to reality Monday with the beta release of Hortonworks Data Platform for Windows. The product of <a href="http://hortonworks.com/about-us/news/hortonworks-extends-hadoop-to-windows/">a year-old collaboration between Hortonworks and Microsoft</a> is now downloadable. General availability will come later in the second quarter, said Shawn Connolly, Hortonworks’ VP of corporate strategy, in an interview.</p>
<p><a href="http://gigaom.com/2013/02/25/hortonworks-and-microsoft-bring-open-source-hadoop-to-windows/windowslogo/" rel="attachment wp-att-613667"><img alt="windowslogo" src="http://gigaom2.files.wordpress.com/2013/02/windowslogo.jpg?w=300&#038;h=74" width="300" height="74" class="alignleft size-medium wp-image-613667"></a></p>
<p>The combination should  make it easier to integrate data from SQL Server and Hadoop and to funnel all that into Excel for charting and pivoting and all the tasks Excel is good at, Connolly added.</p>
<p>He stressed that this means the very same Apache Hadoop distribution will run on Linux and Windows. An analogous Hortonworks Data Platform for Windows Azure is still in the works.</p>
<p>Microsoft opted to work with Hortonworks rather than to continue its own “Dryad” project, as GigaOM’s Derrick Harris <a href="http://gigaom.com/2012/02/28/microsofts-hadoop-play-is-shaping-up-and-it-includes-excel/">reported a year ago</a>. Those with long memories will recall this isn’t the first time that Microsoft relied on outside expertise for database work. The guts of early SQL Server came to the company via Sybase.</p>
<p>The intersection of structured SQL and unstructured Hadoop universes is indeed a hotspot, <a href="http://gigaom.com/2013/02/21/sql-is-whats-next-for-hadoop-heres-whos-doing-it/">as Derrick Harris reported last week</a>, with companies including Hadoop rivals Cloudera and EMC Greenplum all working that fertile terrain. That means Hortonworks/Microsoft face stiff competition. This topic, along with real-time data tracking, will be discussed at <a href="http://event.gigaom.com/structuredata/?utm_source=cloud&amp;utm_medium=editorial&amp;utm_campaign=intext&amp;utm_term=613657+hortonworks-and-microsoft-bring-open-source-hadoop-to-windows&amp;utm_content=gigabarb">GigaOM’s Structure Data conference</a> in New York on March 20-21.</p>
<br />  <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=gigaom.com&#038;blog=14960843&#038;post=613657&#038;subd=gigaom2&#038;ref=&#038;feed=1" width="1" height="1" /><p><a href="http://pubads.g.doubleclick.net/gampad/jump?iu=/1008864/GigaOM_RSS_300x250&#038;sz=300x250&#038;c=43456"><img src="http://pubads.g.doubleclick.net/gampad/ad?iu=/1008864/GigaOM_RSS_300x250&#038;sz=300x250&#038;c=43456" /></a></p><p><strong>Related research and analysis from GigaOM Pro:</strong><br />Subscriber content. <a href="http://pro.gigaom.com/?utm_source=cloud&utm_medium=editorial&utm_campaign=auto3&utm_term=613657+hortonworks-and-microsoft-bring-open-source-hadoop-to-windows&utm_content=gigabarb">Sign up for a free trial</a>.</p><ul><li><a href="http://pro.gigaom.com/2012/07/cloud-and-data-second-quarter-2012-analysis-and-outlook-2/?utm_source=cloud&utm_medium=editorial&utm_campaign=auto3&utm_term=613657+hortonworks-and-microsoft-bring-open-source-hadoop-to-windows&utm_content=gigabarb">Takeaways from the second quarter in cloud and data</a></li><li><a href="http://pro.gigaom.com/2012/05/the-importance-of-putting-the-u-and-i-in-visualization/?utm_source=cloud&utm_medium=editorial&utm_campaign=auto3&utm_term=613657+hortonworks-and-microsoft-bring-open-source-hadoop-to-windows&utm_content=gigabarb">The importance of putting the U and I in visualization</a></li><li><a href="http://pro.gigaom.com/2012/03/a-near-term-outlook-for-big-data/?utm_source=cloud&utm_medium=editorial&utm_campaign=auto3&utm_term=613657+hortonworks-and-microsoft-bring-open-source-hadoop-to-windows&utm_content=gigabarb">A near-term outlook for big data</a></li></ul>]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/02/25/hortonworks-and-microsoft-bring-open-source-hadoop-to-windows/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2011/11/hortonworks-logo-e1320188026548.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2011/11/hortonworks-logo-e1320188026548.jpg?w=150" medium="image">
			<media:title type="html">Hortonworks logo</media:title>
		</media:content>

		<media:content url="http://1.gravatar.com/avatar/4af03439988d64f816da72496325cb73?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">gigabarb</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/02/windowslogo.jpg?w=300" medium="image">
			<media:title type="html">windowslogo</media:title>
		</media:content>
	</item>
		<item>
		<title>A few stats, rumors and stories on Hadoop&#8217;s rapid growth</title>
		<link>http://gigaom.com/2012/11/09/a-few-stats-rumors-and-stories-on-on-hadoops-rapid-growth/</link>
		<comments>http://gigaom.com/2012/11/09/a-few-stats-rumors-and-stories-on-on-hadoops-rapid-growth/#comments</comments>
		<pubDate>Fri, 09 Nov 2012 23:32:24 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[big data]]></category>
		<category><![CDATA[Cloudera]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Hortonworks]]></category>
		<category><![CDATA[Mapr]]></category>
		<category><![CDATA[WibiData]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=582462</guid>
		<description><![CDATA[The largest players in the Hadoop market are already raising money at sky-high valuations, employing hundreds of people and, in some cases, looking at nine-figure revenues. If you're trying to get a sense of whether Hadoop is for real, these details might help.]]></description>
				<content:encoded><![CDATA[<p>There&#8217;s little hard data on the size of the largely private Hadoop market yet, but you can get a clue from looking at what&#8217;s going on inside Silicon Valley. The money changing hands and the sizes of the largest players in the space alone are enough to paint a telling picture of a market that&#8217;s growing fast in uncharted territory. I&#8217;ve collected some of the insights I&#8217;ve gleaned over the past few months to try and add some perspective.</p>
<p>Everything, of course, is relative and we might never see a Hadoop vendor reach the size of a database company such as Oracle with more than 100,000 employees and tens of billions in annual revenue. After all, Hadoop is a new technology for most companies, so it&#8217;s not really moving in on an already lucrative market and stealing budgetary dollars from incumbents. Further &#8212; and possibly more importantly &#8212; the core Hadoop technology is free and open source, meaning there are lots of unpaid downloads so money comes from services, support and large enterprises willing to buy software licenses for value-added products.</p>
<h2>Money</h2>
<p>Here&#8217;s a chart showing how much money Hadoop-based companies have raised thus far (although the grand total will likely rise by at least $10 million next week). Keep in mind, Cloudera only launched in 2009 and Hortonworks launched in June 2011. And these aren&#8217;t companies that merely bury Hadoop under an application or can connect their technologies to it &#8212; these are companies either selling Hadoop or applications designed specifically for it.</p>
<p><a href="http://gigaom2.files.wordpress.com/2012/11/hadoop-funding.jpg"><img  title="hadoop funding" alt="" src="http://gigaom2.files.wordpress.com/2012/11/hadoop-funding.jpg?w=708"   class="aligncenter size-full wp-image-583319" /></a><br />
(To view the original, interactive chart, <a href="http://public.tableausoftware.com/views/Hadoopfunding2/Sheet1?:embed=y">click here</a>.)</p>
<p>In terms of revenue, one might look to a May 2012 report from research firm IDC <a href="http://gigaom.com/cloud/all-aboard-the-hadoop-money-train/">estimating the size of the Hadoop ecosystem to be around $77 million</a>, growing to $813 million by 2016. Those are both impressive numbers, but they might actually be short-changing reality. For one, as I noted at the time, the authors attributed almost no revenue to Amazon Web Services&#8217; Elastic MapReduce service, which is almost certainly generating at least a few million in revenue each year.</p>
<p>Speaking to me in June, Cloudera CEO Mike Olson also took issue with the number, claiming it didn&#8217;t even take Cloudera&#8217;s revenue into account &#8212; which seems entirely possible considering the business Cloudera is doing. I&#8217;ve heard from reliable sources that Cloudera is doing very well and is on track to do about $100 million in revenue this year, very possibly more. And as early as April 2011, Cloudera executives were <a href="http://gigaom.com/cloud/why-cloudera-isnt-sweating-the-hadoop-competition/">touting that software license revenue had already surpassed services revenue</a> (although it&#8217;s arguable whether that will, or even has to, remain the case).</p>
<p>More anecdotally, I&#8217;ve heard from several sources that Hortonworks has already declined at least one potentially appealing acquisition offer. That it wouldn&#8217;t sell isn&#8217;t surprising: sources say the company is valued at $225 million after its last round of funding and is looking to raise more money. And although it <a href="http://hortonworks.com/blog/announcing-general-availability-of-hortonworks-data-platform/">just released its first product in June</a>, the company has impressive and potentially lucrative partnerships in place with <a href="http://gigaom.com/cloud/microsofts-hadoop-play-is-shaping-up-and-it-includes-excel/">Microsoft</a>, <a href="http://gigaom.com/cloud/teradata-taps-hortonworks-to-improve-hadoop-story/">Teradata</a>, <a href="http://gigaom.com/cloud/rackspace-versus-amazon-the-big-data-edition/">Rackspace</a>, <a href="http://gigaom.com/cloud/hortonworks-teams-with-vmware-to-keep-hadoop-running/">VMware</a> and other large vendors.</p>
<p>MapR, the proprietary thorn in the sides of both Cloudera and Hortonworks, appears to be doing quite well, too. Vice President of Marketing Jack Norris <a href="http://gigaom.com/cloud/the-state-of-hadoop-strong-and-poised-to-explode/">told me in June that his company had higher license revenue than many would expect</a> and predicted that <a href="http://gigaom.com/cloud/amazon-taps-mapr-for-high-powered-elastic-mapreduce/">deals with Amazon Web Services</a> and Google Compute Engine would help the company become &#8220;the license revenue leader within the next quarter.&#8221;</p>
<p>Former Cloudera VP of Technology Solutions Omer Trajman, who just left to join HBase-centric startup WibiData, shared some insights with me from his days at Cloudera that seem to back up vendor confidence. He said most mature production clusters (excluding monster users such as Facebook) consist of about 200 nodes, and many double in size after the first year. That&#8217;s part of the reason Cloudera grew in size about 10x during the three years he was there.</p>
<p>&#8220;It has definitely been a rocket ship,&#8221; he said. &#8220;&#8230; You just strap in and hope you make it up.&#8221;</p>
<p>Interest is only picking up, too: &#8220;There are more people that have started big data projects in the past six months than have big data projects running [in production],&#8221; Trajman said.</p>
<h2>People</h2>
<p>It&#8217;s probably not accurate to call companies such as Cloudera, Hortonworks and MapR startups anymore, and we might start to see signs of this shift in personnel moves. Here&#8217;s how big they are and expect to become:</p>
<ul>
<li><strong>Cloudera: </strong>More than 300 employees globally and growing, especially in the sales department.</li>
<li><strong>Hortonworks: </strong>145 employees as of late October and hiring a person per day, on average, through the end of 2012.</li>
<li><strong>MapR: </strong>More than 125 employees, mostly in technical and engineering positions; starting to build sales team and looks to more than double headcount in 2013.</li>
</ul>
<p>While Cloudera and Hortonworks, for example, are still young, nimble and agile enough <a href="http://gigaom.com/cloud/is-vmwares-brain-drain-a-sign-of-its-influence-or-of-its-demise/">to lure a fair amount of talent</a> from now-officially large enterprises such as VMware, their employees who joined on early and really love the startup life might not stick around.</p>
<p>Trajman&#8217;s new home, WibiData, is a fine example of this. It was launched last year by former Cloudera employees Christophe Bisciglia (who actually co-founded Cloudera) and Aaron Kimball <a href="http://gigaom.com/cloud/hadoop-startup-wibidata-raises-5m-to-power-web-analytics/">to help companies build behavioral-analysis applications on top of Hadoop</a>.</p>
<p>(Maybe there&#8217;s a Cloudera mafia shaping up: WibiData&#8217;s officemates &#8212; <a href="http://gigaom.com/cloud/how-memcachier-went-from-a-favor-for-a-friend-to-cloud-ubquity/">MemCachier</a> and <a href="http://thanx.com/">Thanx</a> &#8212; both count former Cloudera employees as key members or founders of their teams, <a href="http://drawntoscale.com/about-us/">as does HBase-centric startup Drawn to Scale</a>.)</p>
<p>Trajman, who was one of the first couple dozen employees at Cloudera (and who previously joined Vertica at around the same stage in its growth), told me he likes the rush of getting in at the ground level of new technologies and helping companies do something really new. While he enjoyed establishing and implementing some of the core foundational use cases for Hadoop (e.g., ETL and data exploration) with Cloudera&#8217;s early customers, that&#8217;s still much of what Cloudera provides to customers because it&#8217;s so difficult to build higher-level and higher-value applications at the infrastructural level where Cloudera operates.</p>
<p>&#8220;For me, it was very personal in terms of the impact I wanted to have,&#8221; Trajman said. At WibiData, he can help users who have the infrastructure part resolved and now want to develop applications that make data analysis a core part of their businesses. Where there&#8217;s a focus on innovation, he said, that&#8217;s where the innovators go.</p>
<p>This isn&#8217;t a bad thing, it&#8217;s just a side effect of growth &#8212; and when employees stay and innovate in the Hadoop space, it just creates a bigger pie for everyone to share.</p>
<p>Feature image courtesy of <a href="http://www.shutterstock.com/gallery-478987p1.html">Shutterstock user GuskovaNatalia</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2012/11/09/a-few-stats-rumors-and-stories-on-on-hadoops-rapid-growth/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2012/11/shutterstock_95592730.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2012/11/shutterstock_95592730.jpg?w=150" medium="image">
			<media:title type="html">Tall buildings</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2012/11/hadoop-funding.jpg" medium="image">
			<media:title type="html">hadoop funding</media:title>
		</media:content>
	</item>
		<item>
		<title>Cloudera makes SQL a first-class citizen in Hadoop</title>
		<link>http://gigaom.com/2012/10/24/cloudera-makes-sql-a-first-class-citizen-in-hadoop/</link>
		<comments>http://gigaom.com/2012/10/24/cloudera-makes-sql-a-first-class-citizen-in-hadoop/#comments</comments>
		<pubDate>Wed, 24 Oct 2012 13:00:52 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[analytic database]]></category>
		<category><![CDATA[analytics]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[Cloudera]]></category>
		<category><![CDATA[data warehouse]]></category>
		<category><![CDATA[Drill]]></category>
		<category><![CDATA[Hadapt]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Hortonworks]]></category>
		<category><![CDATA[Impala]]></category>
		<category><![CDATA[Mapr]]></category>
		<category><![CDATA[Platfora]]></category>
		<category><![CDATA[SQL]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=576626</guid>
		<description><![CDATA[Cloudera has joined the fray of Hadoop companies trying to turn the big data platform into an engine for exploring data interactively using standard SQL. As the biggest company in the space, its new technology called Impala could go a long way toward changing Hadoop's image.]]></description>
				<content:encoded><![CDATA[<p>Not content to watch its competitors leave it in the dust, veteran big data startup Cloudera is fundamentally changing the face of its flagship Hadoop distribution into something much more appealing. The company has developed a real-time <a href="http://www.cloudera.com/content/cloudera/en/products/cloudera-enterprise-core/cloudera-enterprise-RTQ.html">SQL query engine called Impala</a> that will sit alongside MapReduce as a native processing option within Cloudera&#8217;s version of Hadoop. Cloudera is the biggest and best-known Hadoop vendor around, so opening its platform up to the wide world of SQL-trained data analysts is a really big deal &#8212; even if Cloudera is a bit late to the SQL party.</p>
<h2>From batch processing to data interaction</h2>
<p>The business world regularly laments the circumstances that spurred Impala&#8217;s creation. I summed them up last week and again yesterday when reporting similar products <a href="http://gigaom.com/data/hadapt-does-big-love-for-big-data-and-hints-at-hadoops-future/">from startups Hadapt</a> and <a href="http://gigaom.com/data/platfora-shows-a-whole-new-way-to-do-business-intelligence-on-big-data/">Platfora</a>, but the gist is that although Hadoop is more scalable and more flexible than traditional data warehouses or analytic databases, it&#8217;s also slower, harder to learn and designed for batch processing an entire data set rather than interactively querying a data set. Until now, the common methods for querying Hadoop were to <a href="http://hive.apache.org/">use a custom-built language such as Hive</a>, or to transport data to a data warehouse from Hadoop and then analyze it using traditional business intelligence software.</p>
<p>However, Cloudera VP of Products Charles Zedlewski was quick to point out during a recent conversation that Impala isn&#8217;t a replacement for other BI tools, just a new data source into which they can connect. If anything, it&#8217;s a replacement for Hive, which Facebook built to bring data warehouse capabilities to Hadoop, but which wasn&#8217;t really developed for public consumption as a software product. For the sake of uniformity, Impala actually uses the same SQL set as Hive, but is on average 10 times faster thanks to its purpose-built query engine that forgoes reliance on MapReduce. Small queries, Zedlewski said, can run in less than a second.</p>
<p>Impala has been in the making for almost two years, and Cloudera &#8220;took a lot of pains to stitch this really well in with the rest of the Hadoop stack,&#8221; Zedlewski said. Users still store data in the Hadoop Distributed File System or the HBase database, and they can still store whatever types of structured, semi-structured or unstructured data they please. Impala uses the same metadata as the other Hadoop components, the same drivers and &#8212; like almost everything else in the Hadoop world &#8212; is open source under the Apache Software Foundation license.</p>
<p>Unlike some other Hadoop startups, though, Cloudera isn&#8217;t interested in selling BI or other analytic applications. Impala (which is called Real-Time Query for customers who pay for support) is the execution engine, but it still relies on software from Cloudera partners such as <a href="http://gigaom.com/cloud/thanks-to-consumerization-its-ipo-season-in-analytics/">Tableau, QlikTech</a> and MicroStrategy in order to ask questions and visualize the results. &#8220;We&#8217;re sticking to our knitting as a platform vendor,&#8221; said Zedlewski, echoing a sentiment on which his boss, Cloudera CEO Mike Olson, <a href="http://gigaom.com/2012/03/21/cloudera-structure-data-2012/">has been bullish for years</a>.</p>
<p style="text-align: center;"><a href="http://gigaom2.files.wordpress.com/2012/10/impala.jpg"><img  title="impala" alt="" src="http://gigaom2.files.wordpress.com/2012/10/impala.jpg?w=708"   class="size-full wp-image-576688 aligncenter" /></a></p>
<h2>Different strokes move the world</h2>
<p>I can&#8217;t underscore enough how critical all of this innovation is for Hadoop, which in order to add substance to its unparalleled hype needed to become far more useful to far more users. But the sudden shift from Hadoop as a batch-processing engine built on MapReduce into an ad hoc SQL querying engine might leave industry analysts and even Hadoop users scratching their heads.</p>
<p>Cloudera, now with more than 300 employees and annual revenue rumored to be in hundreds of millions, is the 800-pound gorilla in the Hadoop market, and its implementation of Impala has to make it look even better for prospective customers. But Cloudera doesn&#8217;t have this space to itself. Assuming your goal is to use Hadoop as the platform for running SQL queries (as opposed to, for example, <a href="http://gigaom.com/data/metamarkets-open-sources-druid-its-in-memory-database/">using it for ETL before putting it in an in-memory system</a>), there are plenty of choices on the table. And everyone&#8217;s approach is different.</p>
<p>For starters, bitter distribution-level rival MapR announced in August that it&#8217;s <a href="http://gigaom.com/cloud/for-fast-interactive-hadoop-queries-drill-may-be-the-answer/">leading an open source project called Drill</a> that provides essentially the same functionality as Impala. MapR is <a href="http://gigaom.com/cloud/amazon-taps-mapr-for-high-powered-elastic-mapreduce/">getting a lot of love from Hadoop users right now</a>, and a future implementation of Drill into its product lineup would add even more legitimacy. Not wanting to cede the innovation edge to Cloudera or MapR, one has to suspect <a href="http://gigaom.com/cloud/hortonworks-teams-with-vmware-to-keep-hadoop-running/">Yahoo spinoff Hortonworks</a> will also get into the query engine game at some point. (We&#8217;ll leave the debate over whether the myriad different flavors of Hadoop constitute the beginning of a community fracture for another day.)</p>
<p>Like Cloudera, however, if MapR and Hortonworks decide to integrate query engines in their products, they&#8217;ll likely rely on application providers to deliver the user experience on top. For better or worse, that presently means reliance on legacy vendors until startups can get familiar with the source code and start building BI products designed to take advantage of the new capabilities. When asked about Impala as a technology for disrupting the traditional data warehouse market, Cloudera&#8217;s Zedlewski noted that existing products are often very good at what they do.</p>
<p>&#8220;I think it&#8217;s highly unlikely that something like Impala would really be considered an alternative of that,&#8221; he said. Those vendors don&#8217;t seem to think so either, as companies like Teradata and EMC Greenplum are <a href="http://gigaom.com/cloud/emc-throws-lots-of-hardware-at-hadoop/">telling always-improving stories</a> about integrating their existing product lines with Hadoop.</p>
<div id="attachment_576706" class="wp-caption alignright" style="width: 310px"><a href="http://gigaom2.files.wordpress.com/2012/10/2-drill_down-11.jpg"><img  title="2-drill_down-1" alt="" src="http://gigaom2.files.wordpress.com/2012/10/2-drill_down-11.jpg?w=300&#038;h=144" height="144" width="300" class="size-medium wp-image-576706" /></a><p class="wp-caption-text">Running a sentiment analysis in Tableau with Hadapt</p></div>
<p>On the other end of the spectrum are startups such as Hadapt, Platfora and <a href="http://gigaom.com/data/batten-down-the-analysts-its-a-big-data-bi-storm/">Birst</a>, which have built Hadoop-based query engines on their own, independent of loyalty to any particular Hadoop distribution. These companies have a lot of smart people on board, and their technologies are for real. Platfora CEO Ben Werther, in particular, makes no bones about his goal of unseating the BI incumbents with analytics applications built from the ground up to analyze big data stored in Hadoop.</p>
<p>Similar, although not necessarily competitive, technologies include <a href="http://gigaom.com/cloud/how-one-startup-wants-to-inject-hadoop-into-your-sql/">Spire (from Drawn to Scale)</a> and <a href="http://www.splicemachine.com">Splice Machine</a>. Both support some level of SQL querying and/or BI integration, although their real value comes in leveraging HBase to provide transactional capabilities that analytic databases aren&#8217;t designed to do.</p>
<p>Even though all these choices and approaches might add to the confusion over how to use Hadoop and which products to choose, the result is a net gain for Hadoop <a href="http://gigaom.com/cloud/the-state-of-hadoop-strong-and-poised-to-explode/">as the de facto platform for big data environments</a> even in the face of some alternative approaches. It has changed from a batch system to an interactive query engine pretty much overnight, so although he wouldn&#8217;t comment on the competition, Zedlewski wasn&#8217;t just blowing vendor smoke when he told me, &#8220;I would argue Impala is a proof point that Hadoop as a platform has an ability to grow that no other data management platform has.&#8221;</p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2012/10/24/cloudera-makes-sql-a-first-class-citizen-in-hadoop/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2012/10/impala1-e1351083747709.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2012/10/impala1-e1351083747709.jpg?w=150" medium="image">
			<media:title type="html">impala</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2012/10/impala.jpg" medium="image">
			<media:title type="html">impala</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2012/10/2-drill_down-11.jpg?w=300" medium="image">
			<media:title type="html">2-drill_down-1</media:title>
		</media:content>
	</item>
		<item>
		<title>Scaling Hadoop clusters: the role of cluster management</title>
		<link>http://pro.gigaom.com/2012/07/scaling-hadoop-clusters-the-role-of-cluster-management/</link>
		<comments>http://pro.gigaom.com/2012/07/scaling-hadoop-clusters-the-role-of-cluster-management/#comments</comments>
		<pubDate>Mon, 23 Jul 2012 07:01:22 +0000</pubDate>
		<dc:creator>Paul Miller</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Alcatel Lucent]]></category>
		<category><![CDATA[Ambari]]></category>
		<category><![CDATA[Apache Ambari]]></category>
		<category><![CDATA[Apache Software Foundation]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[Chef]]></category>
		<category><![CDATA[Cloudera]]></category>
		<category><![CDATA[cluster management]]></category>
		<category><![CDATA[clusters]]></category>
		<category><![CDATA[Crowbar]]></category>
		<category><![CDATA[CSC]]></category>
		<category><![CDATA[Dell]]></category>
		<category><![CDATA[Dell Crowbar]]></category>
		<category><![CDATA[e-commerce]]></category>
		<category><![CDATA[emc-greenplum]]></category>
		<category><![CDATA[Facebook]]></category>
		<category><![CDATA[Ganglia]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Hadoop Common]]></category>
		<category><![CDATA[Hadoop Distributed File System]]></category>
		<category><![CDATA[Hbase]]></category>
		<category><![CDATA[Hortonworks]]></category>
		<category><![CDATA[Hortonworks Data Platform]]></category>
		<category><![CDATA[IBM]]></category>
		<category><![CDATA[lava]]></category>
		<category><![CDATA[Mapr]]></category>
		<category><![CDATA[mapreduce]]></category>
		<category><![CDATA[OpenStack]]></category>
		<category><![CDATA[platform]]></category>
		<category><![CDATA[Platform LSF]]></category>
		<category><![CDATA[platform-computing]]></category>
		<category><![CDATA[procter-gamble]]></category>
		<category><![CDATA[Puppet]]></category>
		<category><![CDATA[Puppet Labs]]></category>
		<category><![CDATA[Rocks]]></category>
		<category><![CDATA[social networking]]></category>
		<category><![CDATA[social networks]]></category>
		<category><![CDATA[stackiq]]></category>
		<category><![CDATA[StackIQ Enterprise Data]]></category>
		<category><![CDATA[Yahoo]]></category>

		<guid isPermaLink="false">http://pro.gigaom.com/?p=117860</guid>
		<description><![CDATA[Organizations are coping with the challenge of processing unprecedented volumes of data. However, the processes involved with using a large cluster to run applications like Hadoop are error-prone. So IT managers are turning to cluster-management solutions to automate tasks associated with cluster creation, management and maintenance.]]></description>
				<content:encoded><![CDATA[]]></content:encoded>
			<wfw:commentRss>http://pro.gigaom.com/2012/07/scaling-hadoop-clusters-the-role-of-cluster-management/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:thumbnail url="https://gigaom-pro-files.s3.amazonaws.com/files/2012/07/rockclimbing1.jpg?w=150" />
		<media:content url="https://gigaom-pro-files.s3.amazonaws.com/files/2012/07/rockclimbing1.jpg?w=150" medium="image">
			<media:title type="html">rockclimbing1</media:title>
		</media:content>

		<media:content url="http://1.gravatar.com/avatar/7c1b4afa924d36a76027fe2be0543eeb?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">cloudofdata</media:title>
		</media:content>
	</item>
		<item>
		<title>Takeaways from the second quarter in cloud and data</title>
		<link>http://pro.gigaom.com/2012/07/cloud-and-data-second-quarter-2012-analysis-and-outlook-2/</link>
		<comments>http://pro.gigaom.com/2012/07/cloud-and-data-second-quarter-2012-analysis-and-outlook-2/#comments</comments>
		<pubDate>Tue, 17 Jul 2012 15:55:38 +0000</pubDate>
		<dc:creator>Jo Maitland</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Adara Networks]]></category>
		<category><![CDATA[Amazon]]></category>
		<category><![CDATA[Amazon EC2]]></category>
		<category><![CDATA[Amazon Web Services]]></category>
		<category><![CDATA[apache]]></category>
		<category><![CDATA[apache-hadoop]]></category>
		<category><![CDATA[AT&T]]></category>
		<category><![CDATA[Battery Ventures]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[BigQuery]]></category>
		<category><![CDATA[Birst]]></category>
		<category><![CDATA[Business Intelligence]]></category>
		<category><![CDATA[Cetas]]></category>
		<category><![CDATA[Cetas Software]]></category>
		<category><![CDATA[Cirro]]></category>
		<category><![CDATA[Cisco]]></category>
		<category><![CDATA[Cloud Computing]]></category>
		<category><![CDATA[Cloudera]]></category>
		<category><![CDATA[converged infrastructure]]></category>
		<category><![CDATA[data-analytics]]></category>
		<category><![CDATA[Dell]]></category>
		<category><![CDATA[Demand Media]]></category>
		<category><![CDATA[DynamicOps]]></category>
		<category><![CDATA[EC2]]></category>
		<category><![CDATA[eCircle]]></category>
		<category><![CDATA[EMC]]></category>
		<category><![CDATA[Excel]]></category>
		<category><![CDATA[Financial Times]]></category>
		<category><![CDATA[Flash storage]]></category>
		<category><![CDATA[GoGrid]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[Google BigQuery]]></category>
		<category><![CDATA[google buzz]]></category>
		<category><![CDATA[google compute engine]]></category>
		<category><![CDATA[google notebook]]></category>
		<category><![CDATA[google wave]]></category>
		<category><![CDATA[Google Web Accelerator]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Hadoop Summit]]></category>
		<category><![CDATA[Hewlett-Packard]]></category>
		<category><![CDATA[Hortonworks]]></category>
		<category><![CDATA[HP]]></category>
		<category><![CDATA[hypervisor]]></category>
		<category><![CDATA[I/O optimization]]></category>
		<category><![CDATA[iaas]]></category>
		<category><![CDATA[IBM]]></category>
		<category><![CDATA[IDC]]></category>
		<category><![CDATA[infrastructure as a service]]></category>
		<category><![CDATA[Intel]]></category>
		<category><![CDATA[Khosla Ventures]]></category>
		<category><![CDATA[LineRate Systems]]></category>
		<category><![CDATA[M&A]]></category>
		<category><![CDATA[machine-learning]]></category>
		<category><![CDATA[Mapr]]></category>
		<category><![CDATA[Metamarkets]]></category>
		<category><![CDATA[Microsoft]]></category>
		<category><![CDATA[microsoft-windows]]></category>
		<category><![CDATA[MongoDB]]></category>
		<category><![CDATA[Open Networking Research Center]]></category>
		<category><![CDATA[Open Networking Summit]]></category>
		<category><![CDATA[OpenFlow]]></category>
		<category><![CDATA[Opera Solutions]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[PaaS]]></category>
		<category><![CDATA[Platform as a Service]]></category>
		<category><![CDATA[PureSystems]]></category>
		<category><![CDATA[quest-software]]></category>
		<category><![CDATA[Rackspace]]></category>
		<category><![CDATA[redgiant-analytics]]></category>
		<category><![CDATA[RedGiantAnalytics]]></category>
		<category><![CDATA[SDN]]></category>
		<category><![CDATA[Serengeti]]></category>
		<category><![CDATA[SingleHop]]></category>
		<category><![CDATA[SoftLayer]]></category>
		<category><![CDATA[software as a service]]></category>
		<category><![CDATA[software defined networking]]></category>
		<category><![CDATA[software defined networks]]></category>
		<category><![CDATA[solid state disk]]></category>
		<category><![CDATA[Tealeaf Technology]]></category>
		<category><![CDATA[Teradata]]></category>
		<category><![CDATA[Terascala]]></category>
		<category><![CDATA[Terradata]]></category>
		<category><![CDATA[tier-3]]></category>
		<category><![CDATA[Truviso]]></category>
		<category><![CDATA[Twitter]]></category>
		<category><![CDATA[Varicent Software]]></category>
		<category><![CDATA[VCE Company]]></category>
		<category><![CDATA[Verizon]]></category>
		<category><![CDATA[virtualization]]></category>
		<category><![CDATA[virtustream]]></category>
		<category><![CDATA[vivisimo]]></category>
		<category><![CDATA[VMWare]]></category>
		<category><![CDATA[Web Infrastructure]]></category>
		<category><![CDATA[WibiData]]></category>
		<category><![CDATA[Windows]]></category>
		<category><![CDATA[Windows Azure]]></category>
		<category><![CDATA[XtremeIO]]></category>
		<category><![CDATA[XtremIO]]></category>

		<guid isPermaLink="false">http://pro.gigaom.com/?p=116565</guid>
		<description><![CDATA[In cloud and big data, the second quarter of 2012 featured several high-profile deals and product launches that could reshape the marketplace for everyone. Google and Microsoft launched Infrastructure-as-a-Service offerings, software-defined networking took off, and all eyes stayed fixed on the continuing promise of data analytics.]]></description>
				<content:encoded><![CDATA[<p>In cloud and big data, the second quarter of 2012 featured several high-profile deals and product launches that could reshape the marketplace for everyone. Google and Microsoft launched Infrastructure-as-a-Service offerings, software-defined networking took off, and all eyes stayed fixed on the continuing promise of data analytics. This quarterly wrap-up discusses these milestones, and provides a near-term outlook for trends, technologies and companies to watch in the next 18 to 24 months.</p>
]]></content:encoded>
			<wfw:commentRss>http://pro.gigaom.com/2012/07/cloud-and-data-second-quarter-2012-analysis-and-outlook-2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:thumbnail url="http://pro.gigaom.com/files/2009/04/gigaompromasterimagecloud.jpg?w=150" />
		<media:content url="http://pro.gigaom.com/files/2009/04/gigaompromasterimagecloud.jpg?w=150" medium="image">
			<media:title type="html">gigaompromasterimagecloud</media:title>
		</media:content>

		<media:content url="http://1.gravatar.com/avatar/4f3860069d181dbeeb398304f5940a9e?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">gigaedit</media:title>
		</media:content>
	</item>
		<item>
		<title>The state of Hadoop: Strong and poised to explode</title>
		<link>http://gigaom.com/2012/06/15/the-state-of-hadoop-strong-and-poised-to-explode/</link>
		<comments>http://gigaom.com/2012/06/15/the-state-of-hadoop-strong-and-poised-to-explode/#comments</comments>
		<pubDate>Fri, 15 Jun 2012 23:19:53 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[big data]]></category>
		<category><![CDATA[Cloudera]]></category>
		<category><![CDATA[emc-greenplum]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Hortonworks]]></category>
		<category><![CDATA[IBM]]></category>
		<category><![CDATA[Mapr]]></category>
		<category><![CDATA[Microsoft]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=532030</guid>
		<description><![CDATA[Now six years old, the Apache Hadoop platform for storing and processing huge amounts of data, perhaps the catalyst of the current big data movement, appears ready for its closeup. According to the companies leading the Hadoop charge, they're already beating away customers with a stick.]]></description>
				<content:encoded><![CDATA[<p><a href="http://gigaom2.files.wordpress.com/2012/06/shutterstock_60414424.jpg"><img  title="shutterstock_60414424" src="http://gigaom2.files.wordpress.com/2012/06/shutterstock_60414424.jpg?w=300&#038;h=199" alt="" width="300" height="199" class="alignleft size-medium wp-image-533065" /></a>Now six years old, the <a href="http://gigaom.com/cloud/what-it-really-means-when-someone-says-hadoop/">Apache Hadoop</a> platform for storing and processing huge amounts of data &#8212; perhaps the catalyst of the current big data movement &#8212; appears ready for its closeup. According to the companies leading the Hadoop charge, they&#8217;re already beating away customers with a stick. Continual improvements to make Hadoop consumable by mainstream business users and applications are only going to make things better.</p>
<p>As with any new technology, the big question surrounding Hadoop as a viable market is whether enterprises will adopt it. The answer seems to be a resounding &#8220;Yes.&#8221; Already, Hortonworks CEO Rob Bearden told me, &#8220;We are seeing Hadoop in almost every Fortune 500 in either a proof of concept or a pilot.&#8221; Bearden doesn&#8217;t mean that his company has accounts with everyone in the Fortune 500, though, just that the majority of those companies are looking into Hadoop.</p>
<p><a href="http://gigaom2.files.wordpress.com/2012/06/olsons-truc.jpg"><img  title="olsons truc" src="http://gigaom2.files.wordpress.com/2012/06/olsons-truc.jpg?w=708" alt=""   class="alignright size-full wp-image-533067" /></a>Cloudera, the first company to commercialize Hadoop (<a href="http://gigaom.com/2009/06/01/cloudera-a-hadoop-focused-startup-gets-6m-in-new-funding/">all the way back in 2008</a>), certainly has a lot of those premier accounts. Cloudera CEO Mike Olson says interest in his company&#8217;s software and services is &#8220;absolutely skyrocketing&#8221; and it has many more deals in the pipeline than it has ever had before. That&#8217;s on top of the big deployments (such as those at Nokia, Samsung and Chevron) his company already has in place.</p>
<p>Even MapR &#8212; a Hadoop startup that <a href="http://gigaom.com/cloud/meet-mapr-a-competitor-to-hadoop-leader-cloudera/">hasn&#8217;t been in the public eye as long as Cloudera</a> and doesn&#8217;t have Yahoo roots to tout <a href="http://gigaom.com/cloud/exclusive-yahoo-launching-hadoop-spinoff-this-week/">like Hortonworks does</a> &#8212; claims to be killing it. It has flagship customers such as comScore and Boeing, as well <a href="http://gigaom.com/cloud/startup-mapr-underpins-emcs-hadoop-effort/">as an OEM deal with Hadoop frenemy EMC Greenplum</a> that MapR VP of Marketing Jack Norris told me is driving a lot of deals. EMC resells MapR&#8217;s M5 Hadoop distribution under the EMC Greenplum Hadoop MR Edition moniker.</p>
<p>That MapR is able to sell licenses for Hadoop &#8212; something most of its competitors give away (even MapR has a free version called M3) &#8212; says a lot about demand for Hadoop. &#8220;My guess is that we&#8217;ll be the license revenue leader within the next quarter,&#8221; Norris said. &#8220;We have higher M5 licensing and use than you would expect.&#8221;</p>
<h2>But is Hadoop a bubble?</h2>
<p>However, despite all the enterprise interest in Hadoop, some critics worry that it&#8217;s an overhyped technology that is bound to disappoint companies that put too much stock in it.</p>
<p>It&#8217;s easy to see where someone would get that idea considering how &#8220;Hadoop&#8221; rolls off the tongue as soon as a discussion turns to big data or analytics. And then <a href="http://www.channelregister.co.uk/2011/11/11/hadoop_funding_wars/">there&#8217;s the money</a>: Hadoop distribution vendors such as Cloudera, Hortonworks and MapR <a href="http://gigaom.com/cloud/with-40m-for-cloudera-how-much-is-hadoop-worth/">raise venture capital in increments of $10 million</a>, and it seems as if every startup claiming some connection to Hadoop is able to raise at least a few million.</p>
<p>It all seems too good to be true. We&#8217;ve seen this story play out before with technologies that never really caught on (such as virtual desktops) and <a href="http://ovum.com/2011/11/21/hadoops-growing-pains/">industries that collapsed</a> despite the promise of a technological savior (think Java and the dot.com era). With many of those Hadoop installations still in the pilot phase, there&#8217;s still time for the companies testing it out to back away when it doesn&#8217;t pan out.</p>
<h2>Nope. Here&#8217;s why.</h2>
<p>But that skepticism appears misplaced when it comes to Hadoop, which has everything going in its favor right now. At the foundational level, where even Hortonworks&#8217; Bearden acknowledges Hadoop &#8220;is not [yet] 100 percent intuitive,&#8221; the story is getting better. As it gets easier to deploy and manage, IT departments tasked with running Hadoop clusters are going to put up less of a fight.</p>
<div id="attachment_533066" class="wp-caption alignleft" style="width: 190px"><a href="http://gigaom2.files.wordpress.com/2012/06/timthumb-php.jpg"><img  title="timthumb.php" src="http://gigaom2.files.wordpress.com/2012/06/timthumb-php.jpg?w=708" alt=""   class="size-full wp-image-533066" /></a><p class="wp-caption-text">Rob Bearden</p></div>
<p>And it is getting easier. Reference architectures? <a href="http://gigaom.com/cloud/ciscos-servers-now-tuned-for-hadoop/">Check</a>. Cluster management software? <a href="http://gigaom.com/cloud/the-unsexy-side-of-big-data-6-tools-to-manage-your-hadoop-cluster/">Check</a>. Preconfigured software-hardware stacks? <a href="http://gigaom.com/cloud/cloudera-brings-the-hadoop-to-oracles-big-data-appliance/">Check</a>. &#8220;We have to evolve Hadoop to become an enterprise data platform,&#8221; Bearden said, and all these things &#8212; along with buy-in from the world&#8217;s largest IT companies &#8212; will help make that happen.</p>
<p>Oh, and now VMware <a href="http://gigaom.com/cloud/vmware-aims-for-hadoop-on-vms-with-serengeti-project/">wants to make Hadoop run on virtual machines</a> to help make it more resource-efficient and dynamic. For startups, Hadoop is <a href="http://gigaom.com/cloud/exclusive-the-brains-behind-hive-launch-on-demand-hadoop-service/">available in myriad formats as a cloud service</a>, which means companies with small IT teams or budgets don&#8217;t need to own or manage a cluster at all.</p>
<p>Actually running analytics jobs is also getting a lot easier, especially for companies that want to extend their current practices into bigger, badder datasets. Basic analytic functions <a href="http://gigaom.com/cloud/is-2013-the-year-hadoop-uptake-turns-into-a-tornado/">are becoming child&#8217;s play thanks to Hadoop-focused startups</a> such as Karmasphere, Datameer and Platfora. Nearly every analytic database and business intelligence product on the planet also now connects with Hadoop. <a href="http://gigaom.com/cloud/microsofts-hadoop-play-is-shaping-up-and-it-includes-excel/">So does Microsoft Excel</a>, an integration Cloudera&#8217;s Olson said &#8220;is going to make the biggest change in Hadoop [adoption], generally.&#8221;</p>
<h2>But it&#8217;s really about apps</h2>
<p>Probably the most exciting sign of Hadoop&#8217;s prospects, though, is the number of entirely new applications it&#8217;s enabling for companies creative enough to spot the opportunities. I spent two days at <a href="http://hadoopsummit.org/">Hadoop Summit</a> this week, and while talks by Twitter and Facebook stole the show, I thought some of the most interesting (in theory, if not in practice) were about using Hadoop to do things like improve online education or <a href="http://yapmap.com/">search online forums</a> that often house the only available information on super-niche topics.</p>
<p>We&#8217;ve <a href="http://gigaom.com/cloud/10-ways-companies-are-using-hadoop-to-do-more-than-serve-ads/">covered many more on GigaOM through the years</a>, ranging from better targeted advertising to better customer service to more-intelligent health care. And then there are tools such as Spire, <a href="http://gigaom.com/cloud/drawn-to-scale-raises-money-to-make-sql-big-data-ready/">a high-performance SQL database from startup Drawn to Scale</a> that&#8217;s based on <a href="http://hbase.apache.org/">HBase</a>, an open source database built atop the Hadoop Distributed File System.</p>
<p><iframe src="http://player.vimeo.com/video/43985564?title=0&amp;byline=0&amp;portrait=0" frameborder="0" width="400" height="300"></iframe></p>
<p>Olson thinks the availability of applications &#8212; especially those built by software vendors and targeting specific uses within specific industries &#8212; will spur a flood of Hadoop adoption. For example, he said, an application for financial risk analysis &#8220;will be very easy to sell into hedge funds.&#8221; And as Olson <a href="http://gigaom.com/2012/03/21/cloudera-structure-data-2012/">told entrepreneurs considering building Hadoop-based apps</a> at our Structure: Data conference in March, “Call me, I’ll connect you with funding. The money is out there.”</p>
<p>I have no reason to doubt it is. Hadoop Summit brought together about 2,200 people of all stripes that are working with this six-year-old technology in some manner. Earlier in June, Cloudera put on a product launch party in San Francisco that rivaled any IT event I&#8217;ve ever seen in terms of sheer swankiness. If Hadoop isn&#8217;t poised to become a multi-billion-dollar market very soon, it&#8217;s putting on one heck of a facade.</p>
<p><em>Feature image courtesy of <a href="http://www.shutterstock.com/gallery-71330p1.html">Shutterstock user Carlos Caetano</a>; Mike Olson photo by <a href="http://pinarozger.com">Pinar Ozger</a>.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2012/06/15/the-state-of-hadoop-strong-and-poised-to-explode/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2012/06/shutterstock_60414424.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2012/06/shutterstock_60414424.jpg?w=150" medium="image">
			<media:title type="html">shutterstock_60414424</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2012/06/shutterstock_60414424.jpg?w=300" medium="image">
			<media:title type="html">shutterstock_60414424</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2012/06/olsons-truc.jpg" medium="image">
			<media:title type="html">olsons truc</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2012/06/timthumb-php.jpg" medium="image">
			<media:title type="html">timthumb.php</media:title>
		</media:content>
	</item>
		<item>
		<title>Hortonworks releases Hadoop distro, teams with VMware</title>
		<link>http://gigaom.com/2012/06/12/hortonworks-teams-with-vmware-to-keep-hadoop-running/</link>
		<comments>http://gigaom.com/2012/06/12/hortonworks-teams-with-vmware-to-keep-hadoop-running/#comments</comments>
		<pubDate>Tue, 12 Jun 2012 12:00:51 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[Apache Software Foundation]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[Cloudera]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Hortonworks]]></category>
		<category><![CDATA[Mapr]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[Yahoo]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=531399</guid>
		<description><![CDATA[One year after launching into the Hadoop market with much anticipation, Yahoo spinoff Hortonworks finally has a product available. The company announced version 1.0 of its flagship Hortonworks Data Platform on Tuesday, as well as a High Availability version designed with new partner VMware.]]></description>
				<content:encoded><![CDATA[<p><a href="http://gigaom2.files.wordpress.com/2012/06/shutterstock_81391894.jpg"><img  title="shutterstock_81391894" src="http://gigaom2.files.wordpress.com/2012/06/shutterstock_81391894.jpg?w=300&#038;h=225" alt="" width="300" height="225" class="alignleft size-medium wp-image-531466" /></a>Updated: One year after <a href="http://gigaom.com/cloud/exclusive-yahoo-launching-hadoop-spinoff-this-week/">launching into the Hadoop market with much anticipation</a>, Yahoo spinoff Hortonworks finally has a product available. The company announced version 1.0 of its flagship <a href="http://hortonworks.com/products/hortonworksdataplatform/">Hortonworks Data Platform</a> on Tuesday, as well as a High Availability <del>version</del> architecture designed with new partner VMware. Reasonable minds can disagree on whose distribution of the Apache Hadoop data-processing platform is the best, but Hortonworks needed to get on the board to be part of the discussion.</p>
<p>In terms of product, the Hortonworks Data Platform is about what was advertised when the company first <a href="http://gigaom.com/cloud/yahoo-spinoff-shakes-up-hadoop-market-with-new-distro/">unveiled it in November</a>. The major difference from other commercial distributions, such as Cloudera, EMC Greenplum and MapR, is that Hortonworks uses Apache Ambari to configure and manage clusters and HCatalog as a metadata service to connect with relational database products, and incorporates Talend&#8217;s Open Studio as a tool for graphically integrating datasets and composing workflows.</p>
<p><a href="http://gigaom2.files.wordpress.com/2012/06/hor_platdiag_noicons.jpg"><img  title="HOR_PlatDiag_NoIcons" src="http://gigaom2.files.wordpress.com/2012/06/hor_platdiag_noicons.jpg?w=604&#038;h=360" alt="" width="604" height="360" class="aligncenter size-large wp-image-531448" /></a></p>
<p>The Hortonworks Data Platform HA <del>distribution</del> architecture, however, is a bit more intriguing. Technically, it works by running important Hadoop services such as NameNode and JobTracker<del> and Oozie</del> on virtual machines. If the physical server or VM on which a service is running fails, the product automatically moves the service to another box.</p>
<p>The other commercial Hadoop distributions all offer fault tolerance &#8212; at least for the Hadoop Distributed File System (which is where the NameNode resides) &#8212; but they rely on different approaches to get there. Cloudera, for example, is built on the <a href="http://hadoop.apache.org/common/docs/current/">new Hadoop 2.0 version</a> (Hortonworks uses the tried-and-true Hadoop 1.0), while MapR <a href="http://gigaom.com/cloud/investors-make-20m-bet-on-mapr-to-win-hadoop-war/">uses a proprietary file system</a>. Hortonworks, as is its business plan, will contribute the code for its HA <del>version</del> solution back into the Apache Hadoop project.</p>
<p>&#8220;The next stage,&#8221; Hortonworks Chief Products Officer Ari Zilka told me, &#8220;is to run the whole cluster in a virtual environment.&#8221; Doing that without sacrificing processing performance will be the trick.</p>
<p><del>Where the real intrigue comes is that the Hortonworks Data Platform HA edition will be available through VMware.</del> The work with VMware represents the furtherance of a unique partner strategy in which Hortonworks works closely with technology partners such as VMware, <a href="http://gigaom.com/cloud/microsofts-hadoop-play-is-shaping-up-and-it-includes-excel/">Microsoft</a> and <a href="http://gigaom.com/cloud/teradata-taps-hortonworks-to-improve-hadoop-story/">Teradata</a> to develop products that leverage Hadoop while being more than mere integrations. Cloudera has more than 200 partners, for example, but at least some of Hortonworks&#8217; partnerships appear much tighter.</p>
<p>Finally, at least, the market for Hadoop distributions appears complete. There are five rather distinct offerings from five rather distinct providers &#8212; <a href="http://www.cloudera.com/company/press-center/releases/cloudera-introduces-fourth-generation-of-its-big-data-platform-to-drive-ease-of-use-integration-and-adoption-of-apache-hadoop-for-the-enterprise/">Cloudera</a>, <a href="http://gigaom.com/cloud/emc-delivers-on-isilon-hadoop-bundle/">EMC Greenplum</a>, Hortonworks, <a href="http://www-01.ibm.com/software/data/infosphere/biginsights/">IBM</a> and MapR (six if you include Amazon&#8217;s Elastic MapReduce cloud service) &#8212; and each has its merits. We&#8217;ll see whose technologies and business models win the day as the enterprise world gets set to start investing in Hadoop in a major way.</p>
<p><em>Feature image courtesy of <a href="http://www.shutterstock.com/gallery-585700p1.html">Shutterstock user Colin Edwards Photography</a>.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2012/06/12/hortonworks-teams-with-vmware-to-keep-hadoop-running/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2012/06/shutterstock_81391894.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2012/06/shutterstock_81391894.jpg?w=150" medium="image">
			<media:title type="html">shutterstock_81391894</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2012/06/shutterstock_81391894.jpg?w=300" medium="image">
			<media:title type="html">shutterstock_81391894</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2012/06/hor_platdiag_noicons.jpg?w=604" medium="image">
			<media:title type="html">HOR_PlatDiag_NoIcons</media:title>
		</media:content>
	</item>
		<item>
		<title>The unsexy side of big data: 5 tools to manage your Hadoop cluster</title>
		<link>http://gigaom.com/2012/05/18/the-unsexy-side-of-big-data-6-tools-to-manage-your-hadoop-cluster/</link>
		<comments>http://gigaom.com/2012/05/18/the-unsexy-side-of-big-data-6-tools-to-manage-your-hadoop-cluster/#comments</comments>
		<pubDate>Fri, 18 May 2012 20:20:29 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[Apache Ambari]]></category>
		<category><![CDATA[Apache Mesos]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[Cloudera]]></category>
		<category><![CDATA[clusters]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Hortonworks]]></category>
		<category><![CDATA[platform-computing]]></category>
		<category><![CDATA[stackiq]]></category>
		<category><![CDATA[zettaset]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=523209</guid>
		<description><![CDATA[It's neither easy nor glamorous -- data scientists get all the love -- but making sure your Hadoop cluster is properly configured and applications are running optimally is necessary, especially as applications move into production. Here are five tools to help you do it.]]></description>
				<content:encoded><![CDATA[<p><a href="http://gigaom2.files.wordpress.com/2012/05/shutterstock_69852472.jpg"><img  title="shutterstock_69852472" src="http://gigaom2.files.wordpress.com/2012/05/shutterstock_69852472.jpg?w=300&#038;h=200" alt="" width="300" height="200" class="alignleft size-medium wp-image-523261" /></a>Before you can get into the fun part of actually processing and analyzing big data with Hadoop, you have to configure, deploy and manage your cluster. It&#8217;s neither easy nor glamorous &#8212; data scientists get all the love &#8212; but it is necessary. Here are five tools (not from commercial distribution providers such as Cloudera or MapR) to help you do it.</p>
<p><strong>Apache Ambari</strong></p>
<p><a href="http://incubator.apache.org/ambari/">Apache Ambari</a> is an open source project for monitoring, administration and lifecycle management for Hadoop. It&#8217;s also the project that Hortonworks has <a href="http://gigaom.com/cloud/yahoo-spinoff-shakes-up-hadoop-market-with-new-distro/">chosen as the management component for the Hortonworks Data Platform</a>. Ambari works with Hadoop MapReduce, HDFS, HBase, Pig, Hive, HCatalog and Zookeeper.</p>
<p><strong>Apache Mesos</strong></p>
<p><a href="http://gigaom2.files.wordpress.com/2012/05/mesos-copy.jpg"><img  title="mesos copy" src="http://gigaom2.files.wordpress.com/2012/05/mesos-copy.jpg?w=300&#038;h=136" alt="" width="300" height="136" class="alignright size-medium wp-image-523251" /></a><a href="http://incubator.apache.org/mesos/">Apache Mesos</a> is a cluster manager that lets users run multiple Hadoop jobs, or other high-performance applications, on the same cluster at the same time. <a href="http://engineering.twitter.com/2012/05/incubating-apache-mesos.html">According to Twitter Open Source Manager Chris Aniszczyk</a>, Mesos &#8220;runs on hundreds of production machines and makes it easier to execute jobs that do everything from running services to handling our analytics workload.&#8221;</p>
<p><strong>Platform MapReduce</strong></p>
<p><a href="http://gigaom2.files.wordpress.com/2012/05/mapreduce-arch-copy.jpg"><img  title="mapreduce-arch copy" src="http://gigaom2.files.wordpress.com/2012/05/mapreduce-arch-copy.jpg?w=300&#038;h=215" alt="" width="300" height="215" class="alignright size-medium wp-image-523252" /></a><a href="http://www.platform.com/Products/MapReduce/overview">Platform MapReduce</a> is high-performance computing expert Platform Computing&#8217;s entr&#233;e into the big data space. It&#8217;s a runtime environment that supports a variety of MapReduce applications and file systems, not just those directly associated with Hadoop, and is <a href="http://gigaom.com/cloud/platform-computing-extends-hpc-reach-into-mapreduce/">tuned for enterprise-class performance and reliability</a>. Platform, now part of IBM, built a respectable business managing clusters for large financial services institutions.</p>
<p><strong>StackIQ Rocks+ Big Data</strong></p>
<p><a href="http://gigaom2.files.wordpress.com/2012/05/stackiq-marchitecture-copy.jpg"><img  title="StackIQ-Marchitecture copy" src="http://gigaom2.files.wordpress.com/2012/05/stackiq-marchitecture-copy.jpg?w=300&#038;h=226" alt="" width="300" height="226" class="alignright size-medium wp-image-523253" /></a><a href="http://www.stackiq.com/big-data/">StackIQ Rocks+ Big Data</a> is a commercial distribution of the Rocks cluster management software that the company has beefed up to also support Apache Hadoop. Rocks+ supports the Apache, Cloudera, Hortonworks and MapR distributions, and handles the entire process from configuring bare metal servers to managing an operational Hadoop cluster.</p>
<p><strong>Zettaset Orchestrator</strong></p>
<p><a href="http://gigaom2.files.wordpress.com/2012/05/zettaset.jpg"><img  title="zettaset" src="http://gigaom2.files.wordpress.com/2012/05/zettaset.jpg?w=280&#038;h=300" alt="" width="280" height="300" class="alignright size-medium wp-image-523255" /></a><a href="http://www.zettaset.com/platform.php">Zettaset Orchestrator</a> is an end-to-end Hadoop management product that supports multiple Hadoop distributions. Zettaset touts Orchestrator&#8217;s UI-based experience and its ability to handle what the company calls MAAPS &#8212; management, availability, automation, provisioning and security. <a href="http://gigaom.com/cloud/how-hadoop-can-help-keep-your-money-in-the-bank/">At least one large company, Zions Bancorporation, is a Zettaset customer</a>.</p>
<p>If there are more Hadoop management tools floating around, please let me know in the comments.</p>
<p><em>Feature image courtesy of <a href="http://www.shutterstock.com/gallery-54809p1.html">Shutterstock user .shock.</a></em></p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2012/05/18/the-unsexy-side-of-big-data-6-tools-to-manage-your-hadoop-cluster/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2012/05/shutterstock_69852472.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2012/05/shutterstock_69852472.jpg?w=150" medium="image">
			<media:title type="html">shutterstock_69852472</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2012/05/shutterstock_69852472.jpg?w=300" medium="image">
			<media:title type="html">shutterstock_69852472</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2012/05/mesos-copy.jpg?w=300" medium="image">
			<media:title type="html">mesos copy</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2012/05/mapreduce-arch-copy.jpg?w=300" medium="image">
			<media:title type="html">mapreduce-arch copy</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2012/05/stackiq-marchitecture-copy.jpg?w=300" medium="image">
			<media:title type="html">StackIQ-Marchitecture copy</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2012/05/zettaset.jpg?w=280" medium="image">
			<media:title type="html">zettaset</media:title>
		</media:content>
	</item>
	</channel>
</rss>
