<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>GigaOM &#187; SQL</title>
	<atom:link href="http://gigaom.com/tag/sql/feed/" rel="self" type="application/rss+xml" />
	<link>http://gigaom.com</link>
	<description></description>
	<lastBuildDate>Sun, 19 May 2013 17:55:34 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='gigaom.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://0.gravatar.com/blavatar/0db8f6557d022075dbbf010c54d46d93?s=96&#038;d=http%3A%2F%2Fs2.wp.com%2Fi%2Fbuttonw-com.png</url>
		<title>GigaOM &#187; SQL</title>
		<link>http://gigaom.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://gigaom.com/osd.xml" title="GigaOM" />
	<atom:link rel='hub' href='http://gigaom.com/?pushpress=hub'/>
		<item>
		<title>Look, IBM is doing SQL on Hadoop, too</title>
		<link>http://gigaom.com/2013/05/06/look-ibm-is-doing-sql-on-hadoop-too/</link>
		<comments>http://gigaom.com/2013/05/06/look-ibm-is-doing-sql-on-hadoop-too/#comments</comments>
		<pubDate>Mon, 06 May 2013 17:37:41 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[big data]]></category>
		<category><![CDATA[database]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[IBM]]></category>
		<category><![CDATA[SQL]]></category>
		<category><![CDATA[SQL on Hadoop]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=642523</guid>
		<description><![CDATA[IBM's entrant in the SQL-on-Hadoop competition has been flying under the radar, but is available as a technology preview. Called Big SQL, it's a big deal if IBM wants to be a major player in the Hadoop space.]]></description>
				<content:encoded><![CDATA[<p>Maybe this is just news to me, but IBM has a SQL-on-Hadoop product in the works called Big SQL. The company <a href="https://www.ibm.com/developerworks/community/blogs/SusanVisser/entry/introducing_the_ibm_big_sql_technology_preview1?lang=en">announced the technology preview version in March</a> (well under my radar and, from what I&#8217;ve seen, nearly everyone else&#8217;s radar), and is offering up a cloud-based demo environment for a select group of early users.</p>
<p>As a refresher, the big difference between SQL on Hadoop and the Hadoop connectors that were popular a couple years ago is that SQL-on-Hadoop products query the data where it resides &#8212; in HDFS or HBase &#8212; rather than pulling it into a relational database environment to analyze it. We have been <a href="http://gigaom.com/2013/04/30/with-impala-now-ga-clouderas-ceo-sizes-up-the-sql-on-hadoop-market/">talking for months about the emergence of a large SQL-on-Hadoop market</a>, but IBM&#8217;s name was conspicuously absent from that discussion. The company has Hadoop software called BigInsights and lots of SQL expertise, so it only made sense that IBM would get into the game at some point.</p>
<p>Details on Big SQL are still pretty sparse save for a few high-level blog posts and an instructional video (embedded below), but it looks to take the standard approach, <a href="http://gigaom.com/2013/04/30/with-impala-now-ga-clouderas-ceo-sizes-up-the-sql-on-hadoop-market/">as Cloudera is doing with Impala</a>, of enabling access through traditional tools via JDBC and ODBC drivers.</p>
<p>Ultimately, I think the advent of big data will <a href="http://gigaom.com/2013/05/01/precog-launches-with-a-plan-to-simplify-analytics-on-unstructured-data/">enable some new types of querying techniques</a> quite a bit different from the SQL queries we&#8217;ve come to know and love over the past couple of decades. But SQL is still the language du jour and might never go away, so there&#8217;s a lot of value to be had if people can put their SQL skills to work on data stored inside Hadoop or other environments, and if companies can work toward a nirvana <a href="http://gigaom.com/2013/02/25/emc-to-hadoop-competition-see-ya-wouldnt-wanna-be-ya/">where all the data is stored in a single place</a> rather than across database environments.</p>
<p>That IBM got this message and got into the game isn&#8217;t surprising at all, but it is important. Lots of large companies buy IBM&#8217;s software. If it wants them to follow it into the world of big data and Hadoop, it has to give them the tools they need to use it.</p>
<span class='embed-youtube' style='text-align:center; display: block;'><iframe class='youtube-player' type='text/html' width='604' height='370' src='http://www.youtube.com/embed/DCWig4-h1F4?version=3&#038;rel=1&#038;fs=1&#038;showsearch=0&#038;showinfo=1&#038;iv_load_policy=1&#038;wmode=transparent' frameborder='0'></iframe></span>
<br />  <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=gigaom.com&#038;blog=14960843&#038;post=642523&#038;subd=gigaom2&#038;ref=&#038;feed=1" width="1" height="1" /><p><a href="http://pubads.g.doubleclick.net/gampad/jump?iu=/1008864/GigaOM_RSS_300x250&#038;sz=300x250&#038;c=7221"><img src="http://pubads.g.doubleclick.net/gampad/ad?iu=/1008864/GigaOM_RSS_300x250&#038;sz=300x250&#038;c=7221" /></a></p><p><strong>Related research and analysis from GigaOM Pro:</strong><br />Subscriber content. <a href="http://pro.gigaom.com/?utm_source=data&utm_medium=editorial&utm_campaign=auto3&utm_term=642523+look-ibm-is-doing-sql-on-hadoop-too&utm_content=dharrisstructure">Sign up for a free trial</a>.</p><ul><li><a href="http://pro.gigaom.com/2012/05/the-importance-of-putting-the-u-and-i-in-visualization/?utm_source=data&utm_medium=editorial&utm_campaign=auto3&utm_term=642523+look-ibm-is-doing-sql-on-hadoop-too&utm_content=dharrisstructure">The importance of putting the U and I in visualization</a></li><li><a href="http://pro.gigaom.com/2012/04/sector-roadmap-hadoop-platforms-2012/?utm_source=data&utm_medium=editorial&utm_campaign=auto3&utm_term=642523+look-ibm-is-doing-sql-on-hadoop-too&utm_content=dharrisstructure">2012: The Hadoop infrastructure market booms</a></li><li><a href="http://pro.gigaom.com/2012/04/infrastructure-q1-cloud-and-big-data-woo-the-enterprise/?utm_source=data&utm_medium=editorial&utm_campaign=auto3&utm_term=642523+look-ibm-is-doing-sql-on-hadoop-too&utm_content=dharrisstructure">Infrastructure Q1: Cloud and big data woo enterprises</a></li></ul>]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/05/06/look-ibm-is-doing-sql-on-hadoop-too/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/02/shutterstock_37622056.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/02/shutterstock_37622056.jpg?w=150" medium="image">
			<media:title type="html">sql statement</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>
	</item>
		<item>
		<title>With Impala now GA, Cloudera&#8217;s CEO sizes up the SQL-on-Hadoop market</title>
		<link>http://gigaom.com/2013/04/30/with-impala-now-ga-clouderas-ceo-sizes-up-the-sql-on-hadoop-market/</link>
		<comments>http://gigaom.com/2013/04/30/with-impala-now-ga-clouderas-ceo-sizes-up-the-sql-on-hadoop-market/#comments</comments>
		<pubDate>Tue, 30 Apr 2013 13:00:40 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[big data]]></category>
		<category><![CDATA[Cloudera]]></category>
		<category><![CDATA[data warehouse]]></category>
		<category><![CDATA[Greenplum]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[hive]]></category>
		<category><![CDATA[Hortonworks]]></category>
		<category><![CDATA[Impala]]></category>
		<category><![CDATA[Mapr]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[SQL]]></category>
		<category><![CDATA[SQL on Hadoop]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=640777</guid>
		<description><![CDATA[Cloudera's Impala engine for interactive SQL queries on Hadoop data is now generally available, and CEO Mike Olson gives his lay of the competitive landscape.]]></description>
				<content:encoded><![CDATA[<p>There is no shortage of confidence in the Hadoop space, and market leader Cloudera bolstered its own on Tuesday with the general availability of its Impala SQL query engine for Hadoop. And if CEO Mike Olson&#8217;s comments are any indication, we&#8217;re in for a long ride of competitive jockeying and one-upmanship as Cloudera and its peers go all Microsoft or Google and create myriad new data-processing engines to turn their Hadoop distributions into bona fide platforms.</p>
<p>Launched as a private beta in May 2012 and <a href="http://gigaom.com/2012/10/24/cloudera-makes-sql-a-first-class-citizen-in-hadoop/">made public in October</a>, Impala is Cloudera&#8217;s attempt to address the growing demand for interactive SQL analytics on Hadoop data. It&#8217;s essentially a massively parallel database designed to share the same storage platform and metadata as Hadoop MapReduce, only it is its own separate processing engine.</p>
<div id="attachment_640848" class="wp-caption alignleft" style="width: 310px"><a href="http://gigaom2.files.wordpress.com/2013/04/impala-arch-new.jpg"><img  alt="How Impala fits in" src="http://gigaom2.files.wordpress.com/2013/04/impala-arch-new.jpg?w=300&#038;h=257" width="300" height="257" class="size-medium wp-image-640848" /></a><p class="wp-caption-text">How Impala fits in</p></div>
<p>Impala actually uses the same &#8220;nearly ANSI&#8221; version of SQL as does current standard bearer Hive, but that technology (created by Facebook in 2009 as a data warehouse layer for Hadoop) doesn&#8217;t run nearly fast enough to sate many users&#8217; desire for interactive analytics. This is because Hive transforms SQL queries into MapReduce jobs, meaning every one is processed against the entire corpus of data in the Hadoop Distributed File System.</p>
<h2 id="sizing-up-the-competition">Sizing up the competition</h2>
<p>Yet Cloudera isn&#8217;t the first to have the idea, <a href="http://gigaom.com/2013/02/21/sql-is-whats-next-for-hadoop-heres-whos-doing-it/">nor is it alone in trying to sell interactive SQL on Hadoop</a>. The idea was <a href="http://gigaom.com/2011/10/21/hadapt-raises-9-5m-for-hadoop-data-warehouse/">first commercialized by Boston-based startup Hadapt</a> in 2011, and is now being pushed by numerous startups and larger Hadoop players. Among them: Pivotal (formerly EMC) Greenplum, MapR (with <a href="http://gigaom.com/2012/08/17/for-fast-interactive-hadoop-queries-drill-may-be-the-answer/">Drill</a>), Hortonworks (with <a href="http://hortonworks.com/blog/100x-faster-hive/">Stinger</a>), Drawn to Scale, Splice Machine, Jethro Data and Citus Data.</p>
<div id="attachment_640858" class="wp-caption aligncenter" style="width: 600px"><a href="http://gigaom2.files.wordpress.com/2013/04/had_graphic2-scaled.jpg"><img  alt="Hadapt's architecture" src="http://gigaom2.files.wordpress.com/2013/04/had_graphic2-scaled.jpg?w=708"   class="size-full wp-image-640858" /></a><p class="wp-caption-text">Hadapt&#8217;s architecture</p></div>
<p>But Cloudera is arguably the biggest name pushing SQL on Hadoop, and CEO Mike Olson thinks Impala stands out for several reasons &#8212; not the least of which is that it exists as a product. &#8220;Nobody else is shipping production-grade SQL query support on Hadoop,&#8221; he told me during a recent call. &#8220;At least not in open source.&#8221; He seems content to let the startups do their things, instead focusing his attention on Cloudera&#8217;s big three Hadoop-distribution competitors in Pivotal, MapR and Hortonworks. Greenplum and Pivotal SVP Scott Yara <a href="http://gigaom.com/2013/02/25/emc-to-hadoop-competition-see-ya-wouldnt-wanna-be-ya/">was full of confidence &#8212; and R&amp;D budget</a> &#8212; when the company announced the Pivotal HD distribution and HAWQ technology in February, but Olson claims the approach requires a siloed DBMS within HDFS and is a &#8220;rearguard defensive strategy&#8221; to protect the company&#8217;s sunk costs in its database technology.</p>
<div id="attachment_615210" class="wp-caption aligncenter" style="width: 718px"><a href="http://gigaom2.files.wordpress.com/2013/02/hawq1.jpg"><img  alt="The Pivotal HD and Hawq architecture" src="http://gigaom2.files.wordpress.com/2013/02/hawq1.jpg?w=708&#038;h=387" width="708" height="387" class="size-large wp-image-615210" /></a><p class="wp-caption-text">The Pivotal HD and Hawq architecture</p></div>
<p>As for Hortonworks, Olson questions the wisdom of its Stinger initiative to boost Hive&#8217;s speed, noting that &#8220;Hive never got good while it was running standalone on MapReduce.&#8221; Hortonworks also <a href="http://gigaom.com/2013/04/15/teradata-to-connect-hadoop-and-data-warehouses-roll-out-new-appliance/">partners with vendors such as Teradata</a> to let their platforms access Hadoop data in its native format, but those approaches still require sending data over the network. &#8220;It&#8217;s not the way you would build it if you woke up in the 2000s and were building this anew,&#8221; Olson said.</p>
<div id="attachment_640854" class="wp-caption aligncenter" style="width: 718px"><a href="http://gigaom2.files.wordpress.com/2013/04/stingerroad.png"><img  alt="The Stinger roadmap" src="http://gigaom2.files.wordpress.com/2013/04/stingerroad.png?w=708&#038;h=558" width="708" height="558" class="size-large wp-image-640854" /></a><p class="wp-caption-text">The Stinger roadmap</p></div>
<p>Olson acknowledged that the MapR-led Apache Drill project is cut from the same cloth as Impala (that is, being a Google Dremel clone designed specifically for Hadoop), but &#8220;the difference is we&#8217;re shipping code.&#8221; Being generally available and ready for production workloads means Cloudera can lock down users and market share before many even have a chance to experiment with Drill. He all but dismissed questions over the readiness of Impala, spurred by rumblings in the Hadoop space that Cloudera rushed it into public beta in order to get on the scoreboard against more fully baked offerings.</p>
<p>&#8220;I don&#8217;t feel we&#8217;re under the gun competitively to pull it out of beta because no one else has product in the market,&#8221; Olson said. &#8220;I have no problems &#8230; calling this GA quality.&#8221; He did, however, acknowledge that Impala is shipping with a &#8220;minimum viable feature set&#8221; that the company has plans to build on in the near future. Impala Senior Product Manager Justin Erickson noted a few issues of concern, including the number of concurrent users Impala can support, but said they were addressed during the beta period.</p>
<h2 id="one-piece-of-a-larger-platform">One piece of a larger platform</h2>
<p>Really, though, the whole point of Impala and its competitors is to turn Hadoop from a tool for batch analytics and mass storage <a href="http://gigaom.com/2013/03/07/5-reasons-why-the-future-of-hadoop-is-real-time-relatively-speaking/">into a platform that can handle nearly all of companies&#8217; data-processing needs</a>. In that regard, it appears we&#8217;re just getting started. Cloudera, MapR, Pivotal Greenplum and Hortonworks are already pushing their own products and projects, and Olson said &#8220;it&#8217;s absolutely our intent&#8221; to enhance Cloudera&#8217;s platform with even more open-source products &#8212; perhaps even more database technologies <a href="http://gigaom.com/2013/04/22/how-hbase-converted-myspaces-mysql-champion-and-is-driving-hadoop-mainstream/">a la HBase</a> &#8212; that will let users do more stuff with more types of data. Over time, this strategy could result in Hadoop displacing the current breed of databases and data warehouses and becoming the single data store atop which users run whatever applications they so desire. For now, though, especially when it comes to Impala and the data warehouse incumbents, Olson is taking a measured approach. &#8220;The likelihood that we&#8217;re going to knock them off in the near term,&#8221; he said, &#8220;&#8230; it would be a tough fight to win.&#8221;</p>
<br />  <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=gigaom.com&#038;blog=14960843&#038;post=640777&#038;subd=gigaom2&#038;ref=&#038;feed=1" width="1" height="1" /><p><a href="http://pubads.g.doubleclick.net/gampad/jump?iu=/1008864/GigaOM_RSS_300x250&#038;sz=300x250&#038;c=830569"><img src="http://pubads.g.doubleclick.net/gampad/ad?iu=/1008864/GigaOM_RSS_300x250&#038;sz=300x250&#038;c=830569" /></a></p><p><strong>Related research and analysis from GigaOM Pro:</strong><br />Subscriber content. <a href="http://pro.gigaom.com/?utm_source=data&utm_medium=editorial&utm_campaign=auto3&utm_term=640777+with-impala-now-ga-clouderas-ceo-sizes-up-the-sql-on-hadoop-market&utm_content=dharrisstructure">Sign up for a free trial</a>.</p><ul><li><a href="http://pro.gigaom.com/2012/03/a-near-term-outlook-for-big-data/?utm_source=data&utm_medium=editorial&utm_campaign=auto3&utm_term=640777+with-impala-now-ga-clouderas-ceo-sizes-up-the-sql-on-hadoop-market&utm_content=dharrisstructure">A near-term outlook for big data</a></li><li><a href="http://pro.gigaom.com/2012/04/infrastructure-q1-cloud-and-big-data-woo-the-enterprise/?utm_source=data&utm_medium=editorial&utm_campaign=auto3&utm_term=640777+with-impala-now-ga-clouderas-ceo-sizes-up-the-sql-on-hadoop-market&utm_content=dharrisstructure">Infrastructure Q1: Cloud and big data woo enterprises</a></li><li><a href="http://pro.gigaom.com/2012/04/sector-roadmap-hadoop-platforms-2012/?utm_source=data&utm_medium=editorial&utm_campaign=auto3&utm_term=640777+with-impala-now-ga-clouderas-ceo-sizes-up-the-sql-on-hadoop-market&utm_content=dharrisstructure">2012: The Hadoop infrastructure market booms</a></li></ul>]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/04/30/with-impala-now-ga-clouderas-ceo-sizes-up-the-sql-on-hadoop-market/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/01/1z5o1503.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/01/1z5o1503.jpg?w=150" medium="image">
			<media:title type="html">Structure Data 2012: Michael Olson – CEO, Cloudera</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/04/impala-arch-new.jpg?w=300" medium="image">
			<media:title type="html">How Impala fits in</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/04/had_graphic2-scaled.jpg" medium="image">
			<media:title type="html">Hadapt&#039;s architecture</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/02/hawq1.jpg?w=708" medium="image">
			<media:title type="html">The Pivotal HD and Hawq architecture</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/04/stingerroad.png?w=708" medium="image">
			<media:title type="html">The Stinger roadmap</media:title>
		</media:content>
	</item>
		<item>
		<title>Concurrent gets $4M for higher-level Hadoop</title>
		<link>http://gigaom.com/2013/03/20/concurrent-gets-4m-for-higher-level-hadoop/</link>
		<comments>http://gigaom.com/2013/03/20/concurrent-gets-4m-for-higher-level-hadoop/#comments</comments>
		<pubDate>Wed, 20 Mar 2013 16:33:29 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[big data]]></category>
		<category><![CDATA[Cascading]]></category>
		<category><![CDATA[Concurrent]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[SQL]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=622319</guid>
		<description><![CDATA[Cascading proprietor Concurrent has secured $4 million in venture capital in order to advance its efforts toward easing the development of big data applications.]]></description>
				<content:encoded><![CDATA[<p><a href="http://www.concurrentinc.com/">Concurrent</a>, proprietor of the open-source <a href="http://www.cascading.org/">Cascading framework</a> for developing big data workflows, has closed $4 million Series A investment round from True Ventures  <em>(see disclosure)</em> and Rembrandt Partners. Cascading has been around for a few years, actually, but Concurrent only <a href="http://gigaom.com/2011/07/26/concurrent-raises-900k-to-make-hadoop-easier/">raised seed funding in 2011</a> and has been riding the wave of interest in making big data easier to do.</p>
<p>In practice, Cascading is generally used as a higher-level method than MapReduce for writing Hadoop jobs, although it&#8217;s technically a framework that could support any number of distributed-processing frameworks. It&#8217;s <a href="http://gigaom.com/2012/08/15/meet-the-combo-behind-etsy-airbnb-and-climate-corp-hadoop-jobs/">used by a number of notable users</a>, including Etsy, Airbnb and Climate Corporation. In February, the Cascading project expanded its scope to address the growing SQL-on-Hadoop trend with a project called Lingual.</p>
<p>Software veteran Gary Nakamura is taking on the role of Concurrent CEO, replacing Cascading creator Chris Wensel, who&#8217;ll stay on as the company&#8217;s CTO.</p>
<p><a href="http://gigaom2.files.wordpress.com/2013/03/api-diagram-1.png"><img  alt="api-diagram (1)" src="http://gigaom2.files.wordpress.com/2013/03/api-diagram-1.png?w=708"   class="aligncenter size-full wp-image-622350" /></a></p>
<p><em><strong>Disclosure</strong>: Concurrent is backed by True Ventures, a venture capital firm that is an investor in the parent company of this blog, Giga Omni Media. Om Malik, the founder of Giga Omni Media, is also a venture partner at True.</em></p>
<br />  <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=gigaom.com&#038;blog=14960843&#038;post=622319&#038;subd=gigaom2&#038;ref=&#038;feed=1" width="1" height="1" /><p><a href="http://pubads.g.doubleclick.net/gampad/jump?iu=/1008864/GigaOM_RSS_300x250&#038;sz=300x250&#038;c=496838"><img src="http://pubads.g.doubleclick.net/gampad/ad?iu=/1008864/GigaOM_RSS_300x250&#038;sz=300x250&#038;c=496838" /></a></p><p><strong>Related research and analysis from GigaOM Pro:</strong><br />Subscriber content. <a href="http://pro.gigaom.com/?utm_source=data&utm_medium=editorial&utm_campaign=auto3&utm_term=622319+concurrent-gets-4m-for-higher-level-hadoop&utm_content=dharrisstructure">Sign up for a free trial</a>.</p><ul><li><a href="http://pro.gigaom.com/2012/05/the-importance-of-putting-the-u-and-i-in-visualization/?utm_source=data&utm_medium=editorial&utm_campaign=auto3&utm_term=622319+concurrent-gets-4m-for-higher-level-hadoop&utm_content=dharrisstructure">The importance of putting the U and I in visualization</a></li><li><a href="http://pro.gigaom.com/2012/04/infrastructure-q1-cloud-and-big-data-woo-the-enterprise/?utm_source=data&utm_medium=editorial&utm_campaign=auto3&utm_term=622319+concurrent-gets-4m-for-higher-level-hadoop&utm_content=dharrisstructure">Infrastructure Q1: Cloud and big data woo enterprises</a></li><li><a href="http://pro.gigaom.com/2012/03/a-near-term-outlook-for-big-data/?utm_source=data&utm_medium=editorial&utm_campaign=auto3&utm_term=622319+concurrent-gets-4m-for-higher-level-hadoop&utm_content=dharrisstructure">A near-term outlook for big data</a></li></ul>]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/03/20/concurrent-gets-4m-for-higher-level-hadoop/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/03/wensel1-e1363797088502.jpeg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/03/wensel1-e1363797088502.jpeg?w=150" medium="image">
			<media:title type="html">wensel</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/03/api-diagram-1.png" medium="image">
			<media:title type="html">api-diagram (1)</media:title>
		</media:content>
	</item>
		<item>
		<title>Google BigQuery is now even bigger</title>
		<link>http://gigaom.com/2013/03/14/google-bigquery-is-now-even-bigger/</link>
		<comments>http://gigaom.com/2013/03/14/google-bigquery-is-now-even-bigger/#comments</comments>
		<pubDate>Thu, 14 Mar 2013 22:31:12 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[analytic database]]></category>
		<category><![CDATA[analytics]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[BigQuery]]></category>
		<category><![CDATA[Dremel]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[SQL]]></category>
		<category><![CDATA[Structure Data 2013]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=620545</guid>
		<description><![CDATA[Google now allows joins within its BigQuery analytics service, as well as support for timestamped data and massive aggregations. Valuable stuff if you use BigQuery.]]></description>
				<content:encoded><![CDATA[<p>Google <a href="://gigaom.com/2013/03/13/chris-wetherll-google-reader/">might be upsetting a lot of people</a> with some of its recent “spring cleaning,” but its latest batch of updates to <a href="https://cloud.google.com/products/big-query">BigQuery</a> should make data analysts happy, at least.</p>
<p>With the latest updates — <a href="http://googleenterprise.blogspot.com/2013/03/bringing-simplicity-to-large-data.html">announced in a blog post</a> by BigQuery Product Manager Ju-kay Kwek on Thursday — users can now join large tables, import and query timestamped data, and aggregate large collections of distinct values. It’s hardly the equivalent of Google launching Compute Engine last summer, but as (arguably) the inspiration for the <a href="http://gigaom.com/2013/02/21/sql-is-whats-next-for-hadoop-heres-whos-doing-it/">SQL-on-Hadoop trend that’s sweeping the big data world</a> right now, every improvement to BigQuery is notable.</p>
<p>BigQuery is a cloud service that lets users analyze terabyte-sized data sets using SQL-like queries. It’s based on <a href="http://research.google.com/pubs/pub36632.html">Google’s Dremel querying system</a>, which can analyze data where it’s located (i.e., in the Google File System or BigTable) and which Google uses internally to analyze a variety of different data sets. Google <a href="http://gigaom.com/2012/03/21/google-structure-data-2012/">claims queries in BigQuery run at interactive speeds</a>, which is something that MapReduce — the previous-generation tool for dealing with such large data sets — simply couldn’t handle within a reasonable time frame or level of complexity. Of course, if you want to schedule batch jobs, <a href="http://gigaom.com/2012/08/29/google-brings-bigquery-down-to-earth-with-excel-connector/">BigQuery lets you do that, too, for a lower price</a>.</p>
<p>This constraint (and therefore the potential benefits of something like Dremel and <a href="http://gigaom.com/2012/05/01/google-opens-up-its-biq-query-data-analytics-service-to-all/">its commercial incarnation, BigQuery</a>) wasn’t lost on the Hadoop community, which itself had been largely reliant on MapReduce processing for years. In the past year, we’ve seen numerous startups <a href="http://gigaom.com/2013/02/25/emc-to-hadoop-competition-see-ya-wouldnt-wanna-be-ya/">and large vendors</a> pushing their own Dremel-like (or MPP-like) technologies for data sitting in the Hadoop Distributed File System. If you happen to be in New York next week, you can hear some of the pioneers in this space talk about it at our <a href="http://event.gigaom.com/structuredata/?utm_source=data&amp;utm_medium=editorial&amp;utm_campaign=intext&amp;utm_term=620545+google-bigquery-is-now-even-bigger&amp;utm_content=dharrisstructure">Structure: Data conference</a>.</p>
<p>Background aside, the ability to join large data sets in BigQuery is probably the most important of the three new functions. Joins are an essential aspect of data analysis in most environments because pieces of data that are relevant to each other don’t always reside within the same table or even within the same cluster. And joining tables of the size BigQuery is designed for can take a long time without the right query engine in place.</p>
<div id="attachment_620754" class="wp-caption aligncenter" style="width: 718px"><a href="http://gigaom2.files.wordpress.com/2013/03/join.jpg"><img alt="How to do a join in BigQuery" src="http://gigaom2.files.wordpress.com/2013/03/join.jpg?w=708&#038;h=105" width="708" height="105" class="size-large wp-image-620754"></a><p class="wp-caption-text">How to do a join in BigQuery</p></div>
<p>Kwek offers an anecdote from Google that shows why joins, and the new aggregation function, are important:</p>
<blockquote id="quote-when-our-app-engine-"><p>[W]hen our App Engine team needed to reconcile app billing and usage information, Big JOIN allowed the team to merge 2TB of usage data with 10GB of configuration data in 60 seconds. Big Group Aggregations enabled them to immediately segment those results by customer. Using the integrated Tableau client the team was able to quickly visualize and detect some unexpected trends.</p></blockquote>
<br />  <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=gigaom.com&#038;blog=14960843&#038;post=620545&#038;subd=gigaom2&#038;ref=&#038;feed=1" width="1" height="1" /><p><a href="http://pubads.g.doubleclick.net/gampad/jump?iu=/1008864/GigaOM_RSS_300x250&#038;sz=300x250&#038;c=400486"><img src="http://pubads.g.doubleclick.net/gampad/ad?iu=/1008864/GigaOM_RSS_300x250&#038;sz=300x250&#038;c=400486" /></a></p><p><strong>Related research and analysis from GigaOM Pro:</strong><br />Subscriber content. <a href="http://pro.gigaom.com/?utm_source=data&utm_medium=editorial&utm_campaign=auto3&utm_term=620545+google-bigquery-is-now-even-bigger&utm_content=dharrisstructure">Sign up for a free trial</a>.</p><ul><li><a href="http://pro.gigaom.com/2012/05/the-importance-of-putting-the-u-and-i-in-visualization/?utm_source=data&utm_medium=editorial&utm_campaign=auto3&utm_term=620545+google-bigquery-is-now-even-bigger&utm_content=dharrisstructure">The importance of putting the U and I in visualization</a></li><li><a href="http://pro.gigaom.com/2012/04/infrastructure-q1-cloud-and-big-data-woo-the-enterprise/?utm_source=data&utm_medium=editorial&utm_campaign=auto3&utm_term=620545+google-bigquery-is-now-even-bigger&utm_content=dharrisstructure">Infrastructure Q1: Cloud and big data woo enterprises</a></li><li><a href="http://pro.gigaom.com/2013/01/cloud-and-data-fourth-quarter-2012-analysis/?utm_source=data&utm_medium=editorial&utm_campaign=auto3&utm_term=620545+google-bigquery-is-now-even-bigger&utm_content=dharrisstructure">The fourth quarter of 2012 in cloud</a></li></ul>]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/03/14/google-bigquery-is-now-even-bigger/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/03/bq.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/03/bq.jpg?w=150" medium="image">
			<media:title type="html">bq</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/03/join.jpg?w=708" medium="image">
			<media:title type="html">How to do a join in BigQuery</media:title>
		</media:content>
	</item>
		<item>
		<title>Looker raises $2M to help more companies simplify business intelligence</title>
		<link>http://gigaom.com/2013/03/06/looker-raises-2m-to-help-more-companies-simplify-business-intelligence/</link>
		<comments>http://gigaom.com/2013/03/06/looker-raises-2m-to-help-more-companies-simplify-business-intelligence/#comments</comments>
		<pubDate>Wed, 06 Mar 2013 14:00:43 +0000</pubDate>
		<dc:creator>Jordan Novet</dc:creator>
				<category><![CDATA[Business Intelligence]]></category>
		<category><![CDATA[Looker Data Services]]></category>
		<category><![CDATA[SQL]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=617127</guid>
		<description><![CDATA[Looker Data Services has raised $2 million to help more enterprise employees easily run targeted business-intelligence queries with a proprietary SQL-based language called LookML.]]></description>
				<content:encoded><![CDATA[<p>Looking to make it easier for more enterprise employees to drill down on sales data with a web-based product, <a href="http://looker.com/">Looker Data Services</a> has raised $2 million from First Round Capital and PivotNorth Capital.</p>
<p>Based in San Francisco and Santa Cruz, Calif., Looker has developed LookML, a proprietary language based on the SQL programming language for relational data, to enable users with little to no development background to make their own custom SQL queries of sales data. Once a customer signs up, Looker&#8217;s analysts review the customer&#8217;s data and, over a few days, custom-build the tool with options for querying the customer&#8217;s various databases in specific ways, said Lloyd Tabb, Looker&#8217;s founder and CEO. Once in place, Looker can run on a customer&#8217;s on-premise hardware or on hosted servers.</p>
<p>Many companies offer business analytics tools, and they come in different flavors. Oracle, IBM and other legacy IT vendors offer data warehouse appliances, although those can take engineers months to implement, Tabb said. Other BI vendors, including GoodData, Mixpanel and Tableau, can store customer data in the cloud, but those tools display bigger-picture trends in event data and don&#8217;t correlate well with user data, which isn&#8217;t in the cloud, Tabb said. And then there&#8217;s <a href="http://gigaom.com/2012/11/28/amazons-new-data-warehousing-service-takes-aim-at-old-guard-it-giants/">Redshift</a>, Amazon Web Services&#8217; new data warehouse product.</p>
<p>Despite all the competition, Tabb believes Looker has a place in the market. The company, which is emerging from stealth mode about a year after its establishment, has more than 20 customers, including Simply Hired.</p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/03/06/looker-raises-2m-to-help-more-companies-simplify-business-intelligence/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/03/lloyd_tabb-32-2.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/03/lloyd_tabb-32-2.jpg?w=150" medium="image">
			<media:title type="html">lloyd_tabb-32-2</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/c00ab753df107b639e76ed4c3ab07ba7?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">gigajordan</media:title>
		</media:content>
	</item>
		<item>
		<title>EMC to Hadoop competition: &#8220;See ya, wouldn&#8217;t wanna be ya.&#8221;</title>
		<link>http://gigaom.com/2013/02/25/emc-to-hadoop-competition-see-ya-wouldnt-wanna-be-ya/</link>
		<comments>http://gigaom.com/2013/02/25/emc-to-hadoop-competition-see-ya-wouldnt-wanna-be-ya/#comments</comments>
		<pubDate>Mon, 25 Feb 2013 18:00:02 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[analytics]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[database]]></category>
		<category><![CDATA[EMC]]></category>
		<category><![CDATA[EMC Greenplum]]></category>
		<category><![CDATA[Greenplum]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[SQL]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=613686</guid>
		<description><![CDATA[EMC Greenplum rolled out a new Hadoop distribution that fuses the popular big data platform with its flagship MPP database technology. Co-founder Scott Yara thinks the company's huge investment puts it in the catbird seat among Hadoop vendors.]]></description>
				<content:encoded><![CDATA[<p>If, like many industry watchers, you’ve been confused about EMC Greenplum’s Hadoop strategy over the past couple years, Scott Yara has a message for you: “We’re all in on Hadoop, period.”</p>
<p>Yara, Greenplum’s co-founder and senior vice president of products, has a not-so-coded message for his big data market competitors, too. Put simply, he doesn’t think they stand a chance against his company, and he served notice on Monday morning with the unveiling of the company’s new Pivotal HD Hadoop distribution and Project Hawq in a staged event at San Francisco’s Dogpatch Studios.</p>
<p>Pivotal HD is a completely re-architected Hadoop distribution that has been natively fused with Greenplum’s analytic database (that’s the Project Hawq part), but Yara thinks it’s a bigger deal than <a href="http://gigaom.com/2013/02/21/sql-is-whats-next-for-hadoop-heres-whos-doing-it/">just another SQL-on-Hadoop play</a>. In an interview last week, Yara told me that Project Hawq is the manifestation of Greenplum’s <a href="http://gigaom.com/2010/07/06/emc-buys-greenplum/">decision to sell itself to EMC in 2010</a>, a move he thought would kickstart his company’s founding vision of becoming the leading big data platform.</p>
<h2 id="building-a-data-platform-costs">Building a data platform costs money, and lots of it</h2>
<p>But before the details, a little history. Greenplum’s flagship product is an analytic database powered by a massively parallel processing (MPP) query engine. The company had raised nearly $100 million in venture capital around this technology since launching in 2003, but doing business in the enterprise software world is hard and expensive, and Greenplum needed more money.</p>
<div id="attachment_502146" class="wp-caption alignleft" style="width: 310px"><img alt="Rob Mee of Pivotal Labs, Scott Yara of EMC, and Om Malik of GigaOM at Structure:Data 2012" src="http://gigaom2.files.wordpress.com/2012/03/1z5o1154.jpg?w=300&#038;h=200" width="300" height="200" class="size-medium wp-image-502146"><p class="wp-caption-text">Yara (left) with Pivotal Labs CEO Rob Mee and Om Malik at Structure: Data 2012 (c) 2012 Pinar Ozger. pinar@pinarozger.com</p></div>
<p>“I thought it was going to take another couple hundred million dollars in investment for us to complete the technical vision we had and go to market,” Yara explained. But finding that kind of money wasn’t so easy in an investment environment where everyone was gaga over social apps like Facebook and Zynga. When EMC approached with a deal like it gave VMware in 2003 — essentially near complete independence bolstered by a huge R&amp;D and marketing budget — Greenplum couldn’t refuse.</p>
<p>Yara said Greenplum had known for a while that Hadoop was the key to any big data strategy going forward, but that it would take some time to build up its own technology. So, in 2011, it <a href="http://gigaom.com/2011/05/09/emc-hadoop/">entered into a reseller agreement with Hadoop startup MapR</a> to offer a premium product to appease enterprise customers while Greenplum’s engineers got to work on what would become Pivotal HD. That deal with MapR is still in place, but it’s no longer the focal point of Greenplum’s Hadoop strategy.</p>
<h2 id="big-investment-big-aspirations">Big investment, big aspirations</h2>
<p>The technology inside Pivotal HD is what companies should come to expect from a Hadoop distribution, Yara explained. It’s essentially the Greenplum Database with its POSIX file system ripped out and replaced by the Hadoop Distributed File System. Whatever users can do on Greenplum’s flagship database, they can do on Pivotal HD, only they can run Hadoop MapReduce jobs and house an HBase database, too.</p>
<p><img alt="hawq" src="http://gigaom2.files.wordpress.com/2013/02/hawq.jpg?w=708&#038;h=386" width="708" height="386" class="aligncenter size-large wp-image-613705"></p>
<p>And when SQL-like features become an important part of Hadoop because it’s so broadly installed that users are now seeking out broader utility, “that’s when the bar gets raised in terms of the amount of capability that’s required,” Yara said. He said Pivotal HD includes years’ worth of investment in Hadoop cluster-management technology and professional support, too, and that they will cost half as much as what Cloudera and Hortonworks charge. It’s designed to run smoothly wherever customers want it to: physical servers, virtual servers or even cloud servers.</p>
<p><a href="http://structuredata2013-editgraphic.eventbrite.com/"><img alt="Structure:Data: Put data to work. 60+ big data experts speaking. March 20-21, 2013, New York City. Register now." src="http://gigaom2.files.wordpress.com/2013/02/structure-data_in-article-banners_300x2001.png?w=708"   class="alignright size-full wp-image-610577"></a>Because they’re so new, he said, competitive SQL-on-Hadoop offerings such as <a href="http://gigaom.com/2012/10/24/cloudera-makes-sql-a-first-class-citizen-in-hadoop/">Cloudera’s Impala</a> can only handle about 20 percent of real-world workloads. Looking back at the capital investment in analytics and big data technologies past, things like Netezza, Teradata and Aster Data, Yara proffered, “I don’t think you could build [a full SQL-on-Hadoop] system for less than $25 to $50 million over five years.” (Some of those new technologies, by the way, will have a chance to state their cases during a <a href="http://event.gigaom.com/structuredata/?utm_source=data&amp;utm_medium=editorial&amp;utm_campaign=intext&amp;utm_term=613686+emc-to-hadoop-competition-see-ya-wouldnt-wanna-be-ya&amp;utm_content=dharrisstructure">Structure: Data</a> panel on March 21 that’s all about Hadoop as the next-generation business intelligence platform.)</p>
<p>Greenplum, by contrast, rebuilt its entire R&amp;D team to focus on bringing 10 years of database technology to Hadoop. “We literally have over 300 engineers working on our Hadoop platform,” Yara said. “… We’re bringing all the power of EMC and VMware behind it.”</p>
<h2 id="the-data-warehouse-is-the-new-">The data warehouse is the new mainframe</h2>
<p>Looking past his competitive boasting, though, it’s easy to see Yara’s greater point when you ask him what all this Hadoop talk means for the data warehouse business on which Greenplum was built. He points to the mainframe business that fell from its high perch decades ago but still drives billions a year in revenue. A single MPP database system is still faster on certain workloads than SQL on Hadoop, but that gap will close over time and “I do think the center of gravity will move toward HDFS,” he said.</p>
<p>Josh Klahr, a Pivotal HD product manager, noted the importance of being able to process all of a company’s data right in a single scalable data store rather than operating numerous systems. He pointed to one customer that’s storing a petabyte of data in Greenplum Database but wants to grow its data volume to 20 petabytes over the next few years and needs something like Hadoop to do that both financially and technically. He said Netflix’s <a href="http://gigaom.com/2013/01/10/netflix-shows-off-its-hadoop-architecture/">decision to store all its data in Amazon S3</a> and bring analytic services to it is a good indicator of where the market is headed.</p>
<p>A few years ago, Yara acknowledged, embracing Hadoop as the future might have been a scary proposition. However, he said, “Now, if you don’t embrace Hadoop as the new database platform, if you’re a database vendor, that’s a grave mistake.”</p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/02/25/emc-to-hadoop-competition-see-ya-wouldnt-wanna-be-ya/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/01/1z5o1096.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/01/1z5o1096.jpg?w=150" medium="image">
			<media:title type="html">Structure Data 2012: Scott Yara – SVP, Products and Co-Founder, Greenplum, a division of EMC</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2012/03/1z5o1154.jpg?w=300" medium="image">
			<media:title type="html">Rob Mee of Pivotal Labs, Scott Yara of EMC, and Om Malik of GigaOM at Structure:Data 2012</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/02/hawq.jpg?w=708" medium="image">
			<media:title type="html">hawq</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/02/structure-data_in-article-banners_300x2001.png" medium="image">
			<media:title type="html">Structure:Data: Put data to work. 60+ big data experts speaking. March 20-21, 2013, New York City. Register now.</media:title>
		</media:content>
	</item>
		<item>
		<title>SQL is what&#8217;s next for Hadoop: Here&#8217;s who&#8217;s doing it</title>
		<link>http://gigaom.com/2013/02/21/sql-is-whats-next-for-hadoop-heres-whos-doing-it/</link>
		<comments>http://gigaom.com/2013/02/21/sql-is-whats-next-for-hadoop-heres-whos-doing-it/#comments</comments>
		<pubDate>Thu, 21 Feb 2013 18:29:31 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[analytics]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[data warehouse]]></category>
		<category><![CDATA[database]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[mapreduce]]></category>
		<category><![CDATA[SQL]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=612289</guid>
		<description><![CDATA[More and more companies and open source projects are trying to let users run SQL queries from inside Hadoop itself. Here's a list of what's available and, on a high level, how they work.]]></description>
				<content:encoded><![CDATA[<p>When we first began putting together the schedule for <a href="http://event.gigaom.com/structuredata?utm_source=data&amp;utm_medium=editorial&amp;utm_content=dharrisstructure&amp;utm_campaign=intext&amp;utm_term=612289+sql-is-whats-next-for-hadoop-heres-whos-doing-it">Structure: Data</a> several months ago, we knew that running SQL queries on Hadoop would be a big deal — we just didn’t know how big a deal it would actually become. Fast-forward to today, a mere month away from the event (March 20-21 in New York), and the writing on the wall is a lot clearer. SQL support isn’t the end-game for Hadoop, but it’s the feature that will help Hadoop find its way into more places in more companies that understand the importance of next-generation analytics but don’t want to (or can’t yet) re-invent the wheel by becoming MapReduce experts.</p>
<p>In fact, there are now so many products and projects pushing SQL queries and interactive data analysis on Hadoop — including two more announced this week — that it’s getting hard to keep track. But I’ll do my best.</p>
<p>Of course, Facebook began this whole movement to bring SQL database-like functionality to Hadoop when it created Hive in 2009. Hive, <a href="http://hive.apache.org/">now an Apache project</a>, includes a data-management layer and SQL-like query language called HiveQL. It has proven rather useful and popular over the years, but Hive’s reliance on MapReduce makes it somewhat slow by nature — MapReduce scans the entire data set and moves a lot of data over the network while processing a job — and there hasn’t been much effort to package it in a manner that might attract mainstream users.</p>
<p>And keep in mind that this next generation of SQL-on-Hadoop tools aren’t just business intelligence or database products that can access data stored in Hadoop; EMC Greenplum, HP Vertica, IBM Netezza, ParAccel, Microsoft SQL Server and Teradata/Aster Data (which this week <a href="http://www.asterdata.com/news/teradata-aster-discovery-platform-offers-powerful-data-science-solution.php">released some cool new features</a> for just this purpose) all allow some sort of access to Hadoop data. Rather, these are applications, frameworks and engines that let users query Hadoop data from inside Hadoop, sometimes by re-architecting the underlying compute and data infrastructures. The beauty of this approach is that data is usable in its existing form and, in theory, doesn’t require two separate data stores for analytic applications.</p>
<h2 id="data-warehouses-and-bi-the-str">Data warehouses and BI: The Structure: Data set</h2>
<p><a href="http://structuredata2013-editgraphic.eventbrite.com/"><img alt="Structure:Data: Put data to work. 60+ big data experts speaking. March 20-21, 2013, New York City. Register now." src="http://gigaom2.files.wordpress.com/2013/02/structure-data_in-article-banners_300x2001.png?w=708"   class="alignleft size-full wp-image-610577"></a>I’m highlighting this group of companies first, not because I think they’re the best (although that might well be), but because I’m truly excited about the panel they’ll be featured on at our conference next month. The panel is moderated by Facebook engineering manager Ravi Murthy, a guy who knows his way around a database, so they’ll have to answer some tough questions from one of the most-advanced and most-aggressive Hadoop and analytics tools users out there:</p>
<p><strong><a href="http://incubator.apache.org/drill/">Apache Drill</a>: </strong>Drill is a MapR-led effort to create a Google Dremel-like (or BigQuery-like) interactive query engine on top of Hadoop. First <a href="http://gigaom.com/2012/08/17/for-fast-interactive-hadoop-queries-drill-may-be-the-answer/">announced in August</a>, the project is still under development and in the incubator program within Apache. According to its web site, “One explicitly stated design goal is that Drill is able to scale to 10,000 servers or more and to be able to process petabytes of data and trillions of records in seconds.”</p>
<p><strong><a href="http://hadapt.com">Hadapt</a>:</strong> Hadapt, which actually <a href="http://gigaom.com/2011/03/23/making-hadoop-work-in-more-places-with-hadapt/">launched at Structure: Data in 2011</a>, was the first of the SQL on Hadoop vendors and is somewhat unique in that it has a real product on the market and real users in production. Its unique architecture includes tools for advanced SQL functions and a split-execution engine for MapReduce and relational tasks, and both HDFS and relational storage. In October, the company <a href="http://gigaom.com/2012/10/16/hadapt-does-big-love-for-big-data-and-hints-at-hadoops-future/">announced a tight integration with Tableau Software</a> around advanced visual analytics.<strong> </strong></p>
<p><a href="http://gigaom2.files.wordpress.com/2013/02/had_graphic2-scaled.jpg"><img alt="HAD_Graphic2-scaled" src="http://gigaom2.files.wordpress.com/2013/02/had_graphic2-scaled.jpg?w=708"   class="aligncenter size-full wp-image-612351"></a></p>
<p><strong><a href="http://gigaom2.files.wordpress.com/2013/02/platforaarch.jpg"><img alt="platforaarch" src="http://gigaom2.files.wordpress.com/2013/02/platforaarch.jpg?w=92&#038;h=150" width="92" height="150" class="alignright size-thumbnail wp-image-612755"></a><a href="http://platfora.com">Platfora</a>: </strong>Technically not a SQL product, Platfora is red-hot right now and is trying to re-imagine the world of business intelligence for a big data world. Essentially an HTML5 canvas laid atop Hadoop and an in-memory, massively parallel processing engine, the company’s software, which <a href="http://gigaom.com/2012/10/23/platfora-shows-a-whole-new-way-to-do-business-intelligence-on-big-data/">it unveiled in October</a>, is designed to make analyzing data stored in Hadoop a fast and visually intuitive process.</p>
<p><strong><a href="http://www.qubole.com">Qubole</a>:</strong> Qubole is an interesting case in that it’s essentially a cloud-based version of the popular <a href="http://hive.apache.org/">Apache Hive</a> framework <a href="http://gigaom.com/2012/06/06/exclusive-the-brains-behind-hive-launch-on-demand-hadoop-service/">launched by the guys who created Hive while working at Facebook</a>. Qubole claims its auto-scaling abilities, optimized Hadoop code and columnar data cache make its service run much faster than Hive alone, and that running on Amazon Web Services makes it easier than maintaining a physical cluster.<strong> </strong></p>
<p><a href="http://gigaom2.files.wordpress.com/2013/02/cache.jpg"><img alt="cache" src="http://gigaom2.files.wordpress.com/2013/02/cache.jpg?w=708&#038;h=456" width="708" height="456" class="aligncenter size-full wp-image-612765"></a></p>
<h2 id="data-warehouses-and-bi-the-res">Data warehouses and BI: The rest</h2>
<p><strong><a href="http://www.citusdata.com/">Citus Data</a>:</strong> Citus Data’s CitusDB isn’t just about Hadoop, but rather <a href="http://gigaom.com/2013/02/19/citusdb-today-sql-on-hadoop-tomorrow-the-world/">wants to bring the power of its distributed Postgres implementation to all types of data</a>. It relies on Postgres’s foreign data wrappers feature to convert disparate data types into the database’s native format, and then on its own distributed-processing technology to carry out queries in seconds or less. Because of its Postgres foundation, CitusDB can join data from different data sources and retains all the native features that come with that database.<strong> </strong></p>
<p><a href="http://gigaom2.files.wordpress.com/2013/02/citus_hadoop_architecture1.png"><img alt="citus_hadoop_architecture" src="http://gigaom2.files.wordpress.com/2013/02/citus_hadoop_architecture1.png?w=708"   class="aligncenter size-full wp-image-612399"></a></p>
<p><strong><a href="http://blog.cloudera.com/blog/2012/10/cloudera-impala-real-time-queries-in-apache-hadoop-for-real/">Cloudera Impala</a>:  </strong>Cloudera’s Impala <a href="http://gigaom.com/2012/10/24/cloudera-makes-sql-a-first-class-citizen-in-hadoop/">might just be the most-important SQL-on-Hadoop effort</a> around because of Cloudera’s expansive installation and partner footprints. It’s a massively parallel processing engine that bypasses MapReduce to enable interactive queries on data stored in either HDFS or HBase, using the same variant of SQL that Hive uses. However, because Cloudera doesn’t build applications, it’s relying on higher-level BI and analytics partners to provide the user interface.</p>
<p><a href="http://gigaom2.files.wordpress.com/2013/02/impala.png"><img alt="impala" src="http://gigaom2.files.wordpress.com/2013/02/impala.png?w=708"   class="aligncenter size-full wp-image-612405"></a></p>
<p><strong><a href="http://karmasphere.com">Karmasphere</a>: </strong>Karmasphere is one of the first startups to build an analytic application atop Hadoop, and in <a href="http://gigaom.com/2012/06/11/is-2013-the-year-hadoop-uptake-turns-into-a-tornado/">its 2.0 release last year</a> the company added support for SQL queries of data in HDFS. Like Hive, Karmasphere still relies on MapReduce to process queries, which means it’s inherently slower than newer approaches. However, unlike Hive, Karmasphere allows for parallel queries to run at the same time and includes a visual interface for writing queries and filtering results.</p>
<p><a href="http://gigaom2.files.wordpress.com/2013/02/multiple-large.png"><img alt="multiple-large" src="http://gigaom2.files.wordpress.com/2013/02/multiple-large.png?w=708&#038;h=307" width="708" height="307" class="aligncenter size-large wp-image-612778"></a></p>
<p><strong><a href="http://www.cascading.org/lingual/">Lingual</a>:</strong> Lingual is a new open source project from Concurrent <em>(see disclosure)</em>, the parent company of the Cascading framework for Hadoop. <a href="http://www.marketwire.com/press-release/concurrent-enables-sql-users-build-big-data-applications-on-hadoop-less-than-30-seconds-1759041.htm">Announced on Wednesday</a>, Lingual runs on Cascading and gives developers and analysts a true ANSI SQL interface from which to run analytics or build applications. Lingual is compatible with traditional BI tools, JDBC  and the Cascading family of APIs.<strong> </strong></p>
<p><strong><a href="https://github.com/forcedotcom/phoenix">Phoenix</a>: </strong>Phoenix is a new and relatively unknown open source project that comes out of Salesforce.com and aims to allow fast SQL queries of data stored in HBase, the NoSQL database built atop HDFS. Its stated mission: “Become the standard means of accessing HBase data through a well-defined, industry standard API.” Users interact with it through JDBC interfaces, and its developers claim sub-second response times for small queries and seconds-long response times for queries over tens of millions of rows.</p>
<div id="attachment_612413" class="wp-caption aligncenter" style="width: 718px"><a href="http://gigaom2.files.wordpress.com/2013/02/squirrel-copy.jpg"><img alt="A sample of Phoenix via the SQuirreL client" src="http://gigaom2.files.wordpress.com/2013/02/squirrel-copy.jpg?w=708&#038;h=496" width="708" height="496" class="size-large wp-image-612413"></a><p class="wp-caption-text">A sample of Phoenix via the SQuirreL client</p></div>
<p><strong><a href="http://gigaom2.files.wordpress.com/2013/02/shark.jpg"><img alt="shark" src="http://gigaom2.files.wordpress.com/2013/02/shark.jpg?w=300&#038;h=219" width="300" height="219" class="alignright size-medium wp-image-612439"></a><a href="http://shark.cs.berkeley.edu/">Shark</a>:</strong> Shark isn’t technically Hadoop, but it’s cut from the same cloth. <em>Shark</em>, in this case, stands for “Hive on Spark,” with Hive meaning the same thing it does to Hadoop, but with Spark <a href="http://spark-project.org/">being an in-memory platform</a> designed to run parallel-processing jobs 100 times faster than MapReduce (a speed improvement over traditional Hive that Shark also claims). Shark also includes APIs for turning query results into a type of data format amenable to machine learning algorithms. Both Shark and Spark are developed by the University of California, Berkeley’s <a href="https://amplab.cs.berkeley.edu/projects/">AMPLab</a>.<strong><br></strong></p>
<p><strong><a href="http://gigaom2.files.wordpress.com/2013/02/screen-shot-2013-02-19-at-5-37-01-pm-300x235.png"><img alt="Screen-Shot-2013-02-19-at-5.37.01-PM-300x235" src="http://gigaom2.files.wordpress.com/2013/02/screen-shot-2013-02-19-at-5-37-01-pm-300x235.png?w=708"   class="alignright size-full wp-image-612322"></a><a href="http://hortonworks.com/blog/100x-faster-hive/">Stinger Initiative</a>: </strong>Launched on Wednesday (along with <a href="http://hortonworks.com/blog/introducing-knox-hadoop-security/">a security gateway called Knox</a> and a <a href="http://hortonworks.com/blog/introducing-tez-faster-hadoop-processing/">faster, simpler processing framework called Tez</a>), the Stinger Initiative is a Hortonworks-led effort to make Hive faster (up to 100x) and more functional. Stinger adds more SQL analytics capabilities to Hive, but the most-important aspects are infrastructural: an optimized execution engine, a columnar file format and the ability to avoid MapReduce bottlenecks by running atop Tez.</p>
<h2 id="operational-sql">Operational SQL</h2>
<p><strong><a href="http://drawntoscale.com/">Drawn to Scale</a>:</strong> Drawn to Scale is a startup that has <a href="http://gigaom.com/2012/07/24/how-one-startup-wants-to-inject-hadoop-into-your-sql/">built an operational SQL database on top of HBase</a>. The key word here is database, as its product, called Spire, is modeled after Google’s F1 and designed to power transactional applications as well as analytic ones. Spire has a fully distributed index, and queries are sent only to the node with the relevant data, so reads and writes are fast and the system can handle lots of concurrent users without falling down.</p>
<p><a href="http://gigaom2.files.wordpress.com/2013/02/spirearchitecture-015.png"><img alt="SpireArchitecture.015" src="http://gigaom2.files.wordpress.com/2013/02/spirearchitecture-015-e1361407038325.png?w=708&#038;h=438" width="708" height="438" class="aligncenter size-large wp-image-612477"></a></p>
<p><strong><a href="http://gigaom2.files.wordpress.com/2013/02/splice.jpg"><img alt="splice" src="http://gigaom2.files.wordpress.com/2013/02/splice.jpg?w=300&#038;h=166" width="300" height="166" class="alignright size-medium wp-image-612669"></a><a href="http://www.splicemachine.com">Splice Machine</a>: </strong>Database startup Splice Machine is also trying to get into the operational space by building its Splice SQL Engine atop the naturally distributed HBase database. Splice Machine focuses its message on transactional integrity, which is really where it separates itself from scalable NoSQL databases and analytics-focused SQL-on-Hadoop efforts. It relies on HBase’s auto-sharding feature to make scaling an easy process.</p>
<p><em>Feature image courtesy of <a href="http://www.shutterstock.com/gallery-73008p1.html">Shutterstock user hauhu</a>.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/02/21/sql-is-whats-next-for-hadoop-heres-whos-doing-it/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/02/shutterstock_37622056.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/02/shutterstock_37622056.jpg?w=150" medium="image">
			<media:title type="html">sql statement</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/02/structure-data_in-article-banners_300x2001.png" medium="image">
			<media:title type="html">Structure:Data: Put data to work. 60+ big data experts speaking. March 20-21, 2013, New York City. Register now.</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/02/had_graphic2-scaled.jpg" medium="image">
			<media:title type="html">HAD_Graphic2-scaled</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/02/platforaarch.jpg?w=92" medium="image">
			<media:title type="html">platforaarch</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/02/cache.jpg" medium="image">
			<media:title type="html">cache</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/02/citus_hadoop_architecture1.png" medium="image">
			<media:title type="html">citus_hadoop_architecture</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/02/impala.png" medium="image">
			<media:title type="html">impala</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/02/multiple-large.png?w=708" medium="image">
			<media:title type="html">multiple-large</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/02/squirrel-copy.jpg?w=708" medium="image">
			<media:title type="html">A sample of Phoenix via the SQuirreL client</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/02/shark.jpg?w=300" medium="image">
			<media:title type="html">shark</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/02/screen-shot-2013-02-19-at-5-37-01-pm-300x235.png" medium="image">
			<media:title type="html">Screen-Shot-2013-02-19-at-5.37.01-PM-300x235</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/02/spirearchitecture-015-e1361407038325.png?w=708" medium="image">
			<media:title type="html">SpireArchitecture.015</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/02/splice.jpg?w=300" medium="image">
			<media:title type="html">splice</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/02/structure-data_in-article-banner_590x1101.png" medium="image">
			<media:title type="html">Structure:Data: Put data to work. 60+ big data experts speaking. March 20-21, 2013, New York City. Register now.</media:title>
		</media:content>
	</item>
		<item>
		<title>CitusDB: Today, SQL on Hadoop. Tomorrow, the world!</title>
		<link>http://gigaom.com/2013/02/19/citusdb-today-sql-on-hadoop-tomorrow-the-world/</link>
		<comments>http://gigaom.com/2013/02/19/citusdb-today-sql-on-hadoop-tomorrow-the-world/#comments</comments>
		<pubDate>Tue, 19 Feb 2013 18:59:40 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[analytics]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[Citus Data]]></category>
		<category><![CDATA[CitusDB]]></category>
		<category><![CDATA[Data Collective]]></category>
		<category><![CDATA[database]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[postgres]]></category>
		<category><![CDATA[PostgreSQL]]></category>
		<category><![CDATA[SQL]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=611709</guid>
		<description><![CDATA[Citus Data has expanded its high-speed, analytic database called CitusDB beyond Postgres and into Hadoop. Up next, MongoDB and just about anything else you can think of.]]></description>
				<content:encoded><![CDATA[<p>Database startup <a href="http://www.citusdata.com">Citus Data</a> on Tuesday joined those trying to enable fast SQL queries on Hadoop data, but it has much larger goals. It thinks it can be the only analytic database that anyone needs, able to query data wherever it’s stored across a company’s environment — in relational databases, Hadoop, MongoDB, Amazon S3 and elsewhere.</p>
<p>Big data has opened companies’ eyes to the importance of analytics and alternative data stores, but combining the two often means learning new languages, using multiple tools and probably sacrificing the performance they’re used to from analytic platforms.</p>
<p>Citus Data’s flagship product, called CitusDB, is actually built atop PostgreSQL and its first iteration was designed for <a href="http://gigaom.com/2012/05/01/google-opens-up-its-biq-query-data-analytics-service-to-all/">Google Dremel-like scale and speed</a> on relational data. Thanks to a feature called “foreign data wrappers,” though, it’s able to run SQL on numerous data types (e.g., CSV, log and JSON files) that don’t comport with how Postgres formats data natively. So, while CitusDB now officially supports the Hadoop Distributed File System in addition to Postgres, it is by no means limited to them.</p>
<p>Matt Ocko, managing partner at <a href="http://gigaom.com/2012/08/09/big-data-vc-firm-data-collective-steps-out-of-the-shadows/">Data Collective</a> and one of Citus Data’s early investors, says the database can technically support any data source with an ODBC driver, and even could query something like log files straight from a data store. In fact, Citus is working on extending its support to MongoDB — a capability that’s in beta right now. Ocko is also particularly impressed with CitusDB’s ability to act like a fabric connecting all these data sources rather than making users query each independently and then manually join the data. He cited a demonstration in which CitusDB carried out a query that required executing a join across Postgres and Hadoop.</p>
<p>The other big thing about CitusDB is that it’s not just flexible but fast, too. Ocko said CitusDB has outperformed Oracle’s vaunted Exadata machine on a TPC-H benchmark test with data stored primarily on hard disk. That Postgres-Hadoop query he referenced completed in just a few seconds while running on the Amazon EC2 cloud.</p>
<p>CitusDB is so fast, Citus Co-founder Umur Cubukcu told me, because of how it’s architected. It moves the computation to where the data is rather than trying to move data across the network, and it has some impressive load-balancing and resource-management abilities baked in. If, for example, it needs data housed on a slow-running node in order to complete a task, the software will look for that data elsewhere rather than just wait for the congested resource to free up.</p>
<p>In the case of Hadoop, MapReduce brings the computation to the data, too, but every job requires a scan over the entire dataset. This is why early SQL-on-Hadoop tools such as Hive <a href="http://drawntoscale.com/is-there-a-database-in-big-data-heaven-understanding-the-world-of-sql-on-hadoop/">are still relatively slow</a>. Citus software engineer Carl Steinbach, who came to the company from Cloudera, said CitusDB is between 3 and 20 times faster than Hive depending on the query type.</p>
<p>It’s actually much faster for short queries that might be typical in an interactive environment, but he acknowledged those aren’t really what Hive was designed to do.</p>
<p><a href="http://gigaom2.files.wordpress.com/2013/02/citus_hadoop_architecture.png"><img alt="Citus_Hadoop_Architecture" src="http://gigaom2.files.wordpress.com/2013/02/citus_hadoop_architecture.png?w=708"   class="aligncenter size-full wp-image-611818"></a></p>
<p>Rather, CitusDB’s real competition is the spate of SQL-on-Hadoop projects, products and startups of which it’s now a part. We’ll have a whole session dedicated to this topic at <a href="http://event.gigaom.com/structuredata/?utm_source=data&amp;utm_medium=editorial&amp;utm_campaign=intext&amp;utm_term=611709+citusdb-today-sql-on-hadoop-tomorrow-the-world&amp;utm_content=dharrisstructure">Structure: Data</a> next month, and there isn’t enough room for everything on the market right now — <a href="http://gigaom.com/2012/10/17/batten-down-the-analysts-its-a-big-data-bi-storm/">Aster Data</a>, <a href="http://gigaom.com/2012/11/13/plotting-a-bi-coup-hadoop-startup-platfora-raises-20m/">Platfora</a>, Cloudera (<a href="http://gigaom.com/2012/10/24/cloudera-makes-sql-a-first-class-citizen-in-hadoop/">with Impala</a>), <a href="http://gigaom.com/2012/08/17/for-fast-interactive-hadoop-queries-drill-may-be-the-answer/">Apache Drill</a>, <a href="http://gigaom.com/2012/07/24/how-one-startup-wants-to-inject-hadoop-into-your-sql/">Drawn to Scale</a> and <a href="http://gigaom.com/2012/10/16/hadapt-does-big-love-for-big-data-and-hints-at-hadoops-future/">Hadapt</a>, to name several.</p>
<p>These are impressive technologies (at least in theory where they’re still under development), and Citus would be remiss to ignore them. But, aside from the ability to query multiple data sources, the company has something the others don’t, Cubukcu said: It has the Postgres community and all the features they’ve built into that database already. That includes connectors, authentication, full-text search and PostGIS for geospatial data, all of which go beyond just running fast queries.</p>
<p>“When you’re talking about an enterprise-class database,” Steinbach said, “you’re talking about more than a query execution engine.”</p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/02/19/citusdb-today-sql-on-hadoop-tomorrow-the-world/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/02/shutterstock_100650307.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/02/shutterstock_100650307.jpg?w=150" medium="image">
			<media:title type="html">database</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/02/citus_hadoop_architecture.png" medium="image">
			<media:title type="html">Citus_Hadoop_Architecture</media:title>
		</media:content>
	</item>
		<item>
		<title>The fourth quarter of 2012 in cloud</title>
		<link>http://pro.gigaom.com/2013/01/cloud-and-data-fourth-quarter-2012-analysis/</link>
		<comments>http://pro.gigaom.com/2013/01/cloud-and-data-fourth-quarter-2012-analysis/#comments</comments>
		<pubDate>Thu, 17 Jan 2013 07:55:48 +0000</pubDate>
		<dc:creator>Jo Maitland</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[10Gen]]></category>
		<category><![CDATA[Amazon Web Services]]></category>
		<category><![CDATA[aws]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[Business Intelligence]]></category>
		<category><![CDATA[Cetas]]></category>
		<category><![CDATA[Cloud Computing]]></category>
		<category><![CDATA[Cloud Foundry]]></category>
		<category><![CDATA[cloud-based databases]]></category>
		<category><![CDATA[Cloudera]]></category>
		<category><![CDATA[data-analytics]]></category>
		<category><![CDATA[Datum]]></category>
		<category><![CDATA[EMC]]></category>
		<category><![CDATA[enterprise IT]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[Hewlett-Packard]]></category>
		<category><![CDATA[HP]]></category>
		<category><![CDATA[hybrid clouds]]></category>
		<category><![CDATA[iaas]]></category>
		<category><![CDATA[IBM]]></category>
		<category><![CDATA[infrastructure as a service]]></category>
		<category><![CDATA[marklogic]]></category>
		<category><![CDATA[MongoDB]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[OpenStack]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[PaaS]]></category>
		<category><![CDATA[Palantir]]></category>
		<category><![CDATA[Platform as a Service]]></category>
		<category><![CDATA[private clouds]]></category>
		<category><![CDATA[Public Clouds]]></category>
		<category><![CDATA[Rackspace]]></category>
		<category><![CDATA[saas]]></category>
		<category><![CDATA[SiSense]]></category>
		<category><![CDATA[software as a service]]></category>
		<category><![CDATA[SQL]]></category>
		<category><![CDATA[Teradata]]></category>
		<category><![CDATA[the Pivotal Initiative]]></category>
		<category><![CDATA[VMWare]]></category>

		<guid isPermaLink="false">http://pro.gigaom.com/?p=165792</guid>
		<description><![CDATA[The last quarter of 2012 saw the rise of cloud-based databases, the cloud awakening of software giants such as HP, and many cloud outages that have left question marks. Enterprises found more IT dollars, and they will focus on the cloud for much of that spending.]]></description>
				<content:encoded><![CDATA[<p>The last quarter of 2012 saw the rise of cloud-based databases, the cloud awakening of software giants such as HP, and many cloud outages that have put question marks around the use of cloud computing. Many enterprises found more IT dollars in their budgets, and they will focus on the cloud for much of that spending. And while the enterprise focused largely on private clouds, interest in public cloud computing is greater than many analysts expected. This fourth-quarter analysis discusses these trends and more.</p>
]]></content:encoded>
			<wfw:commentRss>http://pro.gigaom.com/2013/01/cloud-and-data-fourth-quarter-2012-analysis/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:thumbnail url="http://pro.gigaom.com/files/2009/04/gigaompromasterimagecloud.jpg?w=150" />
		<media:content url="http://pro.gigaom.com/files/2009/04/gigaompromasterimagecloud.jpg?w=150" medium="image">
			<media:title type="html">gigaompromasterimagecloud</media:title>
		</media:content>

		<media:content url="http://1.gravatar.com/avatar/4f3860069d181dbeeb398304f5940a9e?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">gigaedit</media:title>
		</media:content>
	</item>
		<item>
		<title>With another $12M, ScaleArc wants to keep your database relevant</title>
		<link>http://gigaom.com/2013/01/10/with-another-12m-scalearc-wants-to-keep-your-database-relevant/</link>
		<comments>http://gigaom.com/2013/01/10/with-another-12m-scalearc-wants-to-keep-your-database-relevant/#comments</comments>
		<pubDate>Thu, 10 Jan 2013 15:00:57 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[database]]></category>
		<category><![CDATA[NewSQL]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[ScaleArc]]></category>
		<category><![CDATA[SQL]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=600676</guid>
		<description><![CDATA[ScaleArc's technology sits between applications and their SQL databases, claiming to provide better performance and better operational insights than running MySQL, Oracle Database or Microsoft SQL Server alone. With a $12.3 million Series C round, ScaleArc will try to withstand a glut of competition.]]></description>
				<content:encoded><![CDATA[<p>Maybe your old database will work just fine after all. Santa Clara, Calif.-based database specialist <a href="http://www.scalearc.com/">ScaleArc</a> has just raised a $12.3 million Series C round to grow its business of making your SQL database better and faster. However, the investment &#8212; from Accel Partners, as well as Trinity Ventures and Nexus Ventures &#8212; comes as a collection of born-and-bred-to-scale startups is trying to breathe new life into SQL.</p>
<p><a href="http://gigaom2.files.wordpress.com/2013/01/transparentidbplacement.jpg"><img  alt="transparentidbplacement" src="http://gigaom2.files.wordpress.com/2013/01/transparentidbplacement.jpg?w=300&#038;h=188" width="300" height="188" class="alignleft size-medium wp-image-600696" /></a>ScaleArc, for its part, wants to let customers keep their MySQL, Oracle and SQL Server databases by making them faster and more intelligent. Its product, called iDB, sits between an application and its database and handles a range of tasks, among them load-balancing, caching, SQL analytics and real-time dashboards. Recognizing the validity of certain new approaches for serving often-accessed data in a hurry (e.g., memcached), iDB&#8217;s cache actually uses a NoSQL database, although it&#8217;s backed up by the product&#8217;s standard security and high-availability features.</p>
<p>But for a company like ScaleArc, as well as the database products it aims to improve, the biggest threat probably isn&#8217;t NoSQL at all. Rather, they&#8217;re facing a new breed of SQL products &#8212; the so-called NewSQL movement &#8212; that promise everything developers and companies love about SQL with the scalability and speed that NoSQL can deliver. Among these offerings are Drawn to Scale&#8217;s HBase-based <a href="http://gigaom.com/2012/07/24/how-one-startup-wants-to-inject-hadoop-into-your-sql/">Spire</a>, the SQL-into-C++ <a href="http://gigaom.com/2012/06/18/ex-facebookers-launch-memsql-to-make-your-database-fly/">MemSQL</a> and OLTP engine <a href="http://gigaom.com/2012/07/09/new-look-database-startup-nuodb-gets-10m-to-scale-up-and-out/">NuoDB</a>, although <a href="http://en.wikipedia.org/wiki/NewSQL">there are many more</a>.</p>
<p>Still, it seems clearer now than at any point in the past few years that SQL truly is not going anywhere, despite the early hype from the NoSQL camp. The most-important question that SQL shops might have to ask themselves now is not whether to stick with the technology they know and (maybe) love, but which flavor to go with to suit their particular needs around scale and performance. Thanks to technologies like ScaleArc, they might not have to swap their database at all.</p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/01/10/with-another-12m-scalearc-wants-to-keep-your-database-relevant/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2012/10/shutterstock_113600470.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2012/10/shutterstock_113600470.jpg?w=150" medium="image">
			<media:title type="html">Shiny database</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/01/transparentidbplacement.jpg?w=300" medium="image">
			<media:title type="html">transparentidbplacement</media:title>
		</media:content>
	</item>
	</channel>
</rss>
