<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>GigaOM &#187; Hadoop</title>
	<atom:link href="http://gigaom.com/tag/hadoop/feed/" rel="self" type="application/rss+xml" />
	<link>http://gigaom.com</link>
	<description></description>
	<lastBuildDate>Sun, 19 May 2013 03:33:07 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='gigaom.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://0.gravatar.com/blavatar/0db8f6557d022075dbbf010c54d46d93?s=96&#038;d=http%3A%2F%2Fs2.wp.com%2Fi%2Fbuttonw-com.png</url>
		<title>GigaOM &#187; Hadoop</title>
		<link>http://gigaom.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://gigaom.com/osd.xml" title="GigaOM" />
	<atom:link rel='hub' href='http://gigaom.com/?pushpress=hub'/>
		<item>
		<title>Database startup Drawn to Scale is closing down</title>
		<link>http://gigaom.com/2013/05/17/database-startup-drawn-to-scale-is-closing-down/</link>
		<comments>http://gigaom.com/2013/05/17/database-startup-drawn-to-scale-is-closing-down/#comments</comments>
		<pubDate>Fri, 17 May 2013 21:24:03 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[big data]]></category>
		<category><![CDATA[database]]></category>
		<category><![CDATA[Drawn to Scale]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Hbase]]></category>
		<category><![CDATA[SQL on Hadoop]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=646718</guid>
		<description><![CDATA[Database startup Drawn to Scale is closing down. The company's product, Spire, was one of the first SQL-on-Hadoop technologies.]]></description>
				<content:encoded><![CDATA[<p>Database startup Drawn to Scale, creator of the SQL-on-Hadoop technology called Spire, is closing down. Co-founder and CEO Bradford Stephens officially <a href="http://www.roadtofailure.com/?p=11">announced the closure in a blog post</a> on Friday.</p>
<p><a href="http://gigaom2.files.wordpress.com/2013/05/spirearchitecture-015-e1361407038325.png"><img  alt="spirearchitecture-015-e1361407038325" src="http://gigaom2.files.wordpress.com/2013/05/spirearchitecture-015-e1361407038325.png?w=300&#038;h=185" width="300" height="185" class="alignleft size-medium wp-image-646740" /></a>The company&#8217;s product, Spire, which provided full SQL support on top of the HBase NoSQL database, was one of the first products to <a href="http://gigaom.com/2012/07/24/how-one-startup-wants-to-inject-hadoop-into-your-sql/">try to blend Hadoop&#8217;s scalability with the robustness and familiarity of SQL</a>. That&#8217;s now <a href="http://gigaom.com/2013/03/05/the-hadoop-ecosystem-the-welcome-elephant-in-the-room-infographic/">an increasingly crowded space</a> (and has grown since that linked graphic was created). In March, Drawn to Scale <a href="http://gigaom.com/2013/03/19/drawn-to-scale-wants-to-solve-your-mongodb-scalability-problems/">expanded its support to MongoDB</a>, as well.</p>
<p>I wasn&#8217;t shocked when Stephens told me the news, since questions about the four-year-old company&#8217;s financial health had been swirling for a while, but the details of its financial woes were still a bit surprising. His account in the post pretty much echoes what I had heard from others:</p>
<blockquote id="quote-it-seemed-we-had-eve"><p>&#8220;It seemed we had everything going for us — paid customers such as American Express, Orange Telecom, Flurry, and 4 others. Our technology worked brilliantly, we had a big hiring pipeline, and we had great media presence against our competitors who raised 10-100x more cash.&#8221;</p></blockquote>
<p>He added:</p>
<blockquote id="quote-yet-five-days-before2"><p>&#8220;Yet five days before we signed term sheets for a big A round or sold the company, we started getting hit by a series of black swans — and we just didn’t have what we needed to recover. I’ll leave the public detail at that level, but I will say that paying employees’ health insurance out of your meager savings is a powerful incentive to change course.&#8221;</p></blockquote>
<p>Up to this point, the company <a href="http://gigaom.com/2012/03/08/drawn-to-scale-raises-money-to-make-sql-big-data-ready/">had raised $925,000</a> from RTP Ventures, IA Ventures and SK Ventures. There&#8217;s no word yet on what will become of the company&#8217;s intellectual property.</p>
<p>As Stephens &#8212; who&#8217;s now doing an entrepreneur-in-residence gig at Ping Identity and helping out other startups (including popular wardrobe app <a href="http://www.clothapp.com/">Cloth</a>) &#8212; succinctly put it during a phone discussion, &#8220;We just don&#8217;t have the horsepower to keep running the company.&#8221;</p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/05/17/database-startup-drawn-to-scale-is-closing-down/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/05/dtsdragon.png?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/05/dtsdragon.png?w=150" medium="image">
			<media:title type="html">dtsdragon</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/05/spirearchitecture-015-e1361407038325.png?w=300" medium="image">
			<media:title type="html">spirearchitecture-015-e1361407038325</media:title>
		</media:content>
	</item>
		<item>
		<title>Why 3 celebrity data scientists are willing to work for free &#8212; for you</title>
		<link>http://gigaom.com/2013/05/08/why-3-celebrity-data-scientists-are-willing-to-work-for-free-for-you/</link>
		<comments>http://gigaom.com/2013/05/08/why-3-celebrity-data-scientists-are-willing-to-work-for-free-for-you/#comments</comments>
		<pubDate>Wed, 08 May 2013 16:58:30 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[big data]]></category>
		<category><![CDATA[data science]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Hilary Mason]]></category>
		<category><![CDATA[Mortar Data]]></category>
		<category><![CDATA[recommendation engines]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=643353</guid>
		<description><![CDATA[Hadoop startup Mortar Data is offering to build recommendation systems for 10 companies, with help from Hilary Mason, Drew Conway and Max Shron. It's part of a bigger plan to democratize the science behind online recommendations.]]></description>
				<content:encoded><![CDATA[<p>Hadoop-in-the-cloud startup Mortar Data is on a mission to bring recommendation engines to the masses, and it has recruited three well-known data scientists to aid its cause. On Wednesday, the company will start accepting applications <a href="http://mortardata.com/">on its website</a> from companies that would like to have Mortar Data &#8212; as well as Bit.ly&#8217;s <a href="http://www.hilarymason.com/">Hilary Mason</a>, IA Ventures Scientist-in-Residence <a href="http://drewconway.com/">Drew Conway</a> and freelancer (and former OKCupid data scientist) <a href="http://shron.net/about">Max Shron</a> &#8212; build a custom recommendation system for them.</p>
<p>The way it works, said Mortar Co-founder and CEO K Young, is that his company will choose eight companies (in addition to the two it has been working with already) to implement custom systems based on their specific needs and businesses. Mason, Conway and Shron will split their time among the 10 total companies, but will be much more than advisers &#8212; they&#8217;ll actually dig into the data and work hands-on to ensure the right techniques and algorithms are applied in the right places.</p>
<p>The applicant companies will keep any custom code, but the ultimate goal from Mortar&#8217;s perspective is to learn some best practices and create reusable building blocks that will let anyone create recommendation engines without pre-existing data science knowledge. Recommendation engines <a href="http://gigaom.com/2013/01/29/you-might-also-like-to-know-how-online-recommendations-work/">are commonplace on large web sites</a> (Netflix, Spotify, iTunes, Google, Amazon, <a href="http://gigaom.com/2013/03/03/how-and-why-linkedin-is-becoming-an-engineering-powerhouse/">LinkedIn</a>, Eventbrite and the list goes on) but smaller companies can sometimes struggle to do them, or to do them well. Young hopes Mortar can establish an open source reference architecture of sorts that makes it easy to implement everything from building data pipelines to the actual algorithms that power recommendations.</p>
<p>&#8220;They&#8217;re really common and they&#8217;re really useful, but they&#8217;re really hard,&#8221; he said. &#8220;That&#8217;s why [a reference implementation] hasn&#8217;t been done before.&#8221;</p>
<div id="attachment_643436" class="wp-caption aligncenter" style="width: 718px"><a href="http://gigaom2.files.wordpress.com/2013/05/gernres-support-1.jpg"><img  alt="They can get pretty complex, as evidenced by this Netflix example." src="http://gigaom2.files.wordpress.com/2013/05/gernres-support-1.jpg?w=708&#038;h=358" width="708" height="358" class="size-large wp-image-643436" /></a><p class="wp-caption-text">They can get pretty complex, as evidenced by this Netflix example.</p></div>
<p>Presently, Young explained, anyone wanting to build a recommendation system probably knows some of the algorithms to begin with and then gets to work researching how to implement them with specific processing frameworks (e.g., MapReduce) and on their specific data. Alternatively, they might have to hire a consultant who helps them build the recommendation engine. Either way, he noted, they&#8217;re probably not open sourcing it at the end because it&#8217;s presumed to be too valuable a competitive edge.</p>
<p>Mortar Data&#8217;s recommendation framework will be based on Pig, Python and Java, <a href="http://gigaom.com/2012/11/28/mortar-data-wants-to-become-a-hadoop-developers-best-friend/">just like the company&#8217;s flagship platform</a> for creating Hadoop jobs. Those languages will make the implementation more accessible and customizable by more people, Young said.</p>
<p>Really, he added, any web site or service that has multiple customers and deals with multiple entities &#8212; be they restaurants, songs, dating profiles, artisan necklaces, what have you &#8212; should have some sort of recommendation engine to help provide a more-intelligent customer experience. &#8220;It should become so ubiquitous that any service you go to knows enough about you to put forward the things you actually want to see,&#8221; Young said.</p>
<p>There is, however, one catch to Mortar&#8217;s plans as they stand: Because the service is hosted on Amazon Web Services, anyone interested in having Mason, Conway, Shron and Mortar work on their systems must have their data in AWS or be able to move it there. The initial reference implementation will likely be AWS-centric, too, but Young hopes contributors will use it and share methods for running it atop other platforms.</p>
<p><em>Feature image of Hilary Mason at Structure: Data 2011 courtesy of Pinar Ozger (www.pinarozger.com).</em></p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/05/08/why-3-celebrity-data-scientists-are-willing-to-work-for-free-for-you/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2012/05/hilarymason.jpeg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2012/05/hilarymason.jpeg?w=150" medium="image">
			<media:title type="html">hilarymason</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/05/gernres-support-1.jpg?w=708" medium="image">
			<media:title type="html">They can get pretty complex, as evidenced by this Netflix example.</media:title>
		</media:content>
	</item>
		<item>
		<title>How EMC&#8217;s CTO is trying to keep EMC, VMware and Pivotal orbiting the same sun</title>
		<link>http://gigaom.com/2013/05/07/how-emcs-cto-is-trying-to-keep-emc-vmware-and-pivotal-orbiting-the-same-sun/</link>
		<comments>http://gigaom.com/2013/05/07/how-emcs-cto-is-trying-to-keep-emc-vmware-and-pivotal-orbiting-the-same-sun/#comments</comments>
		<pubDate>Wed, 08 May 2013 01:17:09 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[big data]]></category>
		<category><![CDATA[EMC]]></category>
		<category><![CDATA[Greenplum]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Pivotal]]></category>
		<category><![CDATA[software-defined data center]]></category>
		<category><![CDATA[storage]]></category>
		<category><![CDATA[VMWare]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=643152</guid>
		<description><![CDATA[EMC CTO John Roese has a tough but important job trying to keep EMC, VMware and Pivotal all moving in the same direction. While the three are separate companies, their fates are very much aligned.]]></description>
				<content:encoded><![CDATA[<p>If you&#8217;re confused about all the action with EMC, VMware and Pivotal over the past several months, you&#8217;re not alone. CEOs <a href="http://gigaom.com/2012/07/17/maritz-is-out-as-vmware-ceo-but-takes-strategic-role-at-emc/">have traded places,</a> joint ventures <a href="http://gigaom.com/2013/03/13/the-pivotal-initiative-in-case-you-were-wondering-is-now-official/">have been struck</a>, product lines <a href="http://gigaom.com/2013/05/01/vmware-garage-sale-continues-as-it-offloads-wavemaker-to-pramati/">have been sold</a> and GE <a href="http://gigaom.com/2013/04/24/ge-to-pour-105m-into-emc-and-vmwares-pivotal-initiative/">even came on board</a>. And that&#8217;s before you even start talking about all the new technology.</p>
<p>I sat down with EMC SVP and CTO John Roese on Tuesday at the company&#8217;s annual EMC World conference to find out what&#8217;s up. Here&#8217;s what he had to say.</p>
<h2 id="on-three-companies-under-one-r">On three companies under one roof</h2>
<p>While they&#8217;re technically three separate companies, EMC is really in control. It&#8217;s the majority shareholder in VMware and owns more than 60 percent of Pivotal, its new joint venture with VMware that includes the <a href="http://gigaom.com/2013/02/25/emc-to-hadoop-competition-see-ya-wouldnt-wanna-be-ya/">Greenplum</a>, <a href="http://gigaom.com/2012/03/16/exclusive-emc-buys-pivotal-labs/">Pivotal Labs</a>, <a href="http://gigaom.com/2012/05/15/can-vmware-draw-developers-developers-developers/">SpringSource</a>, <a href="http://gigaom.com/2013/03/07/for-sale-from-pivotal-initiative-cloud-foundry/">Cloud Foundry</a> and <a href="http://gigaom.com/2012/04/24/vmware-buys-big-data-startup-cetas/">Cetas</a> business lines. When it comes to everyone working toward a common goal, Roese said, &#8220;The good news is that while there is independence, Joe Tucci is the chairman of all these companies.&#8221;</p>
<p>Roese calls himself the &#8220;gravitational center&#8221; of the three companies when it comes to technology. This is a reinvention of the CTO role at EMC, which used to be more of a research position. Now, he puts the stake in the ground and generally directs everyone toward it, even if they&#8217;re not all taking the same path to get there.</p>
<h2 id="on-why-pivotal-happened-and-wh">On why Pivotal happened and why it matters</h2>
<p>My takeaway from Roese&#8217;s comments on the formation of Pivotal is that Greenplum is really the linchpin of the whole company. At its core, Pivotal is about building big data infrastructure <a href="http://gigaom.com/2013/03/19/the-world-is-ready-for-the-consumer-grade-enterprise/">that can handle next-generation workloads</a>, but it&#8217;s aware that broad adoption is only possible if that high technology becomes easier to consume. That means new higher-level applications, which is where SpringSource, Cloud Foundry and Pivotal Labs come into play.</p>
<p>All of this technically could have been accomplished by just selling Greenplum and Pivotal Labs (the only assets of the new company that were under the EMC umbrella) to VMware, but Roese said VMware wasn&#8217;t the right home because VMware is not so important in the places where next-generation workloads are popping up. There&#8217;s not a lot of VMware inside carriers&#8217; data centers, he acknowledged, but <a href="http://gigaom.com/2013/04/14/rackspace-wants-to-be-the-openstack-provider-to-the-stars/">there is a lot of OpenStack popping up</a>. And there&#8217;s a lot of Amazon Web Services everywhere you look.</p>
<p>&#8220;We would like the big data infrastructure to not care about that,&#8221; Roese explained. From EMC&#8217;s perspective, it doesn&#8217;t need to own the middle &#8212; the cloud operating system, if you will &#8212; if it can still engage customers at the storage and application-platform layers.</p>
<h2 id="on-keeping-independent-while-w">On keeping independent while working an &#8216;unfair advantage&#8217;</h2>
<p>Roese doesn&#8217;t think a vertically integrated approach is the best way to do business in today&#8217;s technology world, which is why EMC, VMware and Pivotal all operate independently and no one relies on another in order to work within customers&#8217; data centers. That&#8217;s why VMware <a href="http://gigaom.com/2013/03/13/vmwares-hybrid-vcloud-takes-on-amazon-kinda/">has its own cloud computing efforts</a> but Pivotal is cloud-agnostic, why EMC storage can operate with any higher-level software and why VMware doesn&#8217;t care about what&#8217;s running underneath or, usually, above it.</p>
<p>However, he added, it&#8217;s only natural the three companies seek an &#8220;unfair advantage&#8221; from the incestuous bonds they share. What he means, of course, is that they should keep a close eye on what the others are doing and work together to ensure they&#8217;re all optimized for the same types of workloads. For example, Roese said, if EMC didn&#8217;t reconsider how storage had to perform given that virtualization is the norm or that technology like Hadoop exists, it would &#8220;become suboptimal or generic.&#8221;</p>
<p>The same holds true for Pivotal and VMware. Pivotal needs to think about <a href="http://gigaom.com/2012/06/13/vmware-aims-for-hadoop-on-vms-with-serengeti-project/">how big data applications run on virtualized resources</a> differently than on big bare metal systems, as well as on flash-based arrays like what EMC is about to roll out based on its <a href="http://gigaom.com/2012/05/10/emc-goes-all-flash-buys-xtremio-for-430m/">XtremIO acquisition</a>. VMware and EMC need to think about how their <a href="http://gigaom.com/2013/03/13/vmware-to-virtualize-networks-with-software-incorporating-niciras-capabilities/">software-defined data center</a> and <a href="http://gigaom.com/2013/05/06/emc-plots-software-defined-data-center-journey-from-vipr-storage-virtualization-base/">software-defined storage</a> approaches can build off each other.</p>
<p>From EMC&#8217;s perspective, it&#8217;s easy to see why this all matters. It is at its core an information infrastructure company, but &#8220;the challenging thing with that is that it&#8217;s a moving target,&#8221; Roese said. A company like EMC can&#8217;t get by on storage arrays alone anymore, but it also can&#8217;t be dumb enough to think it can be everything to everyone and still be good at anything.</p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/05/07/how-emcs-cto-is-trying-to-keep-emc-vmware-and-pivotal-orbiting-the-same-sun/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/05/john_roese_225.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/05/john_roese_225.jpg?w=150" medium="image">
			<media:title type="html">John_Roese_225</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>
	</item>
		<item>
		<title>Look, IBM is doing SQL on Hadoop, too</title>
		<link>http://gigaom.com/2013/05/06/look-ibm-is-doing-sql-on-hadoop-too/</link>
		<comments>http://gigaom.com/2013/05/06/look-ibm-is-doing-sql-on-hadoop-too/#comments</comments>
		<pubDate>Mon, 06 May 2013 17:37:41 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[IBM]]></category>
		<category><![CDATA[database]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[SQL]]></category>
		<category><![CDATA[SQL on Hadoop]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=642523</guid>
		<description><![CDATA[IBM's entry in the SQL-on-Hadoop competition has been flying under the radar, but it is available as a technology preview. Called Big SQL, it's a big deal if IBM wants to be a major player in the Hadoop space.]]></description>
				<content:encoded><![CDATA[<p>Maybe this is just news to me, but IBM has a SQL-on-Hadoop product in the works called Big SQL. The company <a href="https://www.ibm.com/developerworks/community/blogs/SusanVisser/entry/introducing_the_ibm_big_sql_technology_preview1?lang=en">announced the technology preview version in March</a> (well under my radar and, from what I&#8217;ve seen, nearly everyone else&#8217;s radar), and is offering up a cloud-based demo environment for a select group of early users.</p>
<p>As a refresher, the big difference between SQL on Hadoop and the Hadoop connectors that were popular a couple years ago is that SQL-on-Hadoop products query the data where it resides &#8212; in HDFS or HBase &#8212; rather than pulling it into a relational database environment to analyze it. We have been <a href="http://gigaom.com/2013/04/30/with-impala-now-ga-clouderas-ceo-sizes-up-the-sql-on-hadoop-market/">talking for months about the emergence of a large SQL-on-Hadoop market</a>, but IBM&#8217;s name was conspicuously absent from that discussion. The company has Hadoop software called BigInsights and lots of SQL expertise, so it only made sense that IBM would get into the game at some point.</p>
<p>Details on Big SQL are still pretty sparse save for a few high-level blog posts and an instructional video (embedded below), but it looks to take the standard approach, <a href="http://gigaom.com/2013/04/30/with-impala-now-ga-clouderas-ceo-sizes-up-the-sql-on-hadoop-market/">as Cloudera is doing with Impala</a>, of enabling access through traditional tools via JDBC and ODBC drivers.</p>
<p>Ultimately, I think the advent of big data will <a href="http://gigaom.com/2013/05/01/precog-launches-with-a-plan-to-simplify-analytics-on-unstructured-data/">enable some new types of querying techniques</a> quite a bit different than the SQL queries we&#8217;ve come to know and love over the past couple decades. But SQL is still the language du jour and might never go away, so there&#8217;s a lot of value to be had if people can put their SQL skills to work on data stored inside Hadoop or other environments, and if companies can work toward a nirvana <a href="http://gigaom.com/2013/02/25/emc-to-hadoop-competition-see-ya-wouldnt-wanna-be-ya/">where all the data is stored in a single place</a> rather than across database environments.</p>
<p>That IBM got this message and got into the game isn&#8217;t surprising at all, but it is important. Lots of large companies buy IBM&#8217;s software. If IBM wants them to follow it into the world of big data and Hadoop, it has to give them the tools they need to use it.</p>
<span class='embed-youtube' style='text-align:center; display: block;'><iframe class='youtube-player' type='text/html' width='604' height='370' src='http://www.youtube.com/embed/DCWig4-h1F4?version=3&#038;rel=1&#038;fs=1&#038;showsearch=0&#038;showinfo=1&#038;iv_load_policy=1&#038;wmode=transparent' frameborder='0'></iframe></span>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/05/06/look-ibm-is-doing-sql-on-hadoop-too/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/02/shutterstock_37622056.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/02/shutterstock_37622056.jpg?w=150" medium="image">
			<media:title type="html">sql statement</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>
	</item>
		<item>
		<title>MapR releases M7, its commercial HBase distro</title>
		<link>http://gigaom.com/2013/05/01/mapr-releases-m7-its-commercial-hbase-distro/</link>
		<comments>http://gigaom.com/2013/05/01/mapr-releases-m7-its-commercial-hbase-distro/#comments</comments>
		<pubDate>Wed, 01 May 2013 23:21:07 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[big data]]></category>
		<category><![CDATA[Databases]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Hbase]]></category>
		<category><![CDATA[Mapr]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[open source]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=641425</guid>
		<description><![CDATA[MapR on Wednesday released its commercial version of HBase called M7, the first such product on the market, that the company claims is bigger, faster and better than the open source version.]]></description>
				<content:encoded><![CDATA[<p>MapR didn&#8217;t miss the memo about the key to success in the Hadoop space being the creation of a data platform that can do many things. And on Wednesday, the company released its take on HBase, <a href="http://www.mapr.com/products/mapr-editions/m7-edition">called M7.</a></p>
<p>Last week, I <a href="http://gigaom.com/2013/04/22/how-hbase-converted-myspaces-mysql-champion-and-is-driving-hadoop-mainstream/">explained how HBase is fast becoming the star of the Hadoop ecosystem</a> because it allows users to build more real-time, almost transactional applications on top of Hadoop. True to its form with its other products, MapR has taken HBase even further with M7 by promising greater availability (99.999 percent), instant recovery, faster operations and the ability to handle 1 trillion tables in a single cluster. In open source versions of HBase, MapR VP of Marketing Jack Norris told me, the accepted table limit per cluster is several hundred.</p>
<p><a href="http://gigaom2.files.wordpress.com/2013/05/m7.jpg"><img  alt="m7" src="http://gigaom2.files.wordpress.com/2013/05/m7.jpg?w=300&#038;h=265" width="300" height="265" class="alignright size-medium wp-image-641471" /></a>Additionally, M7 shares a single data layer with the Hadoop file system, meaning less performance overhead and, presumably, easier management.</p>
<p>As we&#8217;re seeing with other Hadoop vendors, including Cloudera (which <a href="http://gigaom.com/2013/04/30/with-impala-now-ga-clouderas-ceo-sizes-up-the-sql-on-hadoop-market/">released its Impala SQL query engine on Tuesday</a>), the Hadoop market is fast becoming one where each vendor is trying to set itself apart from the rest by building the best platform with the broadest set of capabilities. In furtherance of that mission, MapR also announced on Wednesday full-text search on its Hadoop distribution thanks to a partnership with Lucene specialist LucidWorks. It already has its own Hadoop distribution complete with proprietary code to bolster the file system and speed up MapReduce, as well as an <a href="http://gigaom.com/2012/08/17/for-fast-interactive-hadoop-queries-drill-may-be-the-answer/">open source SQL-on-Hadoop project called Drill</a> in the works.</p>
<p>MapR employees are probably sleeping a lot easier these days as a result of this platform push. Others in the Hadoop market used to talk about the fear of fragmentation and then point at MapR as the example of a company helping foment that outcome with its proprietary software. Now, however, even if everyone else is building open source products, they&#8217;re all still backing their own and largely dismissing the others.</p>
<p>I suspect the result is feature lock-in even if there&#8217;s no technological lock-in, kind of <a href="http://gigaom.com/2011/03/16/how-amazon-is-following-apples-lead-to-rule-cloud-computing/">like using Amazon Web Services for cloud computing</a> and then hoping to replicate its various services elsewhere. It might be easy enough to move your data, but impossible or very difficult to replicate those additional capabilities elsewhere. If MapR can build a better version of HBase and companies are willing to pay for it, then so be it.</p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/05/01/mapr-releases-m7-its-commercial-hbase-distro/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2012/09/shutterstock_110961494.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2012/09/shutterstock_110961494.jpg?w=150" medium="image">
			<media:title type="html">Database rows</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/05/m7.jpg?w=300" medium="image">
			<media:title type="html">m7</media:title>
		</media:content>
	</item>
		<item>
		<title>Precog launches with a plan to simplify analytics on unstructured data</title>
		<link>http://gigaom.com/2013/05/01/precog-launches-with-a-plan-to-simplify-analytics-on-unstructured-data/</link>
		<comments>http://gigaom.com/2013/05/01/precog-launches-with-a-plan-to-simplify-analytics-on-unstructured-data/#comments</comments>
		<pubDate>Wed, 01 May 2013 18:50:32 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[analytics]]></category>
		<category><![CDATA[BI]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Precog]]></category>
		<category><![CDATA[unstructured data]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=641260</guid>
		<description><![CDATA[Analytics startup Precog is on a mission to make analytics on unstructured data as simple as possible with a new line of targeted appliances.]]></description>
				<content:encoded><![CDATA[<p><a href="http://www.precog.com/">Precog</a>, a Boulder, Colo.-based startup that&#8217;s trying to seed the market for advanced analytics on unstructured data, is coming out of beta on Thursday with a line of appliances designed to let everyday users get started on making sense of social, web and application data. The company&#8217;s underlying technology has remained the same <a href="http://gigaom.com/2012/09/27/startup-precog-says-big-data-doesnt-need-to-be-so-complex/">since we profiled Precog in September</a>, but a journey into the world outside Silicon Valley has changed its thinking about how to market and deliver its product.</p>
<p>Put simply, Precog&#8217;s technology lets users ask questions of their unstructured data (e.g., stuff sitting in Hadoop, MongoDB or any other non-relational data store) in whatever format it was created &#8212; JSON, logfile, XML, what have you. This is different from the standard operating procedure of querying unstructured data &#8212; including <a href="http://gigaom.com/2013/04/30/with-impala-now-ga-clouderas-ceo-sizes-up-the-sql-on-hadoop-market/">the current SQL-on-Hadoop craze</a> &#8212; which usually involves somehow transforming data into a format that a relational engine can read before beginning the analysis. Precog also features visualizations, charts and reports designed with these new types of data, and presumably larger datasets, in mind.</p>
<div id="attachment_641294" class="wp-caption alignleft" style="width: 310px"><a href="http://gigaom2.files.wordpress.com/2013/05/de-goes-and-carr.jpg"><img  alt="CEO John De Goes and COO Jeff Carr" src="http://gigaom2.files.wordpress.com/2013/05/de-goes-and-carr.jpg?w=708"   class="size-full wp-image-641294" /></a><p class="wp-caption-text">CEO John De Goes and COO Jeff Carr</p></div>
<p>However, Founder and CEO John De Goes told me, the company came to realize over the past several months that as much as what it&#8217;s doing might fall under the &#8220;data science&#8221; umbrella, that&#8217;s the wrong messaging. Outside of Silicon Valley, he said, &#8220;a lot of companies don&#8217;t have the technological sophistication to understand the whole data science thing&#8221; &#8212; they just want to know that they can ask deeper questions of the new data types they&#8217;re storing in their NoSQL databases without having to perform ETL operations on it or write a lot of complicated code.</p>
<p>And the bigger those companies are, Precog COO Jeff Carr said, the less likely they are to want a cloud service like Precog initially offered.</p>
<p>So the company took both lessons to heart and is rolling out a line of appliances (physical or virtual) that complement its flagship cloud service, each targeting specific use cases. The first three are social media, web analytics and application data, and the appliances are equipped with baked-in capabilities important to each of those fields. The social media one, for example, will feature advanced sentiment analysis and natural language processing, while the web analytics one will focus on features such as behavioral clustering.</p>
<p>Under the covers, though, each appliance still runs on the broader Precog platform, Carr noted, and someone who buys one just to get started in a specific area can pretty easily (i.e., without reaching &#8220;super-coder&#8221; status) turn it toward other data types and other types of analysis. But right now, De Goes added, no one really knows what it means to have an analytics product designed for unstructured data, so the appliance approach should make it easier for large enterprises and non-tech companies to digest.</p>
<div id="attachment_641292" class="wp-caption alignright" style="width: 310px"><a href="http://gigaom2.files.wordpress.com/2013/05/precog-web.jpg"><img  alt="precog web" src="http://gigaom2.files.wordpress.com/2013/05/precog-web.jpg?w=300&#038;h=192" width="300" height="192" class="size-medium wp-image-641292" /></a><p class="wp-caption-text">An example of web analytics in Precog.</p></div>
<p>It&#8217;s a &#8220;baby steps&#8221; situation, explained Carr: &#8220;Don&#8217;t sit there and try to think about how to solve every problem all at once. Let&#8217;s try to sit there and think about data types you know you&#8217;re having problems with [now].&#8221;</p>
<p>Analyzing data in its native format has advantages beyond just omitting an extra transformation step, though, and the Precog team thinks companies will get hip to these advantages as they begin to understand the analytic aspects of non-relational databases as well as they do the operational aspects. Oftentimes, these will be new use cases, which is why Precog considers itself more complementary to than competitive with traditional data warehouses, SQL-on-Hadoop tools and BI software.</p>
<p>One early customer is using Precog to match up résumé data &#8212; often enhanced résumé data &#8212; with job openings, which is a tricky proposition in a relational format because résumés can include so much personalized information or content that doesn&#8217;t fit into a schema at all, really. Another user, a large telco, is trying to build new data products for its customers by mashing together all sorts of internal and third-party data in numerous formats.</p>
<p>Carr compared the shift to the move from flat files to relational data decades ago. &#8220;It&#8217;s happening again,&#8221; he said. &#8220;It has to happen again &#8230; people are not going to abandon JSON because it doesn&#8217;t fit neatly inside a table.&#8221;</p>
<p>Precog is telling the right story around why unstructured analytics matters, but one has to assume there will be a major shakeout in the big data analytics space over the next few years. There are only so many new technologies companies can absorb at once &#8212; Hadoop, NoSQL, <a href="http://gigaom.com/2013/02/21/sql-is-whats-next-for-hadoop-heres-whos-doing-it/">SQL on Hadoop</a>, unstructured analytics, <a href="http://gigaom.com/2013/03/26/white-hot-bi-on-hadoop-startup-platfora-now-ga/">Platfora</a>, in-memory, stream processing, <a href="http://gigaom.com/2013/02/19/citusdb-today-sql-on-hadoop-tomorrow-the-world/">next-gen analytic databases</a>, etc. &#8212; and it&#8217;s hard to predict which messages and capabilities will win out.</p>
<p>However, unless Hadoop really does become the lone dumping ground for <em>all</em> non-operational data &#8212; regardless of the source &#8212; technologies like Precog that can act as the analytics layer across numerous data stores would seem to have an advantage.</p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/05/01/precog-launches-with-a-plan-to-simplify-analytics-on-unstructured-data/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/05/precog-web.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/05/precog-web.jpg?w=150" medium="image">
			<media:title type="html">precog web</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/05/de-goes-and-carr.jpg" medium="image">
			<media:title type="html">CEO John De Goes and COO Jeff Carr</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/05/precog-web.jpg?w=300" medium="image">
			<media:title type="html">precog web</media:title>
		</media:content>
	</item>
		<item>
		<title>With Impala now GA, Cloudera&#8217;s CEO sizes up the SQL-on-Hadoop market</title>
		<link>http://gigaom.com/2013/04/30/with-impala-now-ga-clouderas-ceo-sizes-up-the-sql-on-hadoop-market/</link>
		<comments>http://gigaom.com/2013/04/30/with-impala-now-ga-clouderas-ceo-sizes-up-the-sql-on-hadoop-market/#comments</comments>
		<pubDate>Tue, 30 Apr 2013 13:00:40 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[big data]]></category>
		<category><![CDATA[Cloudera]]></category>
		<category><![CDATA[data warehouse]]></category>
		<category><![CDATA[Greenplum]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[hive]]></category>
		<category><![CDATA[Hortonworks]]></category>
		<category><![CDATA[Impala]]></category>
		<category><![CDATA[Mapr]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[SQL]]></category>
		<category><![CDATA[SQL on Hadoop]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=640777</guid>
		<description><![CDATA[Cloudera's Impala engine for interactive SQL queries on Hadoop data is now generally available, and CEO Mike Olson gives his lay of the competitive landscape.]]></description>
				<content:encoded><![CDATA[<p>There is no shortage of confidence in the Hadoop space, and market leader Cloudera bolstered its own on Tuesday with the general availability of its Impala SQL query engine for Hadoop. And if CEO Mike Olson&#8217;s comments are any indication, we&#8217;re in for a long ride of competitive jockeying and one-upmanship as Cloudera and its peers go all Microsoft or Google and create myriad new data-processing engines to turn their Hadoop distributions into bona fide platforms.</p>
<p>Launched as a private beta in May 2012 and <a href="http://gigaom.com/2012/10/24/cloudera-makes-sql-a-first-class-citizen-in-hadoop/">made public in October</a>, Impala is Cloudera&#8217;s attempt to address the growing demand for interactive SQL analytics on Hadoop data. It&#8217;s essentially a massively parallel database designed to share the same storage platform and metadata as Hadoop MapReduce, only it is its own separate processing engine.</p>
<div id="attachment_640848" class="wp-caption alignleft" style="width: 310px"><a href="http://gigaom2.files.wordpress.com/2013/04/impala-arch-new.jpg"><img  alt="How Impala fits in" src="http://gigaom2.files.wordpress.com/2013/04/impala-arch-new.jpg?w=300&#038;h=257" width="300" height="257" class="size-medium wp-image-640848" /></a><p class="wp-caption-text">How Impala fits in</p></div>
<p>Impala actually uses the same &#8220;nearly ANSI&#8221; version of SQL as does current standard bearer Hive, but that technology (created by Facebook in 2009 as a data warehouse layer for Hadoop) doesn&#8217;t run nearly fast enough to sate many users&#8217; desire for interactive analytics. This is because Hive transforms SQL queries into MapReduce jobs, meaning every one is processed against the entire corpus of data in the Hadoop Distributed File System.</p>
<h2 id="sizing-up-the-competition">Sizing up the competition</h2>
<p>Only Cloudera isn&#8217;t the first to have the idea, <a href="http://gigaom.com/2013/02/21/sql-is-whats-next-for-hadoop-heres-whos-doing-it/">nor is it alone in trying to sell interactive SQL on Hadoop</a>. The idea was <a href="http://gigaom.com/2011/10/21/hadapt-raises-9-5m-for-hadoop-data-warehouse/">first commercialized by Boston-based startup Hadapt</a> in 2011, and is now being pushed by numerous startups and larger Hadoop players. Among them: Pivotal (formerly EMC) Greenplum, MapR (with <a href="http://gigaom.com/2012/08/17/for-fast-interactive-hadoop-queries-drill-may-be-the-answer/">Drill</a>), Hortonworks (with <a href="http://hortonworks.com/blog/100x-faster-hive/">Stinger</a>), Drawn to Scale, Splice Machine, Jethro Data and Citus Data.</p>
<div id="attachment_640858" class="wp-caption aligncenter" style="width: 600px"><a href="http://gigaom2.files.wordpress.com/2013/04/had_graphic2-scaled.jpg"><img  alt="Hadapt's architecture" src="http://gigaom2.files.wordpress.com/2013/04/had_graphic2-scaled.jpg?w=708"   class="size-full wp-image-640858" /></a><p class="wp-caption-text">Hadapt&#8217;s architecture</p></div>
<p>But Cloudera is arguably the biggest name pushing SQL on Hadoop, and CEO Mike Olson thinks Impala stands out for several reasons &#8212; not the least of which is that it exists as a product. &#8220;Nobody else is shipping production-grade SQL query support on Hadoop,&#8221; he told me during a recent call. &#8220;At least not in open source.&#8221; He seems content to let the startups do their thing, instead focusing his attention on Cloudera&#8217;s big three Hadoop-distribution competitors in Pivotal, MapR and Hortonworks. Greenplum and Pivotal SVP Scott Yara <a href="http://gigaom.com/2013/02/25/emc-to-hadoop-competition-see-ya-wouldnt-wanna-be-ya/">was full of confidence &#8212; and R&amp;D budget</a> &#8212; when the company announced the Pivotal HD distribution and HAWQ technology in February, but Olson claims the approach requires a siloed DBMS within HDFS and is a &#8220;rearguard defensive strategy&#8221; to protect the company&#8217;s sunk costs in its database technology.</p>
<div id="attachment_615210" class="wp-caption aligncenter" style="width: 718px"><a href="http://gigaom2.files.wordpress.com/2013/02/hawq1.jpg"><img  alt="The Pivotal HD and Hawq architecture" src="http://gigaom2.files.wordpress.com/2013/02/hawq1.jpg?w=708&#038;h=387" width="708" height="387" class="size-large wp-image-615210" /></a><p class="wp-caption-text">The Pivotal HD and Hawq architecture</p></div>
<p>As for Hortonworks, Olson questions the wisdom of its Stinger initiative to boost Hive&#8217;s speed, noting that &#8220;Hive never got good while it was running standalone on MapReduce.&#8221; Hortonworks also <a href="http://gigaom.com/2013/04/15/teradata-to-connect-hadoop-and-data-warehouses-roll-out-new-appliance/">partners with vendors such as Teradata</a> to let their platforms access Hadoop data in its native format, but those approaches still require sending data over the network. &#8220;It&#8217;s not the way you would build it if you woke up in the 2000s and were building this anew,&#8221; Olson said.</p>
<div id="attachment_640854" class="wp-caption aligncenter" style="width: 718px"><a href="http://gigaom2.files.wordpress.com/2013/04/stingerroad.png"><img  alt="The Stinger roadmap" src="http://gigaom2.files.wordpress.com/2013/04/stingerroad.png?w=708&#038;h=558" width="708" height="558" class="size-large wp-image-640854" /></a><p class="wp-caption-text">The Stinger roadmap</p></div>
<p>Olson acknowledged that the MapR-led Apache Drill project is cut from the same cloth as Impala (that is, being a Google Dremel clone designed specifically for Hadoop), but &#8220;the difference is we&#8217;re shipping code.&#8221; Being generally available and ready for production workloads means Cloudera can lock down users and market share before many even have a chance to experiment with Drill. He all but dismissed questions over the readiness of Impala, spurred by rumblings in the Hadoop space that Cloudera rushed it into public beta in order to get on the scoreboard against more fully baked offerings.</p>
<p>&#8220;I don&#8217;t feel we&#8217;re under the gun competitively to pull it out of beta because no one else has product in the market,&#8221; Olson said. &#8220;I have no problems &#8230; calling this GA quality.&#8221; He did, however, acknowledge that Impala is shipping with a &#8220;minimum viable feature set&#8221; that the company has plans to build on in the near future. Impala Senior Product Manager Justin Erickson noted a few issues of concern, including around the number of concurrent users Impala can support, but said they have been addressed during the beta period.</p>
<h2 id="one-piece-of-a-larger-platform">One piece of a larger platform</h2>
<p>Really, though, the whole point of Impala and its competitors is to turn Hadoop from a tool for batch analytics and mass storage <a href="http://gigaom.com/2013/03/07/5-reasons-why-the-future-of-hadoop-is-real-time-relatively-speaking/">into a platform that can handle nearly all of companies&#8217; data-processing needs</a>. In that regard, it appears we&#8217;re just getting started. Cloudera, MapR, Pivotal Greenplum and Hortonworks are already pushing their own products and projects, and Olson said &#8220;it&#8217;s absolutely our intent&#8221; to enhance Cloudera&#8217;s platform with even more open-source products &#8212; perhaps even more database technologies <a href="http://gigaom.com/2013/04/22/how-hbase-converted-myspaces-mysql-champion-and-is-driving-hadoop-mainstream/">a la HBase</a> &#8212; that will let users do more stuff with more types of data. Over time, this strategy could result in Hadoop displacing the current breed of databases and data warehouses and becoming the single data store atop of which users run whatever applications they so desire. For now, though, especially when it comes to Impala and the data warehouse incumbents, Olson is taking a measured approach. &#8220;The likelihood that we&#8217;re going to knock them off in the near term,&#8221; he said, &#8220;&#8230; it would be a tough fight to win.&#8221;</p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/04/30/with-impala-now-ga-clouderas-ceo-sizes-up-the-sql-on-hadoop-market/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/01/1z5o1503.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/01/1z5o1503.jpg?w=150" medium="image">
			<media:title type="html">Structure Data 2012: Michael Olson – CEO, Cloudera</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/04/impala-arch-new.jpg?w=300" medium="image">
			<media:title type="html">How Impala fits in</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/04/had_graphic2-scaled.jpg" medium="image">
			<media:title type="html">Hadapt&#039;s architecture</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/02/hawq1.jpg?w=708" medium="image">
			<media:title type="html">The Pivotal HD and Hawq architecture</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/04/stingerroad.png?w=708" medium="image">
			<media:title type="html">The Stinger roadmap</media:title>
		</media:content>
	</item>
		<item>
		<title>The growing importance of timing in data centers</title>
		<link>http://gigaom.com/2013/04/28/the-growing-importance-of-timing-in-data-centers/</link>
		<comments>http://gigaom.com/2013/04/28/the-growing-importance-of-timing-in-data-centers/#comments</comments>
		<pubDate>Sun, 28 Apr 2013 18:00:45 +0000</pubDate>
		<dc:creator>Jim Theodoras, ADVA Optical Networking</dc:creator>
				<category><![CDATA[ADVA]]></category>
		<category><![CDATA[Databases]]></category>
		<category><![CDATA[Facebook]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Jim Theodoras]]></category>
		<category><![CDATA[Spanner]]></category>
		<category><![CDATA[storage]]></category>
		<category><![CDATA[timing]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=634743</guid>
		<description><![CDATA[Accurate timing has grown more important in distributed systems, not just for mobile networks, but also for tracking data between data centers. Our love of digital junk is pushing storage to the edge.]]></description>
				<content:encoded><![CDATA[<p><em><strong>Editor&#8217;s note</strong>: This is the second of a two-part series on the importance of timing in today&#8217;s distributed infrastructures. The <a href="http://gigaom.com/2013/04/27/timing-is-not-just-for-traders-anymore-networks-need-it-too/">first</a> ran on Saturday.</em></p>
<p>Like the subjects of a bad episode of <a href="http://www.aetv.com/hoarders/"><em>Hoarders</em></a>, people love to store all things digital, most of which will never be accessed again. And, as in a bad episode of <a href="http://www.aetv.com/storage-wars/"><em>Storage Wars</em></a>, our love of storing crap means we need more places to store it. Today’s content has outgrown even the hydroelectric-dam-powered mega data centers built only a few years ago. Increasingly, operators are distributing their information across multiple geographically dispersed data centers. As the number and size of those data centers, and the distances between them, have steadily grown, timing distribution and accuracy have likewise grown in importance in keeping the data centers in sync.</p>
<p>In a <a href="http://gigaom.com/2013/04/27/timing-is-not-just-for-traders-anymore-networks-need-it-too/">previous article</a> I discussed new standards being developed to increase the accuracy of timing for the internet and other IP-based networks. Current systems and protocols offer accuracy on the order of milliseconds. But that just isn’t enough as we depend more on real-time information, and as compute, storage and communications networks become more distributed. While people often cite the importance of timing for mobile backhaul in next-generation LTE-Advanced networks, there has been less publicity around the need for these new timing technologies in the continued growth of data centers.</p>
<h2 id="the-rise-of-hadoop-in-an-age-o">The rise of Hadoop in an age of digital garbage</h2>
<p><a href="http://gigaom2.files.wordpress.com/2011/12/13250237_1a49b5a7a3_z.png"><img src="http://gigaom2.files.wordpress.com/2011/12/13250237_1a49b5a7a3_z.png?w=708" alt="Dinosaurs"    class="aligncenter size-full wp-image-459351" /></a><br />
Massive storage of data appears to evolve in periods, much like <a href="http://dinosaurs.about.com/od/dinosaurbasics/a/dinosaurages.htm">dinosaur evolution</a>. A database architecture rises to the forefront on the strength of its advantages, until it scales to the breaking point and is completely superseded by a new architecture. At first, databases were simply serially listed values in row/column arrangements. Database technology leapt forward and became a self-sufficient business with the advent of relational databases. It appeared for a while that <a href="http://computer.howstuffworks.com/question599.htm">relational databases</a> would be the last word in information storage, but then came Web 2.0, social media, and <a href="http://en.wikipedia.org/wiki/Cloud_computing">the cloud</a>. Enter Hadoop.</p>
<p>A centralized database works, as the name suggests, by having all the data located in a single indexed repository with massive computational power to run operations on it. But a centralized database cannot hope to scale to the size needed by today’s cloud apps. Even if it could, the time needed to perform a single lookup would be unbearable to an end user at a browser window. </p>
<p><a href="http://strata.oreilly.com/2011/01/what-is-hadoop.html">Hadoop de-centralizes the storage</a> and lookup, as well as the computational power. There is no index, per se. Content is distributed across a wide array of servers, each with its own storage and CPUs, and the location and relation of each piece of data are mapped. When a lookup occurs, the map is read, and all the pieces of information are fetched and pieced together again. The main benefit of Hadoop is scalability. To grow a database (and its computational power), you simply keep adding servers and growing your map.</p>
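<p>To make the "map" idea concrete, here is a minimal toy sketch in Python. It is not Hadoop's actual API: the <code>ToyCluster</code> class, its method names, the round-robin placement and the tiny block size are all assumptions made for illustration. It only shows the pattern described above, where data is split into pieces, a map records where each piece lives, and a read follows the map to reassemble the pieces.</p>

```python
class ToyCluster:
    """Toy model (hypothetical, not Hadoop's API) of storage spread over servers."""

    def __init__(self, server_names, block_size=8):
        self.block_size = block_size
        # Each server holds its own blocks: server name -> {block_id: bytes}
        self.servers = {name: {} for name in server_names}
        # The "map": file name -> ordered list of (server name, block_id)
        self.block_map = {}

    def add_server(self, name):
        # Scaling out: a new server is simply more room for future blocks.
        self.servers[name] = {}

    def write(self, filename, data):
        names = sorted(self.servers)
        placements = []
        for offset in range(0, len(data), self.block_size):
            index = offset // self.block_size
            block_id = f"{filename}#{index}"
            server = names[index % len(names)]  # naive round-robin placement
            self.servers[server][block_id] = data[offset:offset + self.block_size]
            placements.append((server, block_id))
        self.block_map[filename] = placements

    def read(self, filename):
        # Read the map, fetch each piece from its server, piece them together.
        placements = self.block_map[filename]
        return b"".join(self.servers[s][b] for s, b in placements)


cluster = ToyCluster(["node-a", "node-b"])
cluster.write("log.txt", b"hello distributed world")
print(cluster.read("log.txt"))
```

<p>In this sketch, growing the system is a matter of calling <code>add_server</code> and letting later writes land on the new node, which mirrors the "keep adding servers and growing your map" point in spirit only.</p>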
<h2 id="even-hadoop-is-buried-under-mo">Even Hadoop is buried under mounds of digital debris </h2>
<p><a href="http://gigaom2.files.wordpress.com/2013/04/hadoop-timing.jpg"><img src="http://gigaom2.files.wordpress.com/2013/04/hadoop-timing.jpg?w=708&#038;h=364" alt="hadoop timing" width="708" height="364"  class="aligncenter size-full wp-image-634756" /></a><br />
It looked like Hadoop would reign supreme for generations to come, with extensions continuously breathing new life into the framework. Yet, after only a decade, Hadoop-based databases such as Facebook’s are at the breaking point. Global traffic is growing faster than exponentially, and most of it is trash. Today’s databases look more like landfills than the great <a href="http://starwars.wikia.com/wiki/Jedi_Archives">Jedi Archives</a>. And recently hyped trends such as <a href="https://www.facebook.com/lifeboxapp">lifelogging</a> suggest the problem will get much worse long before it gets better.</p>
<p>The main limitation of Hadoop is that it works great within the walls of a single massive data center, but is less than stellar once the database outgrows those walls and has to be run across geographically separated data centers. It turns out the main strength of Hadoop is also its Achilles’ heel. With no index to search, every piece of data must be sorted through, a difficult proposition once databases stretch across the globe. A piece of retrieved data might be stale by the time it reaches a requester, or mirrored copies of data might conflict with one another.</p>
<p>Enter an idea to keep widely dispersed data centers in sync &#8212; <a href="http://gigaom.com/2012/09/17/googles-spanner-a-database-that-knows-what-time-it-is/">Google True Time</a>. To grossly oversimplify the concept, the True Time API adds time attributes to data being stored, not just for expiration dating, but also so that the content of all the geographically disparate data centers can be time aligned. For database aficionados, this is sacrilege, as all leading database protocols are specifically designed to ignore time to prevent conflicts and confusion. Google True Time completely turns the concept of data storage inside out.</p>
<h2 id="introducing-spanner">Introducing Spanner </h2>
<p>In True Time, knowing the accurate “age” of each piece of information, in other words where it falls on the timeline of data, allows data centers that may be 100ms apart to synchronize not just the values stored in memory locations, but the timeline of values in those locations. For this to work, Google maintains an accurate “global wall-clock time” across its entire global Spanner network.</p>
<p>Transactions that write are time stamped and use strict two-phase locking (<a href="http://en.wikipedia.org/wiki/Two-phase_locking">S2PL</a>) to manage access. The commit order is always the timestamp order. Both commit and timestamp orders respect global wall-clock time. This simple set of rules maintains coordination between databases all over the world.</p>
<p>However, this introduces an element of uncertainty into each data field, the very reason that time has been shunned in database protocols since the dawn of data itself.</p>
<p><a href="http://webworkerdaily.files.wordpress.com/2010/02/clocktower.jpg"><img src="http://webworkerdaily.files.wordpress.com/2010/02/clocktower.jpg?w=708" alt="clocktower"    class="aligncenter size-full wp-image-239761" /></a></p>
<p>Google calls this “network-induced uncertainty,” denotes it with an epsilon, and actively monitors and tracks the metric. As of summer 2012, this value was running at 10ms with 99.9 percent (three nines) certainty. Google’s long-term goal is to reduce it below 1ms. Accomplishing this will require a state-of-the-art timing distribution network, leveraging the same technologies being developed and deployed for 4G LTE backhaul networks.</p>
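<p>A rough way to see why epsilon matters is to sketch a clock that reports an interval rather than a single instant, and a commit that waits until its timestamp has definitely passed. The Python below is a simplified illustration, not Google's code: <code>ToyTrueTime</code>, <code>TTInterval</code> and <code>commit</code> are hypothetical names, the 10ms default comes from the figure quoted above, and real Spanner involves locking and replication that are left out here.</p>

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class TTInterval:
    earliest: float  # seconds; true time is assumed to be no earlier than this
    latest: float    # seconds; true time is assumed to be no later than this


class ToyTrueTime:
    """Hypothetical stand-in for a clock that reports its own uncertainty."""

    def __init__(self, epsilon=0.010, clock=time.time):
        self.epsilon = epsilon  # uncertainty bound in seconds (10 ms assumed)
        self.clock = clock      # injectable so the sketch can be tested

    def now(self):
        t = self.clock()
        return TTInterval(t - self.epsilon, t + self.epsilon)

    def after(self, timestamp):
        # True only once the timestamp has definitely passed everywhere.
        return self.now().earliest > timestamp


def commit(tt, make_visible, sleep=time.sleep):
    """Pick a commit timestamp, then wait out the uncertainty before exposing it."""
    commit_ts = tt.now().latest
    while not tt.after(commit_ts):
        sleep(tt.epsilon / 4)
    make_visible(commit_ts)
    return commit_ts


log = []
tt = ToyTrueTime()
first = commit(tt, log.append)
second = commit(tt, log.append)
print(second > first)
```

<p>In this sketch each write pauses for roughly twice epsilon, which suggests why shrinking the uncertainty from 10ms toward 1ms would matter: the smaller the bound, the shorter the wait that keeps timestamp order aligned with wall-clock order.</p>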
<h2 id="a-modest-proposal">A modest proposal </h2>
<p>While True Time was most likely developed to improve geographic load balancing, now that accurate time stamping of data exists, the possibilities are profound. The problems associated with large databases go beyond simply managing the data. The growth rate itself is unsustainable. Data storage providers must do more than grow their storage; they must also find ways to improve efficiency and stem the tsunami of waste that is common in the age of relatively free storage.</p>
<p>It&#8217;s a dangerous notion, but one simply must challenge the basic tenet that all data is forever. Our minds don’t work that way, so why should computers? We hold on to only key memories, and the more time that passes after an event, the fewer details we retain. Perhaps data storage could work similarly. Rather than deleting a picture that hasn’t been accessed in a while, the system could search for similar photos and keep only one. And as time passes, perhaps rather than being simply deleted, a photo is continuously compressed, with less information kept, until the photo memory fades into oblivion. Like that old <a href="http://en.wikipedia.org/wiki/Instant_camera">Polaroid</a> hung on the refrigerator door.</p>
<p><em>Jim Theodoras is director of technical marketing at ADVA Optical Networking, working on Optical+Ethernet transport products. </em></p>
<p><em> Dinosaur image courtesy of <a href="http://www.flickr.com/photos/denn/13250237/">Flickr user Denise Chen</a>. </em></p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/04/28/the-growing-importance-of-timing-in-data-centers/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2010/11/clock-prime-time.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2010/11/clock-prime-time.jpg?w=150" medium="image">
			<media:title type="html">clock prime time</media:title>
		</media:content>

		<media:content url="http://1.gravatar.com/avatar/4411542bbd7a2a9a2fc2a1b38809e45c?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">gigaguest</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2011/12/13250237_1a49b5a7a3_z.png" medium="image">
			<media:title type="html">Dinosaurs</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/04/hadoop-timing.jpg" medium="image">
			<media:title type="html">hadoop timing</media:title>
		</media:content>

		<media:content url="http://webworkerdaily.files.wordpress.com/2010/02/clocktower.jpg" medium="image">
			<media:title type="html">clocktower</media:title>
		</media:content>
	</item>
		<item>
		<title>How data is changing the car game for Ford</title>
		<link>http://gigaom.com/2013/04/26/how-data-is-changing-the-car-game-for-ford/</link>
		<comments>http://gigaom.com/2013/04/26/how-data-is-changing-the-car-game-for-ford/#comments</comments>
		<pubDate>Fri, 26 Apr 2013 21:49:31 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[analytics]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[data science]]></category>
		<category><![CDATA[Ford Motor]]></category>
		<category><![CDATA[Hadoop]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=633959</guid>
		<description><![CDATA[The advent of big data is affecting Ford Motor Co. in some significant ways, from how it analyzes its supply chain to the features it puts into its cars.]]></description>
				<content:encoded><![CDATA[<p>When most people think about how cars are built, they probably think about assembly lines, manufacturing robots, and batteries of safety and performance simulations on massive supercomputers. But at Ford, big data is having a significant impact on the parts and features of those cars before they&#8217;re ever part of a design file. From the cars in stock at the dealership to the performance of the engine in a rainstorm, big data is infiltrating nearly every aspect of the Ford experience and the company itself.</p>
<p>Obviously, data is nothing new to the automotive industry &#8212; companies have been trying to optimize supply chains and analyze sales numbers for decades &#8212; but the advent of big data, as well as related technologies such as sensors and smartphones, is changing how companies are thinking about data. Ford isn&#8217;t alone in its quest to take advantage of these new technologies, either. For example, General Motors <a href="http://www.accenture.com/SiteCollectionDocuments/PDF/Accenture-Outlook-How-Big-Data-can-fuel-bigger-growth-Strategy.pdf">collects data from its OnStar system</a> to help lower drivers&#8217; insurance premiums, and also collects lots of data on its Chevrolet Volt electric car that it <a href="http://gigaom.com/2013/01/20/chevy-volt-to-my-smartphone-you-complete-me/">feeds to drivers via a mobile app</a>. We recently noted how a luxury automobile company <a href="http://gigaom.com/2013/03/27/why-apple-ebay-and-walmart-have-some-of-the-biggest-data-warehouses-youve-ever-seen/">used big data software from Aster Data Systems</a> to determine the relationships between malfunctions so it could provide a more thorough and beneficial service-department experience.</p>
<p>But in an industry notoriously unwilling to talk about information technology, Ford&#8217;s experiences might shed a lot of light on what other companies are thinking and doing, as well.</p>
<h2 id="building-a-better-experience-t">Building a better experience through data</h2>
<p>According to John Ginder, manager for systems analytics with Ford Research &amp; Innovation, the company has been doing advanced business modeling for about 20 years, but big data is something else. Today&#8217;s technologies are allowing Ford to handle larger, more-diverse datasets than was ever possible before, and its efforts are already beginning to bear fruit in numerous places &#8212; including in the cars themselves.</p>
<p>The most obvious example of data influencing the driving experience might be the types of data car companies are actually giving back to drivers. Ford&#8217;s Energi line of plug-in hybrid cars generates 25 gigabytes of data per hour, which is then processed and given back to drivers <a href="http://media.ford.com/images/10031/MyFord_Mobile.pdf">via a mobile app</a>. It tells them about battery life, the nearest charging stations and other data about the vehicle&#8217;s performance.</p>
<div id="attachment_635022" class="wp-caption aligncenter" style="width: 718px"><a href="http://gigaom2.files.wordpress.com/2013/04/ford-energi.jpg"><img  alt="The MyFord mobile app architecture." src="http://gigaom2.files.wordpress.com/2013/04/ford-energi.jpg?w=708&#038;h=229" width="708" height="229" class="size-large wp-image-635022" /></a><p class="wp-caption-text">The MyFord mobile app architecture.</p></div>
<p>Ginder said all that data is the result of a &#8220;convergence of need and opportunity.&#8221; The opportunity is a way to experiment with collecting and presenting vehicle data on a group of early adopters that&#8217;s probably more interested in this type of advanced technology. The need has to do with what Ginder calls &#8220;range anxiety&#8221; &#8212; when drivers are getting used to electric vehicles, they need reassurance they&#8217;re not going to run out of juice.</p>
<p>However, Ginder said, the company is just scratching the surface of what&#8217;s possible, because there aren&#8217;t that many of the electric vehicles on the road yet. The goal is to better understand how drivers are using the vehicles and use that information to continuously improve the vehicles and the overall experience. Ford&#8217;s Super Duty line of pickup trucks also offers a <a href="http://crewchief.telogis.com/how-it-works/">&#8220;crew chief&#8221; package</a> that lets bosses monitor the fuel consumption, engine performance and other data about their fleets of vehicles.</p>
<p>Mike Cavaretta, technical leader for predictive analytics and data mining with Ford Research &amp; Innovation, added that Ford is really interested in collecting more data from more vehicles, but noted there&#8217;s also a privacy concern that could come into play. The potential of someone knowing where and how you&#8217;re driving might not appeal to the mainstream just yet (just look at all that data Tesla collects about its cars <a href="http://gigaom.com/2013/02/14/five-important-lessons-from-the-dustup-over-the-nyts-tesla-test-drive/">and can present if it really wants to</a>), but as with the Energi, data does present some opportunities to improve the customer experience.</p>
<p>The test cars in Ford&#8217;s research labs are collecting about 250 gigabytes of data per hour from high-resolution cameras and an array of sensors, Cavaretta noted, and the company is trying to find out what data is most useful and how it might be rolled into production vehicles.</p>
<h2 id="building-betters-cars-through-">Building better cars through data</h2>
<p>Of course, sometimes the best data isn&#8217;t the stuff you see, but the stuff that just makes your car better. Cavaretta said Ford analyzes a lot of social media and other external data in order to figure out, for example, what customers are saying about their vehicles compared with other makes and what problems they&#8217;re having.</p>
<div id="attachment_635027" class="wp-caption alignleft" style="width: 310px"><a href="http://gigaom2.files.wordpress.com/2013/04/esp13_feat_technology_liftgate.jpg"><img  alt="Opens with the touch of a foot. Source: Ford" src="http://gigaom2.files.wordpress.com/2013/04/esp13_feat_technology_liftgate-e1367011341438.jpg?w=300&#038;h=132" width="300" height="132" class="size-medium wp-image-635027" /></a><p class="wp-caption-text">Opens with the touch of a foot. Source: Ford</p></div>
<p>In one recent case, the product development team was curious as to whether the Ford Escape sport-utility vehicle should have a standard liftgate (i.e., it opens manually and the rear window can flip open) or a power liftgate in which the glass and the gate are one piece. In the latter option, the gate opens automatically by tapping under the rear bumper with your foot, but the window doesn&#8217;t open at all. Regular surveys hadn&#8217;t addressed the question, so Cavaretta and his team took to social media, where people were actually talking about it quite a bit and seemed to heavily favor the power liftgate in most cases. It&#8217;s now a feature.</p>
<p>Back in 2004, Ford <a href="http://www.theinquirer.net/inquirer/news/1015284/aston-martin-gets-neural-network">built a self-learning neural network system</a> for its Aston Martin luxury brand that maintains proper engine function by recognizing engine misfires and particular driving conditions and adjusting warnings and performance accordingly.</p>
<p>Ginder said his team has been improving on that technology ever since and actually expanded its use into a system, called Smart Inventory Management System, that lets dealers ensure they have the optimal stock of vehicles and features on their lots. Historically, he said, some dealers were very sophisticated about inventory management, while others were more reactive (&#8220;They just sold a red Mustang,&#8221; he joked, &#8220;so they think they need to go order another red Mustang.&#8221;) With SIMS, all sorts of data about vehicle sales and other locally relevant data from across the country is aggregated in Ford&#8217;s big data platform, and the neural network algorithms learn the current patterns so Ford can make better recommendations &#8212; whether or not dealers choose to heed the advice.</p>
<h2 id="selling-big-data-internally">Selling big data internally</h2>
<p>Cavaretta characterizes the division in which he and Ginder work as &#8220;an Ernst &amp; Young, but just for Ford,&#8221; an internal consultancy (as opposed to Ford&#8217;s more-traditional research and development division) in charge of solving business problems via analytics. About 80 percent of those problems come directly from those lines of business, while about 20 percent are the research division&#8217;s own ideas. However, although he&#8217;s excited about how big data can help his team answer these questions in novel ways, it&#8217;s not always an easy sell with other parts of the company.</p>
<p>Mashing up data sources such as social and sales in order to find insights is a pretty easy sell, Cavaretta explained, but getting people to put sensors in everything and collect data every second or with every transaction can still be a bit challenging. In part, this is just a lingering effect of the constraints that legacy technologies imposed on the company. It wasn&#8217;t possible to store all this data, so people just got accustomed to the status quo of summarizing data hourly, for example.</p>
<div id="attachment_635020" class="wp-caption alignright" style="width: 310px"><a href="http://gigaom2.files.wordpress.com/2013/04/map_skv_9439.jpg"><img  alt="Source: Ford" src="http://gigaom2.files.wordpress.com/2013/04/map_skv_9439.jpg?w=300&#038;h=215" width="300" height="215" class="size-medium wp-image-635020" /></a><p class="wp-caption-text">Source: Ford</p></div>
<p>Now, however, he&#8217;s pushing them to &#8220;dial it down&#8221; and collect data at the lowest level possible and as often as possible. In manufacturing alone, he explained, there are between 20,000 and 25,000 parts in any given vehicle, and there&#8217;s a supply chain that spans from parts suppliers all the way up to dealerships. Getting a complete view of this process could help drive serious efficiencies and, Cavaretta said, &#8220;We don&#8217;t see anything but big data technologies that can get us there.&#8221;</p>
<p>Other areas where Ford is collecting, or wants to collect, more real-time data include websites, call centers and the company&#8217;s credit-processing arm, he added.</p>
<h2 id="building-big-data-internally">Building big data internally</h2>
<p>In order to accomplish their lofty goals, the Research &amp; Innovation analytics team relies heavily on open source technologies, most prominently Hadoop. However, Cavaretta said, they&#8217;ve been experimenting with a variety of natural-language processing tools, too, and even did a proof-of-concept with SAP&#8217;s HANA in-memory analytic database. The NLP tools were first turned on text analysis of internal surveys and dealer network documents, but now are used pretty heavily on social media and other web data.</p>
<p>Their team has some systems numbering in the dozens of nodes in its own building, but on weekends it&#8217;s able to borrow high-performance computing cycles from Ford&#8217;s Numerically Intensive Computing Center next door in order to model recommendation engines and other tasks that demand serious computing power.</p>
<p>But as a part of a specialized research division, the work that Ginder, Cavaretta and their team do on everything from Hadoop to visualization with tools like Tableau isn&#8217;t automatically ready for primetime. In fact, Cavaretta said, it looks at &#8220;what&#8217;s the art of the possible&#8221; and tries to show the value of it. It&#8217;s like a vanguard, he added, going out and seeing what&#8217;s ahead and then reporting back.</p>
<p>At that point, projects are often handed off to Ford&#8217;s central IT team that actually puts the technologies into production. A system that took the research team weeks to deploy and start deriving insights from might take IT months to make production-ready. However, Ginder added, his team can&#8217;t just throw stuff over the wall and abandon it &#8212; it has to collaborate with the IT team and individual departments throughout the project&#8217;s lifecycle.</p>
<p>An important part of this cross-company relationship &#8212; and <a href="http://gigaom.com/2013/04/16/how-to-hire-data-scientists-and-get-hired-as-one/">something many CIOs have likely heard before</a> &#8212; is having data scientists on board who can see the world through the eyes of both technologists and businesspeople, two groups that often have different concerns and goals in mind. &#8220;We look for people who can bridge those worlds,&#8221; Ginder said. &#8220;It&#8217;s hard to find these people, but they&#8217;re hugely important to organizations.&#8221;</p>
<p><em>Feature image courtesy of <a href="http://www.shutterstock.com/gallery-53023p1.html">Shutterstock user PhotoSmart</a>.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/04/26/how-data-is-changing-the-car-game-for-ford/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/04/shutterstock_612924.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/04/shutterstock_612924.jpg?w=150" medium="image">
			<media:title type="html">car and disk drive</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/04/ford-energi.jpg?w=708" medium="image">
			<media:title type="html">The MyFord mobile app architecture.</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/04/esp13_feat_technology_liftgate-e1367011341438.jpg?w=300" medium="image">
			<media:title type="html">Opens with the touch of a foot. Source: Ford</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/04/map_skv_9439.jpg?w=300" medium="image">
			<media:title type="html">Source: Ford</media:title>
		</media:content>
	</item>
		<item>
		<title>Hadoop startup Qubole raises $7M for Hive as a Service</title>
		<link>http://gigaom.com/2013/04/23/hadoop-startup-qubole-raises-7m-for-hive-as-a-service/</link>
		<comments>http://gigaom.com/2013/04/23/hadoop-startup-qubole-raises-7m-for-hive-as-a-service/#comments</comments>
		<pubDate>Tue, 23 Apr 2013 12:00:32 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[Amazon S3]]></category>
		<category><![CDATA[Amazon Web Services]]></category>
		<category><![CDATA[analytics]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[Cloud Computing]]></category>
		<category><![CDATA[data warehouse]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[hive]]></category>
		<category><![CDATA[Qubole]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=633392</guid>
		<description><![CDATA[Hadoop experts Qubole have just closed a Series A funding round for their service, which lets users run Hive data warehouse jobs in Amazon's cloud.]]></description>
				<content:encoded><![CDATA[<p><a href="http://www.qubole.com">Qubole</a>, the startup from former Facebook engineers Ashish Thusoo and Joydeep Sen Sarma, just closed a Series A investment round for its service, which lets users run a variety of Hadoop jobs &#8212; including <a href="http://hive.apache.org">Hive</a>, MapReduce and Pig &#8212; in the Amazon Web Services cloud. Hive is the data warehouse system and SQL-like language for Hadoop that Thusoo and Sen Sarma <a href="http://infolab.stanford.edu/~ragho/hive-icde2010.pdf">helped create while at the social-networking company</a>. Charles River Ventures and Lightspeed Ventures led the round, which brings the company&#8217;s total venture capital investment to $7 million, including its seed round in late 2011.</p>
<p>Qubole <a href="http://gigaom.com/2012/06/06/exclusive-the-brains-behind-hive-launch-on-demand-hadoop-service/">launched in June 2012</a> and opened its platform for public consumption in December, Thusoo told me, and has processed about half a petabyte of customer data since then. Thus far, the platform&#8217;s biggest users have been in the advertising technology, e-commerce and application-development spaces. A common use case (and one <a href="http://www.qubole.com/blog/mediamath-qubole-customer-use-case-study-marketing">detailed in a blog post by Qubole customer MediaMath</a>) is to create pipelines that use Hadoop to process unstructured data before pushing it into relational databases such as MySQL, Vertica or Infobright for more-traditional business-intelligence applications.</p>
<div id="attachment_626654" class="wp-caption alignright" style="width: 310px"><a href="http://gigaom2.files.wordpress.com/2013/04/gigaom_structure_data_2224.jpg"><img  alt="Structure Data 2013 Ashish Thusoo Qubole" src="http://gigaom2.files.wordpress.com/2013/04/gigaom_structure_data_2224.jpg?w=300&#038;h=200" width="300" height="200" class="size-medium wp-image-626654" /></a><p class="wp-caption-text">Ashish Thusoo at Structure: Data 2013, (c) Albert Chau, itsmebert.com</p></div>
<p>However, Thusoo added, Qubole also has connectors for getting data out of certain other data stores, such as MongoDB, and is working on letting customers import data via API from services such as Omniture and Google Analytics.</p>
<p>Being in the cloud &#8212; especially Amazon&#8217;s cloud &#8212; could actually pay big dividends, too, and not just because it lets Qubole scale clusters automatically and lets users avoid the operational headaches of maintaining a Hadoop cluster. Companies are already using Amazon S3 to store a lot of data &#8212; <a href="http://gigaom.com/2013/04/18/amazon-s3-goes-exponential-now-stores-2-trillion-objects/">more than 2 trillion objects</a> at this point &#8212; and that&#8217;s Qubole&#8217;s choice for a storage system, as well. As companies move more of their big data workloads to the cloud, S3 serves as a cheap, easy and generic storage platform to which they can connect various services and applications.</p>
<p>In January, for example, Netflix <a href="http://gigaom.com/2013/01/10/netflix-shows-off-its-hadoop-architecture/">detailed its cloud-based Hadoop platform</a> that consists of numerous services but relies on Amazon S3 as the source-of-truth data store.</p>
<div id="attachment_601005" class="wp-caption aligncenter" style="width: 410px"><a href="http://gigaom2.files.wordpress.com/2013/01/hadoop-nflx.jpg"><img  alt="Netflix's Hadoop architecture." src="http://gigaom2.files.wordpress.com/2013/01/hadoop-nflx.jpg?w=708"   class="size-full wp-image-601005" /></a><p class="wp-caption-text">Netflix&#8217;s Hadoop architecture.</p></div>
<p>If there&#8217;s one big question about Qubole, though, it has to be <a href="http://gigaom.com/2013/02/21/sql-is-whats-next-for-hadoop-heres-whos-doing-it/">the emergence of a rather large SQL-on-Hadoop market</a> since the company launched. Although Hive has been an important part of the Hadoop stack over the past few years, its MapReduce foundation is beginning to show its age in terms of query speed, and the new breed of database startups pushing SQL analytics atop Hadoop <a href="http://drawntoscale.com/is-there-a-database-in-big-data-heaven-understanding-the-world-of-sql-on-hadoop/">are quick to point this out</a>.</p>
<p>Thusoo has certainly noticed this activity, but he still sees Qubole as being in a good position. For starters, he said, the company is looking at interactive analytics projects such as <a href="http://gigaom.com/2012/10/24/cloudera-makes-sql-a-first-class-citizen-in-hadoop/">Impala</a> and <a href="http://gigaom.com/2013/04/17/welcome-to-berkeley-where-hadoop-isnt-nearly-fast-enough/">Shark</a> to see how they might integrate with the Qubole platform, and Hadoop startup Hortonworks is <a href="http://hortonworks.com/blog/100x-faster-hive/">leading the Stinger project</a> to drastically boost the speed of Hive itself.</p>
<p>Further, there&#8217;s the fact that Qubole itself has already <a href="http://www.qubole.com/blog/index.php/optimizing-hadoop-for-s3-part-1">optimized its platform</a> to run, on average, about five times faster than Hive would normally run on Amazon Elastic MapReduce alone.</p>
<p>&#8220;We&#8217;re also keeping a close tab on other projects in our space,&#8221; Thusoo said. &#8220;We have a lot of options &#8230; to play with.&#8221;</p>
<p><em>This story was updated at 8:32 a.m. to clarify that Qubole can handle MapReduce and Pig jobs as well as Hive, and that its seed round came in late 2011, not late 2012.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/04/23/hadoop-startup-qubole-raises-7m-for-hive-as-a-service/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/04/gigaom_structure_data_2224.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/04/gigaom_structure_data_2224.jpg?w=150" medium="image">
			<media:title type="html">Structure Data 2013 Ashish Thusoo Qubole</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/04/gigaom_structure_data_2224.jpg?w=300" medium="image">
			<media:title type="html">Structure Data 2013 Ashish Thusoo Qubole</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/01/hadoop-nflx.jpg" medium="image">
			<media:title type="html">Netflix&#039;s Hadoop architecture.</media:title>
		</media:content>
	</item>
	</channel>
</rss>