<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>GigaOM &#187; WibiData</title>
	<atom:link href="http://gigaom.com/tag/wibidata/feed/" rel="self" type="application/rss+xml" />
	<link>http://gigaom.com</link>
	<description></description>
	<lastBuildDate>Sat, 25 May 2013 16:20:40 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='gigaom.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://0.gravatar.com/blavatar/0db8f6557d022075dbbf010c54d46d93?s=96&#038;d=http%3A%2F%2Fs2.wp.com%2Fi%2Fbuttonw-com.png</url>
		<title>GigaOM &#187; WibiData</title>
		<link>http://gigaom.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://gigaom.com/osd.xml" title="GigaOM" />
	<atom:link rel='hub' href='http://gigaom.com/?pushpress=hub'/>
		<item>
		<title>WibiData gets $15M to help it become the Hadoop application company</title>
		<link>http://gigaom.com/2013/05/23/wibidata-gets-15m-to-help-it-become-the-hadoop-application-company/</link>
		<comments>http://gigaom.com/2013/05/23/wibidata-gets-15m-to-help-it-become-the-hadoop-application-company/#comments</comments>
		<pubDate>Thu, 23 May 2013 11:31:17 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[big data]]></category>
		<category><![CDATA[Cloudera]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Hbase]]></category>
		<category><![CDATA[machine-learning]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[OPower]]></category>
		<category><![CDATA[predictive analytics]]></category>
		<category><![CDATA[WibiData]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=648663</guid>
		<description><![CDATA[Startup WibiData has raised another $15 million and wants to turn the lessons it has learned in the field into generic software that can let anyone build predictive applications on Hadoop.]]></description>
				<content:encoded><![CDATA[<p><a href="http://www.wibidata.com/">WibiData</a> &#8212; the big data startup from Cloudera Co-founder Christophe Bisciglia and Aaron Kimball &#8212; doesn&#8217;t have <em>overly</em> big plans. It only wants to become one of the first, if not the first, company selling off-the-shelf software that lets other companies build valuable, customer-facing applications on Hadoop. On Thursday, WibiData announced $15 million in Series B funding from Canaan Partners, as well as existing investors NEA and Google Chairman Eric Schmidt, to help make the goal a reality. </p>
<p>Kidding aside, that&#8217;s actually quite an ambitious goal in a Hadoop market that&#8217;s big and growing, but that&#8217;s characterized by expensive consulting arrangements and purpose-built applications. Even more so for companies that want to do something other than transforming unstructured data into structured data (often called ETL) or running back-office analytics jobs. In fact, WibiData has spent the last 18 months doing just this type of deal, and Bisciglia says every single customer has already engaged with one of the big three Hadoop vendors (Cloudera, Hortonworks and MapR). </p>
<p>Home energy-management startup <a href="http://gigaom.com/2012/11/19/opower-the-big-data-energy-player-to-beat/">Opower</a> is a good example of this process. It&#8217;s actually one of Cloudera&#8217;s banner customers, but &#8220;when they wanted to take [their software-as-a-service tool] beyond batch analysis and ETL workloads,&#8221; Bisciglia said, Opower came to WibiData. So whereas the Opower service was originally focused on nightly data analysis comparing users&#8217; energy usage against that of other users, it&#8217;s now working on dynamic recommendations for users and letting them engage with the application in new ways.</p>
<div id="attachment_648685" class="wp-caption alignright" style="width: 310px"><a href="http://gigaom2.files.wordpress.com/2013/05/wibi-kiji.jpg"><img  alt="The WibiData architecture" src="http://gigaom2.files.wordpress.com/2013/05/wibi-kiji.jpg?w=300&#038;h=224" width="300" height="224" class="size-medium wp-image-648685" /></a><p class="wp-caption-text">The WibiData architecture</p></div>
<p>During these engagements, WibiData <a href="http://gigaom.com/2012/03/22/wibidata-structure-data-2012/">has been building up its core technology</a> for connecting those brawny back-office Hadoop environments to predictive customer-facing applications &#8211; a collection of HBase, data-formatting tools and machine learning algorithms that the company <a href="http://gigaom.com/2012/11/14/wibidata-open-sources-kiji-to-make-hbase-more-useful/">has been slowly open-sourcing under the Kiji banner</a>. It has also been learning the similarities among the applications it&#8217;s building for customers in the same field, figuring out what&#8217;s repeatable. What does any given company in the retail space, for example, need to get started on <a href="http://gigaom.com/2013/05/08/why-3-celebrity-data-scientists-are-willing-to-work-for-free-for-you/">its own recommendation engine</a>? </p>
<p>And now, Bisciglia says, WibiData is going to double down on building application software based on what it has learned. The first two industries it targets will likely be financial services and retail, two areas where the company has seen a lot of traction. He envisions the finished product including some pre-defined schema for formatting data and some pre-built predictive models, both broadly applicable across that industry rather than specific to a single user. </p>
<p>There will also be different interfaces that allow different types of users (e.g., data scientists, systems engineers and business users) to interact with the data in the ways they need to. </p>
<p>Time will tell if WibiData can actually accomplish its goal of turning Hadoop into a collection of somewhat specialized software packages, but someone has to. Even industry heavyweights like Cloudera see the need, but their hands are full just getting Hadoop integrated into existing environments and getting those early uses up and running. As Cloudera CEO Mike Olson <a href="http://gigaom.com/2012/03/21/cloudera-structure-data-2012/">said at Structure: Data in 2012</a> to anyone ambitious enough to tackle the Hadoop-application gap, &#8220;Call me, I’ll connect you with funding. The money is out there.&#8221; </p>
<p>If you want to hear more about the need for Hadoop applications, check out this panel from Structure: Data 2013, where I speak with WibiData&#8217;s Omer Trajman, Continuuity&#8217;s Jonathan Gray and Pivotal&#8217;s Muddu Sudhakar. <span class='embed-youtube' style='text-align:center; display: block;'><iframe class='youtube-player' type='text/html' width='604' height='370' src='http://www.youtube.com/embed/z7BhGEQX9BQ?version=3&#038;rel=1&#038;fs=1&#038;showsearch=0&#038;showinfo=1&#038;iv_load_policy=1&#038;wmode=transparent' frameborder='0'></iframe></span></p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/05/23/wibidata-gets-15m-to-help-it-become-the-hadoop-application-company/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/05/wibi-founders.png?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/05/wibi-founders.png?w=150" medium="image">
			<media:title type="html">wibi founders</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/05/wibi-kiji.jpg?w=300" medium="image">
			<media:title type="html">The WibiData architecture</media:title>
		</media:content>
	</item>
		<item>
		<title>The history of Hadoop: From 4 nodes to the future of data</title>
		<link>http://gigaom.com/2013/03/04/the-history-of-hadoop-from-4-nodes-to-the-future-of-data/</link>
		<comments>http://gigaom.com/2013/03/04/the-history-of-hadoop-from-4-nodes-to-the-future-of-data/#comments</comments>
		<pubDate>Mon, 04 Mar 2013 13:00:43 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[big data]]></category>
		<category><![CDATA[Cloudera]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[mapreduce]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[VertiCloud]]></category>
		<category><![CDATA[WibiData]]></category>
		<category><![CDATA[Yahoo]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=613362</guid>
		<description><![CDATA[In the first of our four-part multi-media series on Hadoop, the people who helped build Hadoop talk about its birth, its promise and the challenges in moving it from webscale to just large-scale.]]></description>
				<content:encoded><![CDATA[<p>Depending on how one defines its birth, <a href="http://hadoop.apache.org/">Hadoop</a> is now 10 years old. In that decade, Hadoop has gone from being the hopeful answer to Yahoo’s search-engine woes to a general-purpose computing platform that’s poised to be the foundation for the next generation of data-based applications.</p>
<p>Alone, Hadoop is a software market that IDC <a href="http://gigaom.com/2012/05/07/all-aboard-the-hadoop-money-train/">predicts will be worth $813 million</a> in 2016 (although that number is likely very low), but it’s also driving a big data market the research firm <a href="http://gigaom.com/2013/01/08/idc-says-big-data-will-be-24b-market-in-2016-i-say-its-bigger/">predicts will hit more than $23 billion</a> by 2016. Since Cloudera launched in 2008, Hadoop has spawned dozens of startups and <a href="http://gigaom.com/2012/11/09/a-few-stats-rumors-and-stories-on-on-hadoops-rapid-growth/">spurred hundreds of millions of dollars in venture capital investment</a>.</p>
<p>In this four-part series, we’ll explain everything anyone concerned with information technology needs to know about Hadoop. Part I is the history of Hadoop from the people who willed it into existence and took it mainstream. Part II is more graphic: <a href="http://gigaom.com/2013/03/05/the-hadoop-ecosystem-the-welcome-elephant-in-the-room-infographic/">a map of the now-large and complex ecosystem</a> of companies selling Hadoop products. <a href="http://gigaom.com/2013/03/07/5-reasons-why-the-future-of-hadoop-is-real-time-relatively-speaking/">Part III is a look into the future of Hadoop</a> that should serve as an opening salvo for much of the discussion <a href="http://event.gigaom.com/structuredata/?utm_source=data&amp;utm_medium=editorial&amp;utm_campaign=intext&amp;utm_term=613362+the-history-of-hadoop-from-4-nodes-to-the-future-of-data&amp;utm_content=dharrisstructure">at our Structure: Data conference</a> March 20-21 in New York. Finally, <a href="http://gigaom.com/2013/03/08/hadoop-through-the-years-a-gigaom-retrospective/">part IV will highlight some of the best Hadoop applications and seminal moments in Hadoop history</a>, as reported by GigaOM over the years.</p>
<iframe width="100%" height="166" scrolling="no" frameborder="no" src="http://w.soundcloud.com/player?url=http%3A%2F%2Fapi.soundcloud.com%2Ftracks%2F80972101%253Fsecret_token%253Ds-RbbVK"></iframe>
<h2 id="wanted-a-better-search-engine">Wanted: A better search engine</h2>
<p>Almost everywhere you go online now, Hadoop is there in some capacity. <a href="http://gigaom.com/2012/06/13/how-facebook-keeps-100-petabytes-of-hadoop-data-online/">Facebook</a>, <a href="http://gigaom.com/2012/01/31/under-the-covers-of-ebays-big-data-operation/">eBay</a>, <a href="http://gigaom.com/2011/11/02/how-etsy-handcrafted-a-big-data-strategy/">Etsy</a>, <a href="http://gigaom.com/2012/12/02/pinterest-flipboard-and-yelp-tell-how-to-save-big-bucks-in-the-cloud/">Yelp</a>, <a href="http://gigaom.com/2012/03/07/how-twitter-is-doing-its-part-to-democratize-big-data/">Twitter</a>, <a href="http://gigaom.com/2012/09/17/5-ideas-to-help-everyone-make-the-most-of-big-data/">Salesforce.com</a> — you name a popular web site or service, and the chances are it’s using Hadoop to analyze the mountains of data it’s generating about user behavior and even its own operations. Even in the physical world, forward-thinking companies in fields ranging from <a href="http://gigaom.com/2012/09/16/how-disney-built-a-big-data-platform-on-a-startup-budget/">entertainment</a> to <a href="http://gigaom.com/2012/10/11/the-rent-is-too-damn-high-but-big-data-means-the-power-bill-isnt/">energy management</a> to <a href="http://gigaom.com/2012/04/17/satellite-imagery-and-hadoop-mean-70m-for-skybox/">satellite imagery</a> are using Hadoop to analyze the unique types of data they’re collecting and generating.</p>
<p>Everyone involved with information technology at least knows what it is. Hadoop even serves as the foundation for new-school <a href="http://incubator.apache.org/giraph/">graph</a> and <a href="http://hbase.apache.org/">NoSQL databases</a>, as well as <a href="http://gigaom.com/2012/07/24/how-one-startup-wants-to-inject-hadoop-into-your-sql/">bigger, badder versions of relational databases</a> that have been around for decades.</p>
<p>But it wasn’t always this way, and today’s uses are a long way off from the original vision of what Hadoop could be.</p>
<div id="attachment_616209" class="wp-caption alignleft" style="width: 210px"><img alt="Doug Cutting" src="http://gigaom2.files.wordpress.com/2013/03/cutting.jpg?w=708"   class="size-full wp-image-616209"><p class="wp-caption-text">Doug Cutting</p></div>
<p>When the seeds of Hadoop were first planted in 2002, the world just wanted a better open-source search engine. So then-Internet Archive search director Doug Cutting and University of Washington graduate student Mike Cafarella set out to build it. They called their project <a href="http://nutch.apache.org/">Nutch</a> and it was designed with that era’s web in mind.</p>
<p>Looking back on it today, early iterations of Nutch were kind of laughable. About a year into their work on it, Cutting and Cafarella thought things were going pretty well because Nutch was already able to crawl and index hundreds of millions of pages. “At the time, when we started, we were sort of thinking that a web search engine was around a billion pages,” Cutting explained to me, “so we were getting up there.”</p>
<p>There are now about 700 million web sites and, <a href="http://articles.cnn.com/2011-09-12/tech/web.index_1_internet-neurons-human-brain?_s=PM%3ATECH">according to Wired’s Kevin Kelly</a>, well over a trillion web pages.</p>
<p>But getting Nutch to work wasn’t easy. It could only run across a handful of machines, and someone had to watch it around the clock to make sure it didn’t fall down.</p>
<div id="attachment_616210" class="wp-caption alignright" style="width: 251px"><img alt="Mike Cafarella" src="http://gigaom2.files.wordpress.com/2013/03/cafarella241.jpg?w=708"   class="size-full wp-image-616210"><p class="wp-caption-text">Mike Cafarella</p></div>
<p>“I remember working on it for several months, being quite proud of what we had been doing, and then the Google File System paper came out and I realized ‘Oh, that’s a much better way of doing it. We should do it that way,’” reminisced Cafarella. “Then, by the time we had a first working version, the MapReduce paper came out and that seemed like a pretty good idea, too.”</p>
<p>Google released the <a href="http://research.google.com/archive/gfs.html">Google File System paper</a> in October 2003 and the <a href="http://research.google.com/archive/mapreduce.html">MapReduce paper</a> in December 2004. The latter would prove especially revelatory to the two engineers building Nutch.</p>
<p>“What they spent a lot of time doing was generalizing this into a framework that automated all these steps that we were doing manually,” Cutting explained.</p>
<iframe width="100%" height="166" scrolling="no" frameborder="no" src="http://w.soundcloud.com/player?url=http%3A%2F%2Fapi.soundcloud.com%2Ftracks%2F80972106%253Fsecret_token%253Ds-gmRg8"></iframe>
<p>Raymie Stata, founder and CEO of Hadoop startup <a href="http://verticloud.com/">VertiCloud</a> (and former Yahoo CTO), calls MapReduce “a fantastic kind of abstraction” over the distributed computing methods and algorithms most search companies were already using:</p>
<blockquote id="quote-everyone-had-somethi"><p>“Everyone had something that pretty much was like MapReduce because we were all solving the same problems. We were trying to handle literally billions of web pages on machines that are probably, if you go back and check, epsilon more powerful than today’s cell phones. … So there was no option but to latch hundreds to thousands of machines together to build the index. So it was out of desperation that MapReduce was invented.”</p></blockquote>
<div id="attachment_616201" class="wp-caption aligncenter" style="width: 718px"><img alt="MapReduce diagram, from the Google paper" src="http://gigaom2.files.wordpress.com/2013/03/index-auto-0008-0001.gif?w=708&#038;h=489" width="708" height="489" class="size-large wp-image-616201"><p class="wp-caption-text">Parallel processing in MapReduce, from the Google paper</p></div>
<p>Over the course of a few months, Cutting and Cafarella built up the underlying file systems and processing framework that would become Hadoop (in Java, notably, whereas Google’s MapReduce used C++) and ported Nutch on top of it. Now, instead of having one guy watch a handful of machines all day long, Cutting explained, they could just set it running on between 20 and 40 machines that he and Cafarella were able to scrape together from their employers.</p>
<iframe width="100%" height="166" scrolling="no" frameborder="no" src="http://w.soundcloud.com/player?url=http%3A%2F%2Fapi.soundcloud.com%2Ftracks%2F80972114%253Fsecret_token%253Ds-yCIvx"></iframe>
<h2 id="bringing-hadoop-to-life-but-no">Bringing Hadoop to life (but not in search)</h2>
<p>Anyone vaguely familiar with the history of Hadoop can guess what happens next: In 2006, Cutting went to work with Yahoo, which was equally impressed by the Google File System and MapReduce papers and wanted to build open source technologies based on them. They spun out the storage and processing parts of Nutch to form Hadoop (named after Cutting’s son’s stuffed elephant) as an open-source Apache Software Foundation project and the Nutch web crawler remained its own separate project.</p>
<p>“This seemed like a perfect fit because I was looking for more people to work on it, and people who had thousands of computers to run it on,” Cutting said.</p>
<p>Cafarella, now <a href="http://web.eecs.umich.edu/~michjc/bio.html">an associate professor at the University of Michigan</a>, opted to forgo a career in corporate IT and focus on his education. He’s happy as a professor — and currently working on a Hadoop-complementary project called <a href="http://cloudera.github.com/RecordBreaker/">RecordBreaker</a> — but, he joked, “My dad calls me the Pete Best of the big data world.”</p>
<p>Ironically, though, the 2006-era Hadoop was nowhere near ready to handle production search workloads at webscale — the very task it was created to do. “The thing you gotta remember,” explained Hortonworks Co-founder and CEO Eric Baldeschwieler (who was previously VP of Hadoop software development at Yahoo), “is at the time we started adopting it, the aspiration was definitely to rebuild Yahoo’s web search infrastructure, but Hadoop only really worked on 5 to 20 nodes at that point, and it wasn’t very performant, either.”</p>
<div id="attachment_616234" class="wp-caption aligncenter" style="width: 718px"><a href="http://www.flickr.com/photos/yodelanecdotal/4746014041/sizes/l/in/photostream/"><img alt="Baldeschwieler at Hadoop Summit 2010. Source: Yodel Anecdotal" src="http://gigaom2.files.wordpress.com/2013/03/4746014041_7a80b97c2e_b.jpg?w=708&#038;h=472" width="708" height="472" class="size-large wp-image-616234"></a><p class="wp-caption-text">Baldeschwieler at Hadoop Summit 2010. Source: Yodel Anecdotal</p></div>
<p>Stata recalls a “slow march” of horizontal scalability, growing Hadoop’s capabilities from the single digits of nodes into the tens of nodes and ultimately into the thousands. “It was just an ongoing slog … every factor of 2 or 1.5 even was serious engineering work,” he said. But Yahoo was determined to scale Hadoop as far as it needed to go, and it continued investing heavy resources into the project.</p>
<p>It actually took years for Yahoo to move its web index onto Hadoop, but in the meantime the company made what would be a fortuitous decision to set up what it called a “research grid” for the company’s data scientists, to use today’s parlance. It started with dozens of nodes and ultimately grew to hundreds as they added more and more data and Hadoop’s technology matured. What began life as a proof of concept fast became a whole lot more.</p>
<p>“This very quickly kind of exploded and became our core mission,” Baldeschwieler said, “because what happened is the data scientists not only got interesting research results — what we had anticipated — but they also prototyped new applications and demonstrated that those applications could substantially improve Yahoo’s search relevance or Yahoo’s advertising revenue.”</p>
<p>Shortly thereafter, Yahoo began rolling out Hadoop to power analytics for various production applications. Eventually, Stata explained, Hadoop had proven so effective that Yahoo merged its search and advertising into one unit so that Yahoo’s bread-and-butter sponsored search business could benefit from the new technology.</p>
<div id="attachment_616207" class="wp-caption aligncenter" style="width: 718px"><a href="http://www.flickr.com/photos/joeywan/2467450286/"><img alt="Cutting (center) flanked by Baldeschwieler and Om Malik at GigaOM's Hadoop Meetup in 2008." src="http://gigaom2.files.wordpress.com/2013/03/2467450286_db547ef9ef_b.jpg?w=708&#038;h=365" width="708" height="365" class="size-large wp-image-616207"></a><p class="wp-caption-text">Cutting (center) flanked by Baldeschwieler and Om Malik at GigaOM’s Hadoop Meetup in 2008.</p></div>
<p>And <a href="http://gigaom.com/2010/06/29/yahoo-secures-and-tames-hadoop-with-new-tools/">that’s exactly what happened</a>, because although data scientists didn’t need things like service-level agreements, business leaders did. So, Stata said, Yahoo implemented some scheduling changes within Hadoop. And although data scientists didn’t need security, Securities and Exchange Commission requirements mandated a certain level of security when Yahoo moved its sponsored search data onto it.</p>
<p>“That drove a certain level of maturity,” Stata said. “… We ran all the money in Yahoo through it, eventually.”</p>
<p>The transformation into Hadoop being “behind every click” (or every batch process, technically) at Yahoo was pretty much complete by 2008, Baldeschwieler said. That meant doing everything from these line-of-business applications to spam filtering to personalized display decisions on the Yahoo front page. By the time Yahoo spun out Hortonworks into a separate, Hadoop-focused software company in 2011, Yahoo’s Hadoop infrastructure consisted of 42,000 nodes and hundreds of petabytes of storage.</p>
<iframe width="100%" height="166" scrolling="no" frameborder="no" src="http://w.soundcloud.com/player?url=http%3A%2F%2Fapi.soundcloud.com%2Ftracks%2F80972099%253Fsecret_token%253Ds-g7Wo5"></iframe>
<h2 id="from-the-classroom">From the classroom …</h2>
<p>However, although Yahoo was responsible for the vast majority of development during its formative years, Hadoop didn’t exist in a bubble inside Yahoo’s headquarters. It was a full-on Apache project that attracted users and contributors from around the world. They included people like Tom White, a Welshman who wrote O’Reilly Media’s book <i>Hadoop: The Definitive Guide</i> despite being, as Cutting describes him, a guy who just liked software and played with Hadoop at night.</p>
<p>Up in Seattle in 2006, a young Google engineer named Christophe Bisciglia was using his 20 percent time to teach a computer science course at the University of Washington. Google wanted to hire new employees with experience working on webscale data, but its MapReduce code was proprietary, so it bought a rack of servers and used Hadoop as a proxy.</p>
<p><a href="http://gigaom.com/2013/03/04/the-history-of-hadoop-from-4-nodes-to-the-future-of-data/2/">Go to page 2 (of 2) on GigaOM.</a></p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/03/04/the-history-of-hadoop-from-4-nodes-to-the-future-of-data/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/03/gigaom-hadoop-icon-final.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/03/gigaom-hadoop-icon-final.jpg?w=150" medium="image">
			<media:title type="html">gigaom hadoop icon final</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/03/cutting.jpg" medium="image">
			<media:title type="html">Doug Cutting</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/03/cafarella241.jpg" medium="image">
			<media:title type="html">Mike Cafarella</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/03/index-auto-0008-0001.gif?w=708" medium="image">
			<media:title type="html">MapReduce diagram, from the Google paper</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/03/4746014041_7a80b97c2e_b.jpg?w=708" medium="image">
			<media:title type="html">Baldeschwieler at Hadoop Summit 2010. Source: Yodel Anecdotal</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/03/2467450286_db547ef9ef_b.jpg?w=708" medium="image">
			<media:title type="html">Cutting (center) flanked by Baldeschwieler and Om Malik at GigaOM&#039;s Hadoop Meetup in 2008.</media:title>
		</media:content>
	</item>
		<item>
		<title>WibiData open sources Kiji to make HBase easier</title>
		<link>http://gigaom.com/2012/11/14/wibidata-open-sources-kiji-to-make-hbase-more-useful/</link>
		<comments>http://gigaom.com/2012/11/14/wibidata-open-sources-kiji-to-make-hbase-more-useful/#comments</comments>
		<pubDate>Wed, 14 Nov 2012 15:42:53 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[application development]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Hbase]]></category>
		<category><![CDATA[java]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[WibiData]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=584564</guid>
		<description><![CDATA[HBase is a great option for developing big data applications, but it's not necessarily easy to use. WibiData is addressing this by open sourcing a portion of its predictive analytics infrastructure that adds structure to data, followed eventually by a whole HBase development framework called Kiji.]]></description>
				<content:encoded><![CDATA[<p><a href="http://www.wibidata.com/">WibiData</a>, the Hadoop-based <a href="http://gigaom.com/cloud/hadoop-startup-wibidata-raises-5m-to-power-web-analytics/">user analytics startup from Cloudera co-founder Christophe Bisciglia</a>, has open sourced part of its software stack that&#8217;s designed to make it easier for developers to build big data apps on the HBase NoSQL database. Called <a href="http://www.kiji.org/">KijiSchema</a>, the technology is a Java API for adding schema to data flowing into HBase so that applications needing to analyze the data can actually know something about it.</p>
<p>As WibiData product manager Devjit Chakravarti told me during a recent call, KijiSchema essentially &#8220;takes the &#8216;No&#8217; out of NoSQL.&#8221; What he means is that although NoSQL databases such as HBase are lauded in part because they can store unstructured data and don&#8217;t require rigid rules for data formatting like relational databases do, having some structure is actually necessary once you want to do meaningful analysis on it. That&#8217;s why some commercial products, such as <a href="http://gigaom.com/cloud/how-one-startup-wants-to-inject-hadoop-into-your-sql/">Drawn to Scale&#8217;s Spire</a> and <a href="http://gigaom.com/data/batten-down-the-analysts-its-a-big-data-bi-storm/">Splice Machine&#8217;s Splice SQL Engine</a>, already have built functional SQL databases on top of HBase.</p>
<div id="attachment_584629" class="wp-caption alignleft" style="width: 310px"><a href="http://gigaom2.files.wordpress.com/2012/11/kimball.jpg"><img  title="kimball" alt="" src="http://gigaom2.files.wordpress.com/2012/11/kimball.jpg?w=708"   class="size-full wp-image-584629" /></a><p class="wp-caption-text">Kimball speaking at Structure: Data in 2012<br />(c) 2012 Pinar Ozger. pinar@pinarozger.com</p></div>
<p>&#8220;If you can&#8217;t store data in an organized way, you can&#8217;t analyze it effectively,&#8221; WibiData Co-Founder and CTO Aaron Kimball explained. KijiSchema isn&#8217;t part of WibiData&#8217;s secret sauce around predictive analytics for user data, he added, but nothing gets done without it.</p>
<p>Here&#8217;s how Kimball describes how KijiSchema manages data <a href="http://www.wibidata.com/2012/11/14/the-kiji-project-an-open-source-framework-for-building-big-data-applications-with-apache-hbase/">in a blog post announcing the project</a>:</p>
<blockquote><p>&#8220;KijiSchema gives developers the ability to easily store both structured and unstructured data within HBase using Avro serialization. It supports a variety of rich schema features, including complex, compound data types, HBase column key and time-series indexing, as well as cell-level evolving schemas that dynamically encode version information.</p>
<p>&#8220;KijiSchema promotes the use of entity-centric data modeling, where all information about a given entity (user, mobile device, ad, product, etc.), including dimensional and transaction data, is encoded within the same row. This approach is particularly valuable for user-based analytics such as targeting, recommendations, and personalization.&#8221;</p></blockquote>
<div id="attachment_584626" class="wp-caption alignright" style="width: 310px"><a href="http://gigaom2.files.wordpress.com/2012/11/wibi-kiji.jpg"><img  title="wibi kiji" alt="" src="http://gigaom2.files.wordpress.com/2012/11/wibi-kiji.jpg?w=300&#038;h=224" height="224" width="300" class="size-medium wp-image-584626" /></a><p class="wp-caption-text">Kiji resides in the lower left section</p></div>
<p>The coolest part for HBase developers or prospective HBase developers, however, might be that KijiSchema isn&#8217;t just code but is already pre-packaged and ready to deploy. WibiData has created what it calls the Kiji BentoBox &#8212; &#8220;a fully-functional HBase mini-cluster with KijiSchema on your machine with minimal configuration in under 15 minutes&#8221; &#8212; that&#8217;s <a href="http://www.kiji.org/getstarted/#Downloads">available for download on GitHub</a>.</p>
<p>KijiSchema is also part of a broader Kiji framework for HBase that WibiData plans to open source over the next year or so. People perceive HBase as being complicated to set up and having a steep learning curve, Kimball said, and his team wants to make it more accessible and lower the barrier for getting started. The ultimate goal is to make the types of HBase applications <a href="http://gigaom.com/cloud/how-facebook-is-powering-real-time-analytics/">that folks at Facebook</a>, <a href="http://gigaom.com/cloud/under-the-covers-of-ebays-big-data-operation/">eBay</a> and other large web shops are building something that any developer can build.</p>
<p>WibiData&#8217;s Omer Trajman, formerly VP of technology solutions at Cloudera, describes the ultimate Kiji framework as being akin to what the <a href="http://www.springsource.org/spring-framework">Spring framework</a> is for Java. Despite its complexity, &#8220;there are also tens of thousands of developers who have been able to figure [HBase] out,&#8221; he said, but learning it might take weeks of intensive training on the low-level guts of the Hadoop Distributed File System and other stuff. Why learn to build an enterprise Java application from scratch, Trajman asked, when you can just use Spring?</p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2012/11/14/wibidata-open-sources-kiji-to-make-hbase-more-useful/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2012/11/wibi-kiji.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2012/11/wibi-kiji.jpg?w=150" medium="image">
			<media:title type="html">wibi kiji</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2012/11/kimball.jpg" medium="image">
			<media:title type="html">kimball</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2012/11/wibi-kiji.jpg?w=300" medium="image">
			<media:title type="html">wibi kiji</media:title>
		</media:content>
	</item>
		<item>
		<title>A few stats, rumors and stories on Hadoop&#8217;s rapid growth</title>
		<link>http://gigaom.com/2012/11/09/a-few-stats-rumors-and-stories-on-on-hadoops-rapid-growth/</link>
		<comments>http://gigaom.com/2012/11/09/a-few-stats-rumors-and-stories-on-on-hadoops-rapid-growth/#comments</comments>
		<pubDate>Fri, 09 Nov 2012 23:32:24 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[big data]]></category>
		<category><![CDATA[Cloudera]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Hortonworks]]></category>
		<category><![CDATA[Mapr]]></category>
		<category><![CDATA[WibiData]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=582462</guid>
		<description><![CDATA[The largest players in the Hadoop market are already raising money at sky-high valuations, employing hundreds of people and, in some cases, looking at nine-figure revenues. If you're trying to get a sense of whether Hadoop is for real, these details might help.]]></description>
				<content:encoded><![CDATA[<p>There&#8217;s little hard data on the size of the largely private Hadoop market yet, but you can get a clue from looking at what&#8217;s going on inside Silicon Valley. The money changing hands and the sizes of the largest players in the space alone are enough to paint a telling picture of a market that&#8217;s growing fast in uncharted territory. I&#8217;ve collected some of the insights I&#8217;ve gleaned over the past few months to try and add some perspective.</p>
<p>Everything, of course, is relative and we might never see a Hadoop vendor reach the size of a database company such as Oracle with more than 100,000 employees and tens of billions in annual revenue. After all, Hadoop is a new technology for most companies, so it&#8217;s not really moving in on an already lucrative market and stealing budgetary dollars from incumbents. Further &#8212; and possibly more importantly &#8212; the core Hadoop technology is free and open source, meaning there are lots of unpaid downloads so money comes from services, support and large enterprises willing to buy software licenses for value-added products.</p>
<h2>Money</h2>
<p>Here&#8217;s a chart showing how much money Hadoop-based companies have raised thus far (although the grand total will likely rise by at least $10 million next week). Keep in mind, Cloudera only launched in 2009 and Hortonworks launched in June 2011. And these aren&#8217;t companies that merely bury Hadoop under an application or can connect their technologies to it &#8212; these are companies either selling Hadoop or applications designed specifically for it.</p>
<p><a href="http://gigaom2.files.wordpress.com/2012/11/hadoop-funding.jpg"><img  title="hadoop funding" alt="" src="http://gigaom2.files.wordpress.com/2012/11/hadoop-funding.jpg?w=708"   class="aligncenter size-full wp-image-583319" /></a><br />
(To view the original, interactive chart, <a href="http://public.tableausoftware.com/views/Hadoopfunding2/Sheet1?:embed=y">click here</a>.)</p>
<p>In terms of revenue, one might look to a May 2012 report from research firm IDC <a href="http://gigaom.com/cloud/all-aboard-the-hadoop-money-train/">estimating the size of the Hadoop ecosystem to be around $77 million</a>, growing to $813 million by 2016. Those are both impressive numbers, but they might actually be short-changing reality. For one, as I noted at the time, the authors attributed almost no revenue to Amazon Web Services&#8217; Elastic MapReduce service, which is almost certainly generating at least a few million in revenue each year.</p>
<p>Speaking to me in June, Cloudera CEO Mike Olson also took issue with the number, claiming it didn&#8217;t even take Cloudera&#8217;s revenue into account &#8212; which seems entirely possible considering the business Cloudera is doing. I&#8217;ve heard from reliable sources that Cloudera is doing very well and is on track to do about $100 million in revenue this year, very possibly more. And as early as April 2011, Cloudera executives were <a href="http://gigaom.com/cloud/why-cloudera-isnt-sweating-the-hadoop-competition/">touting that software license revenue had already surpassed services revenue</a> (although it&#8217;s arguable whether that will, or even has to, remain the case).</p>
<p>More anecdotally, I&#8217;ve heard from several sources that Hortonworks has already declined at least one potentially appealing acquisition offer. That it wouldn&#8217;t sell isn&#8217;t surprising: sources say the company is valued at $225 million after its last round of funding and is looking to raise more money. And although it <a href="http://hortonworks.com/blog/announcing-general-availability-of-hortonworks-data-platform/">just released its first product in June</a>, the company has impressive and potentially lucrative partnerships in place with <a href="http://gigaom.com/cloud/microsofts-hadoop-play-is-shaping-up-and-it-includes-excel/">Microsoft</a>, <a href="http://gigaom.com/cloud/teradata-taps-hortonworks-to-improve-hadoop-story/">Teradata</a>, <a href="http://gigaom.com/cloud/rackspace-versus-amazon-the-big-data-edition/">Rackspace</a>, <a href="http://gigaom.com/cloud/hortonworks-teams-with-vmware-to-keep-hadoop-running/">VMware</a> and other large vendors.</p>
<p>MapR, the proprietary thorn in the sides of both Cloudera and Hortonworks, appears to be doing quite well, too. Vice President of Marketing Jack Norris <a href="http://gigaom.com/cloud/the-state-of-hadoop-strong-and-poised-to-explode/">told me in June that his company had higher license revenue than many would expect</a> and predicted that <a href="http://gigaom.com/cloud/amazon-taps-mapr-for-high-powered-elastic-mapreduce/">deals with Amazon Web Services</a> and Google Compute Engine would help the company become &#8220;the license revenue leader within the next quarter.&#8221;</p>
<p>Former Cloudera VP of Technology Solutions Omer Trajman, who just left to join HBase-centric startup WibiData, shared some insights with me from his days at Cloudera that seem to back up vendor confidence. He said most mature production clusters (excluding monster users such as Facebook) consist of about 200 nodes, and many double in size after the first year. That&#8217;s part of the reason Cloudera grew in size about 10x during the three years he was there.</p>
<p>&#8220;It has definitely been a rocket ship,&#8221; he said. &#8220;&#8230; You just strap in and hope you make it up.&#8221;</p>
<p>Interest is only picking up, too: &#8220;There are more people that have started big data projects in the past six months than have big data projects running [in production],&#8221; Trajman said.</p>
<h2>People</h2>
<p>It&#8217;s probably not accurate to call companies such as Cloudera, Hortonworks and MapR startups anymore, and we might start to see signs of this shift in personnel moves. Here&#8217;s how big they are and expect to become:</p>
<ul>
<li><strong>Cloudera: </strong>More than 300 employees globally and growing, especially in the sales department.</li>
<li><strong>Hortonworks: </strong>145 employees as of late October and hiring a person per day, on average, through the end of 2012.</li>
<li><strong>MapR: </strong>More than 125 employees, mostly in technical and engineering positions; starting to build sales team and looks to more than double headcount in 2013.</li>
</ul>
<p>While Cloudera and Hortonworks, for example, are still young and nimble enough <a href="http://gigaom.com/cloud/is-vmwares-brain-drain-a-sign-of-its-influence-or-of-its-demise/">to lure a fair amount of talent</a> from now-officially large enterprises such as VMware, their employees who joined early and really love the startup life might not stick around.</p>
<p>Trajman&#8217;s new home, WibiData, is a fine example of this. It was launched last year by former Cloudera employees Christophe Bisciglia (who actually co-founded Cloudera) and Aaron Kimball <a href="http://gigaom.com/cloud/hadoop-startup-wibidata-raises-5m-to-power-web-analytics/">to help companies build behavioral-analysis applications on top of Hadoop</a>.</p>
<p>(Maybe there&#8217;s a Cloudera mafia shaping up: WibiData&#8217;s officemates &#8212; <a href="http://gigaom.com/cloud/how-memcachier-went-from-a-favor-for-a-friend-to-cloud-ubquity/">MemCachier</a> and <a href="http://thanx.com/">Thanx</a> &#8212; both count former Cloudera employees as key members or founders of their teams, <a href="http://drawntoscale.com/about-us/">as does HBase-centric startup Drawn to Scale</a>.)</p>
<p>Trajman, who was one of the first couple dozen employees at Cloudera (and who previously joined Vertica at around the same stage in its growth), told me he likes the rush of getting in at the ground level of new technologies and helping companies do something really new. While he enjoyed establishing and implementing some of the core foundational use cases for Hadoop (e.g., ETL and data exploration) with Cloudera&#8217;s early customers, that&#8217;s still much of what Cloudera provides to customers because it&#8217;s so difficult to build higher-level and higher-value applications at the infrastructural level where Cloudera operates.</p>
<p>&#8220;For me, it was very personal in terms of the impact I wanted to have,&#8221; Trajman said. At WibiData, he can help users who have the infrastructure part resolved and now want to develop applications that make data analysis a core part of their businesses. Where there&#8217;s a focus on innovation, he said, that&#8217;s where the innovators go.</p>
<p>This isn&#8217;t a bad thing, it&#8217;s just a side effect of growth &#8212; and when employees stay and innovate in the Hadoop space, it just creates a bigger pie for everyone to share.</p>
<p>Feature image courtesy of <a href="http://www.shutterstock.com/gallery-478987p1.html">Shutterstock user GuskovaNatalia</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2012/11/09/a-few-stats-rumors-and-stories-on-on-hadoops-rapid-growth/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2012/11/shutterstock_95592730.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2012/11/shutterstock_95592730.jpg?w=150" medium="image">
			<media:title type="html">Tall buildings</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2012/11/hadoop-funding.jpg" medium="image">
			<media:title type="html">hadoop funding</media:title>
		</media:content>
	</item>
		<item>
		<title>Takeaways from the second quarter in cloud and data</title>
		<link>http://pro.gigaom.com/2012/07/cloud-and-data-second-quarter-2012-analysis-and-outlook-2/</link>
		<comments>http://pro.gigaom.com/2012/07/cloud-and-data-second-quarter-2012-analysis-and-outlook-2/#comments</comments>
		<pubDate>Tue, 17 Jul 2012 15:55:38 +0000</pubDate>
		<dc:creator><a href="http://pro.gigaom.com/members/jomaitland/" rel="author">Jo Maitland</a></dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Adara Networks]]></category>
		<category><![CDATA[Amazon]]></category>
		<category><![CDATA[Amazon EC2]]></category>
		<category><![CDATA[Amazon Web Services]]></category>
		<category><![CDATA[apache]]></category>
		<category><![CDATA[apache-hadoop]]></category>
		<category><![CDATA[AT&T]]></category>
		<category><![CDATA[Battery Ventures]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[BigQuery]]></category>
		<category><![CDATA[Birst]]></category>
		<category><![CDATA[Business Intelligence]]></category>
		<category><![CDATA[Cetas]]></category>
		<category><![CDATA[Cetas Software]]></category>
		<category><![CDATA[Cirro]]></category>
		<category><![CDATA[Cisco]]></category>
		<category><![CDATA[Cloud Computing]]></category>
		<category><![CDATA[Cloudera]]></category>
		<category><![CDATA[converged infrastructure]]></category>
		<category><![CDATA[data-analytics]]></category>
		<category><![CDATA[Dell]]></category>
		<category><![CDATA[Demand Media]]></category>
		<category><![CDATA[DynamicOps]]></category>
		<category><![CDATA[EC2]]></category>
		<category><![CDATA[eCircle]]></category>
		<category><![CDATA[EMC]]></category>
		<category><![CDATA[Excel]]></category>
		<category><![CDATA[Financial Times]]></category>
		<category><![CDATA[Flash storage]]></category>
		<category><![CDATA[GoGrid]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[Google BigQuery]]></category>
		<category><![CDATA[google buzz]]></category>
		<category><![CDATA[google compute engine]]></category>
		<category><![CDATA[google notebook]]></category>
		<category><![CDATA[google wave]]></category>
		<category><![CDATA[Google Web Accelerator]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Hadoop Summit]]></category>
		<category><![CDATA[Haoop]]></category>
		<category><![CDATA[Hewlett-Packard]]></category>
		<category><![CDATA[Hortonworks]]></category>
		<category><![CDATA[HP]]></category>
		<category><![CDATA[hypervisor]]></category>
		<category><![CDATA[I/O optimization]]></category>
		<category><![CDATA[iaas]]></category>
		<category><![CDATA[IBM]]></category>
		<category><![CDATA[IDC]]></category>
		<category><![CDATA[infrastructure as a service]]></category>
		<category><![CDATA[Intel]]></category>
		<category><![CDATA[Khosla Ventures]]></category>
		<category><![CDATA[LineRate Systems]]></category>
		<category><![CDATA[M&A]]></category>
		<category><![CDATA[machine-learning]]></category>
		<category><![CDATA[Mapr]]></category>
		<category><![CDATA[Metamarkets]]></category>
		<category><![CDATA[Microsoft]]></category>
		<category><![CDATA[microsoft-windows]]></category>
		<category><![CDATA[MongoDB]]></category>
		<category><![CDATA[Open Networking Research Center]]></category>
		<category><![CDATA[Open Networking Summit]]></category>
		<category><![CDATA[OpenFlow]]></category>
		<category><![CDATA[Opera Solutions]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[PaaS]]></category>
		<category><![CDATA[Platform as a Service]]></category>
		<category><![CDATA[PureSystems]]></category>
		<category><![CDATA[quest-software]]></category>
		<category><![CDATA[Rackspace]]></category>
		<category><![CDATA[redgiant-analytics]]></category>
		<category><![CDATA[RedGiantAnalytics]]></category>
		<category><![CDATA[SDN]]></category>
		<category><![CDATA[Serengeti]]></category>
		<category><![CDATA[SingleHop]]></category>
		<category><![CDATA[SoftLayer]]></category>
		<category><![CDATA[software as a service]]></category>
		<category><![CDATA[software defined networking]]></category>
		<category><![CDATA[software defined networks]]></category>
		<category><![CDATA[solid state disk]]></category>
		<category><![CDATA[Tealeaf Technology]]></category>
		<category><![CDATA[Teradata]]></category>
		<category><![CDATA[Terascala]]></category>
		<category><![CDATA[Terradata]]></category>
		<category><![CDATA[tier-3]]></category>
		<category><![CDATA[Truviso]]></category>
		<category><![CDATA[Twitter]]></category>
		<category><![CDATA[Varicent Software]]></category>
		<category><![CDATA[VCE Company]]></category>
		<category><![CDATA[Verizon]]></category>
		<category><![CDATA[virtualization]]></category>
		<category><![CDATA[virtustream]]></category>
		<category><![CDATA[vivisimo]]></category>
		<category><![CDATA[VMWare]]></category>
		<category><![CDATA[Web Infrastructure]]></category>
		<category><![CDATA[WibiData]]></category>
		<category><![CDATA[Windows]]></category>
		<category><![CDATA[Windows Azure]]></category>
		<category><![CDATA[XtremeIO]]></category>
		<category><![CDATA[XtremIO]]></category>

		<guid isPermaLink="false">http://pro.gigaom.com/?p=116565</guid>
		<description><![CDATA[In cloud and big data, the second quarter of 2012 featured several high-profile deals and product launches that could reshape the marketplace for everyone. Google and Microsoft launched Infrastructure-as-a-Service offerings, software-defined networking took off, and all eyes stayed fixed on the continuing promise of data analytics.]]></description>
				<content:encoded><![CDATA[<p>In cloud and big data, the second quarter of 2012 featured several high-profile deals and product launches that could reshape the marketplace for everyone. Google and Microsoft launched Infrastructure-as-a-Service offerings, software-defined networking took off, and all eyes stayed fixed on the continuing promise of data analytics. This quarterly wrap-up discusses these milestones, and provides a near-term outlook for trends, technologies and companies to watch in the next 18 to 24 months.</p>
]]></content:encoded>
			<wfw:commentRss>http://pro.gigaom.com/2012/07/cloud-and-data-second-quarter-2012-analysis-and-outlook-2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:thumbnail url="http://pro.gigaom.com/files/2009/04/gigaompromasterimagecloud.jpg?w=150" />
		<media:content url="http://pro.gigaom.com/files/2009/04/gigaompromasterimagecloud.jpg?w=150" medium="image">
			<media:title type="html">gigaompromasterimagecloud</media:title>
		</media:content>

		<media:content url="http://1.gravatar.com/avatar/4f3860069d181dbeeb398304f5940a9e?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">gigaedit</media:title>
		</media:content>
	</item>
		<item>
		<title>Why service providers matter for the future of big data</title>
		<link>http://pro.gigaom.com/2012/03/why-service-providers-matter-for-the-future-of-big-data/</link>
		<comments>http://pro.gigaom.com/2012/03/why-service-providers-matter-for-the-future-of-big-data/#comments</comments>
		<pubDate>Thu, 22 Mar 2012 06:55:34 +0000</pubDate>
		<dc:creator><a href="http://pro.gigaom.com/members/derrickharris/" rel="author">Derrick Harris</a></dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[33across]]></category>
		<category><![CDATA[algorithm-specialists]]></category>
		<category><![CDATA[algorithms]]></category>
		<category><![CDATA[Amazon]]></category>
		<category><![CDATA[analytics-as-a-service]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[big-data-outsourcing]]></category>
		<category><![CDATA[big-data-service-providers]]></category>
		<category><![CDATA[BloomReach]]></category>
		<category><![CDATA[Cetas]]></category>
		<category><![CDATA[Cisco]]></category>
		<category><![CDATA[Cloud Computing]]></category>
		<category><![CDATA[Cloudera]]></category>
		<category><![CDATA[data analysis]]></category>
		<category><![CDATA[data scientists]]></category>
		<category><![CDATA[data-analytics]]></category>
		<category><![CDATA[Dell]]></category>
		<category><![CDATA[EMC]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[Greenplum]]></category>
		<category><![CDATA[Hadapt]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Hortonworks]]></category>
		<category><![CDATA[IBM]]></category>
		<category><![CDATA[impetus]]></category>
		<category><![CDATA[infochimps]]></category>
		<category><![CDATA[logicworks]]></category>
		<category><![CDATA[Mapr]]></category>
		<category><![CDATA[metascale]]></category>
		<category><![CDATA[Microsoft]]></category>
		<category><![CDATA[mu-sigma]]></category>
		<category><![CDATA[Netflix]]></category>
		<category><![CDATA[nuevora]]></category>
		<category><![CDATA[Opera Solutions]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[Parse.ly]]></category>
		<category><![CDATA[Platfora]]></category>
		<category><![CDATA[profitero]]></category>
		<category><![CDATA[Rackspace]]></category>
		<category><![CDATA[redgiant-analytics]]></category>
		<category><![CDATA[scale-unlimited]]></category>
		<category><![CDATA[sears]]></category>
		<category><![CDATA[SGI]]></category>
		<category><![CDATA[Skytree]]></category>
		<category><![CDATA[software as a service]]></category>
		<category><![CDATA[Sourcefire]]></category>
		<category><![CDATA[Teradata]]></category>
		<category><![CDATA[think-big-analytics]]></category>
		<category><![CDATA[web analytics]]></category>
		<category><![CDATA[WibiData]]></category>
		<category><![CDATA[Yahoo]]></category>

		<guid isPermaLink="false">http://pro.gigaom.com/?p=102032</guid>
		<description><![CDATA[One solution to the big data skills shortage has been consulting firms that specialize in deploying the big data systems companies need to make sense of their information. These companies will continue to play a vital role in helping us make sense of the data deluge.]]></description>
				<content:encoded><![CDATA[<p>One major solution to the big data skills shortage has been the emergence of consulting and outsourcing firms specializing in deploying big data systems that companies need in order to actually derive value from their information. These companies will continue to play a vital role in helping the greater corporate world make sense of the mountains of data they are collecting. However, if the current wave of democratizing big data lives up to its ultimate potential, today’s consultants and outsourcers will have to find a way to keep a few steps ahead of the game in order to remain relevant.</p>
]]></content:encoded>
			<wfw:commentRss>http://pro.gigaom.com/2012/03/why-service-providers-matter-for-the-future-of-big-data/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/4f3860069d181dbeeb398304f5940a9e?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">gigaedit</media:title>
		</media:content>
	</item>
		<item>
		<title>A near-term outlook for big data</title>
		<link>http://pro.gigaom.com/2012/03/a-near-term-outlook-for-big-data/</link>
		<comments>http://pro.gigaom.com/2012/03/a-near-term-outlook-for-big-data/#comments</comments>
		<pubDate>Wed, 21 Mar 2012 06:55:20 +0000</pubDate>
		<dc:creator>Krish</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[33across]]></category>
		<category><![CDATA[Amazon]]></category>
		<category><![CDATA[AOL]]></category>
		<category><![CDATA[Apache Foundation]]></category>
		<category><![CDATA[apache-hadoop]]></category>
		<category><![CDATA[apixio]]></category>
		<category><![CDATA[AppFog]]></category>
		<category><![CDATA[AstraZeneca]]></category>
		<category><![CDATA[AT&T]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[big-data-outsourcing]]></category>
		<category><![CDATA[BloomReach]]></category>
		<category><![CDATA[Blue Button]]></category>
		<category><![CDATA[Bristol-Myers Squibb]]></category>
		<category><![CDATA[BYD]]></category>
		<category><![CDATA[Cassandra]]></category>
		<category><![CDATA[CBRE Group]]></category>
		<category><![CDATA[cdata-quality]]></category>
		<category><![CDATA[Cetas]]></category>
		<category><![CDATA[Cisco]]></category>
		<category><![CDATA[Cloud Computing]]></category>
		<category><![CDATA[Cloudant]]></category>
		<category><![CDATA[Cloudera]]></category>
		<category><![CDATA[Comcast]]></category>
		<category><![CDATA[connected devices]]></category>
		<category><![CDATA[Consert]]></category>
		<category><![CDATA[data privacy]]></category>
		<category><![CDATA[data processing]]></category>
		<category><![CDATA[data science]]></category>
		<category><![CDATA[data scientists]]></category>
		<category><![CDATA[data storage]]></category>
		<category><![CDATA[data-analytics]]></category>
		<category><![CDATA[data-as-a-service]]></category>
		<category><![CDATA[data-governance]]></category>
		<category><![CDATA[data-markets]]></category>
		<category><![CDATA[data-obesity]]></category>
		<category><![CDATA[data-quality]]></category>
		<category><![CDATA[data-quality-dimensions]]></category>
		<category><![CDATA[data-security]]></category>
		<category><![CDATA[DataFlux]]></category>
		<category><![CDATA[DataStax]]></category>
		<category><![CDATA[Dell]]></category>
		<category><![CDATA[DuPont]]></category>
		<category><![CDATA[E-ZPass]]></category>
		<category><![CDATA[EcoFactor]]></category>
		<category><![CDATA[Ecologic Analytics]]></category>
		<category><![CDATA[Electronic Medical Records]]></category>
		<category><![CDATA[EMC]]></category>
		<category><![CDATA[eMeter]]></category>
		<category><![CDATA[emrs]]></category>
		<category><![CDATA[ENBALA Power Networks]]></category>
		<category><![CDATA[energy-internet]]></category>
		<category><![CDATA[Enterprise Mobility]]></category>
		<category><![CDATA[enterprise-control-language]]></category>
		<category><![CDATA[enterprises]]></category>
		<category><![CDATA[Explorys]]></category>
		<category><![CDATA[Facebook]]></category>
		<category><![CDATA[Forbes]]></category>
		<category><![CDATA[Geisinger Health Systems]]></category>
		<category><![CDATA[ginger-io]]></category>
		<category><![CDATA[Global Pulse]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[Greenplum]]></category>
		<category><![CDATA[grid storage]]></category>
		<category><![CDATA[GridMobility]]></category>
		<category><![CDATA[GroundedPower]]></category>
		<category><![CDATA[Group Health Cooperative]]></category>
		<category><![CDATA[Hadapt]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[hadoop-stack]]></category>
		<category><![CDATA[Hbase]]></category>
		<category><![CDATA[HDFS]]></category>
		<category><![CDATA[health care]]></category>
		<category><![CDATA[Hewlett-Packard]]></category>
		<category><![CDATA[hive]]></category>
		<category><![CDATA[Honeywell]]></category>
		<category><![CDATA[Hortonworks]]></category>
		<category><![CDATA[hpcc]]></category>
		<category><![CDATA[Humedica]]></category>
		<category><![CDATA[IBM]]></category>
		<category><![CDATA[ibm-watson]]></category>
		<category><![CDATA[IDC]]></category>
		<category><![CDATA[impetus]]></category>
		<category><![CDATA[infochimps]]></category>
		<category><![CDATA[informatica]]></category>
		<category><![CDATA[infrastructure as a service]]></category>
		<category><![CDATA[intelligent-applications]]></category>
		<category><![CDATA[Intermountain Healthcare]]></category>
		<category><![CDATA[jeopardy]]></category>
		<category><![CDATA[kaiser-permanente]]></category>
		<category><![CDATA[Landis+Gyr]]></category>
		<category><![CDATA[lexisnexis]]></category>
		<category><![CDATA[LinkedIn]]></category>
		<category><![CDATA[logicworks]]></category>
		<category><![CDATA[M2M]]></category>
		<category><![CDATA[machine-to-machine]]></category>
		<category><![CDATA[MapR Technologies]]></category>
		<category><![CDATA[mapreduce]]></category>
		<category><![CDATA[mayo clinic]]></category>
		<category><![CDATA[McKinsey]]></category>
		<category><![CDATA[metascale]]></category>
		<category><![CDATA[meter-data-management-systems]]></category>
		<category><![CDATA[Microsoft]]></category>
		<category><![CDATA[mobile carriers]]></category>
		<category><![CDATA[mobile health]]></category>
		<category><![CDATA[mu-sigma]]></category>
		<category><![CDATA[National Cancer Institute]]></category>
		<category><![CDATA[Netflix]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[Nuance Communications]]></category>
		<category><![CDATA[nuevora]]></category>
		<category><![CDATA[oozie]]></category>
		<category><![CDATA[Opera Solutions]]></category>
		<category><![CDATA[OPower]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[Parse.ly]]></category>
		<category><![CDATA[patientslikeme]]></category>
		<category><![CDATA[Pervasive]]></category>
		<category><![CDATA[Pfizer]]></category>
		<category><![CDATA[pig]]></category>
		<category><![CDATA[Platfora]]></category>
		<category><![CDATA[Platform as a Service]]></category>
		<category><![CDATA[private clouds]]></category>
		<category><![CDATA[profitero]]></category>
		<category><![CDATA[Public Clouds]]></category>
		<category><![CDATA[Rackspace]]></category>
		<category><![CDATA[Recurve]]></category>
		<category><![CDATA[Red Hat]]></category>
		<category><![CDATA[redgiant-analytics]]></category>
		<category><![CDATA[Regulated Industries]]></category>
		<category><![CDATA[Salesforce.com]]></category>
		<category><![CDATA[SAP]]></category>
		<category><![CDATA[scale-unlimited]]></category>
		<category><![CDATA[scienergy]]></category>
		<category><![CDATA[Sears Holding Corporation]]></category>
		<category><![CDATA[service providers]]></category>
		<category><![CDATA[SGI]]></category>
		<category><![CDATA[Siemens]]></category>
		<category><![CDATA[Silver Spring Networks]]></category>
		<category><![CDATA[Skytree]]></category>
		<category><![CDATA[Smart Grid]]></category>
		<category><![CDATA[smart meters]]></category>
		<category><![CDATA[software as a service]]></category>
		<category><![CDATA[Sourcefire]]></category>
		<category><![CDATA[Sprint]]></category>
		<category><![CDATA[SQL]]></category>
		<category><![CDATA[structured data]]></category>
		<category><![CDATA[Systemcon]]></category>
		<category><![CDATA[T-Mobile]]></category>
		<category><![CDATA[talend]]></category>
		<category><![CDATA[Target]]></category>
		<category><![CDATA[targeted-advertising]]></category>
		<category><![CDATA[Tendril]]></category>
		<category><![CDATA[Teradata]]></category>
		<category><![CDATA[The Internet of Things]]></category>
		<category><![CDATA[think-big-analytics]]></category>
		<category><![CDATA[Toshiba]]></category>
		<category><![CDATA[Trillium]]></category>
		<category><![CDATA[Twitter]]></category>
		<category><![CDATA[unstructured data]]></category>
		<category><![CDATA[utilities]]></category>
		<category><![CDATA[Verizon Wireless]]></category>
		<category><![CDATA[VoltDB]]></category>
		<category><![CDATA[Wal-Mart]]></category>
		<category><![CDATA[WellPoint]]></category>
		<category><![CDATA[whirlpool]]></category>
		<category><![CDATA[WibiData]]></category>
		<category><![CDATA[Yahoo]]></category>
		<category><![CDATA[yelp]]></category>
		<category><![CDATA[zettaset]]></category>
		<category><![CDATA[zookeeper]]></category>

		<guid isPermaLink="false">http://pro.gigaom.com/?p=101786</guid>
		<description><![CDATA[Big data now touches everything from enterprises to smart-meter startups, while Hadoop is fast becoming the leading tool to analyze that data, and debates around privacy abound. GigaOM Pro analysts offer insights on what to consider when it comes to big data decisions for your business.]]></description>
				<content:encoded><![CDATA[<p>Big data now touches everything from enterprises and hospitals to smart-meter startups and connected devices in the home. Hadoop, meanwhile, is fast becoming the leading tool to analyze that data, and there is the ever-lingering question of privacy and how we, the technology industry, are responsible for teaching ethical ways to collect and regulate our data. This report, composed of eight different sections each written by a GigaOM Pro analyst, offers insights on what to consider when it comes to big data decisions for your business.</p>
]]></content:encoded>
			<wfw:commentRss>http://pro.gigaom.com/2012/03/a-near-term-outlook-for-big-data/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:thumbnail url="https://gigaom-pro-files.s3.amazonaws.com/files/2012/03/datacenter.jpg?w=150" />
		<media:content url="https://gigaom-pro-files.s3.amazonaws.com/files/2012/03/datacenter.jpg?w=150" medium="image">
			<media:title type="html">datacenter</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/04f327f032df043846baa7474b8e6aff?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">Krish</media:title>
		</media:content>
	</item>
		<item>
		<title>Hadoop startup WibiData raises $5M to power web analytics</title>
		<link>http://gigaom.com/2012/02/07/hadoop-startup-wibidata-raises-5m-to-power-web-analytics/</link>
		<comments>http://gigaom.com/2012/02/07/hadoop-startup-wibidata-raises-5m-to-power-web-analytics/#comments</comments>
		<pubDate>Tue, 07 Feb 2012 18:59:39 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[big data]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Hbase]]></category>
		<category><![CDATA[mapreduce]]></category>
		<category><![CDATA[Online transaction processing]]></category>
		<category><![CDATA[web analytics]]></category>
		<category><![CDATA[WibiData]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=481558</guid>
		<description><![CDATA[WibiData, a Hadoop-based startup focused on making it easier to analyze user behavior, has raised $5 million from New Enterprise Associates. The company, formerly known as Odiago, launched in late 2011 already claiming Wikipedia and Atlassian among its early customers.]]></description>
				<content:encoded><![CDATA[<p>WibiData, a Hadoop-based startup focused on making it easier to analyze user behavior, has raised $5 million from New Enterprise Associates. The company, formerly known as Odiago, <a href="http://gigaom.com/cloud/below-the-surface-of-cloudera-founders-new-project/">launched in late 2011</a> already claiming Wikipedia and Atlassian among its early customers.</p>
<p><a href="http://gigaom2.files.wordpress.com/2012/02/env-ecosystem-web3.jpg"><img title="env-ecosystem-web" src="http://gigaom2.files.wordpress.com/2012/02/env-ecosystem-web3.jpg?w=300&#038;h=162" alt="" width="300" height="162" class="alignleft size-medium wp-image-481704"></a>Details about how, exactly, WibiData goes about letting users do web analytics have been sparse, but co-founder Aaron Kimball, who will present at our <a href="http://event.gigaom.com/structuredata/?utm_source=cloud&amp;utm_medium=editorial&amp;utm_campaign=intext&amp;utm_term=481558+hadoop-startup-wibidata-raises-5m-to-power-web-analytics&amp;utm_content=dharrisstructure">Structure: Data conference</a> next month, <a href="http://www.wibidata.com/2012/02/07/how-wibidata-works/">explained some of it in a blog post</a> on Monday. The post is fairly technical, but the gist is that WibiData leverages <a href="http://hadoop.apache.org/">Apache Hadoop</a>, <a href="http://hbase.apache.org/">HBase</a> and <a href="http://avro.apache.org/">Avro</a>, as well as ample proprietary code, to enable both real-time and batch processing of user data. This lets users model customer profiles based on historical data, but also adjust those models in reaction to real-time activity on the site.</p>
<p>Here’s how Kimball describes the problems WibiData addresses:</p>
<blockquote><p>Data about users has challenges associated with it that you don’t necessarily see with other large-scale data.</p>
<ul><li>To analyze users, you need to digest large volumes of log-oriented transactional data as well as more concise profile data</li>
<li>You need to serve recommendations and other derived data interactively</li>
<li>There’s a mix of batch (offline) and on-the-fly calculations required to deliver recommendations at web speed</li>
</ul><p>WibiData is designed to store this transactional data side-by-side with profile and other derived data attributes. Keeping data logically and physically close enables high-performance analysis of the entire data picture surrounding a user.</p></blockquote>
<div id="attachment_481579" class="wp-caption alignright" style="width: 310px"><a href="http://gigaom2.files.wordpress.com/2012/02/wibi-arch-before1.jpg"><img title="wibi-arch-before1" src="http://gigaom2.files.wordpress.com/2012/02/wibi-arch-before1.jpg?w=300&#038;h=240" alt="" width="300" height="240" class="size-medium wp-image-481579"></a><p class="wp-caption-text">FoneDoktor, without WibiData</p></div>
<p>In some cases, as FoneDoktor’s Alex Loddengaard <a href="http://www.wibidata.com/2011/12/05/fonedoktor-a-wibidata-application/">explained in a December blog post</a>, WibiData can obviate the need to maintain a Hadoop cluster <em>and</em> a separate online transaction processing (OLTP) system because WibiData provides both capabilities. It does this by using HBase as the real-time data store for transactions, and by incorporating a programming framework that’s abstracted from MapReduce so users can perform either batch or real-time analyses.</p>
<p>Avro’s role is to let users add fields to data records, or adjust a schema, without affecting existing processes that have to access that data. As Kimball explains, “Does your web site track a new cookie? This can be added as a new field. But even though you start collecting that new data, your existing analysis pipelines can treat records like they always did; programs that don’t yet know about the new cookie are still compatible with both the old records already collected, and the new records with the additional field.”</p>
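<p>The compatibility Kimball describes can be sketched in a few lines of plain Python. This is only an illustration of the idea, not Avro’s actual API: the field names, the default value and the <code>resolve()</code> helper are all hypothetical, and real Avro resolves writer and reader schemas with stricter rules than this.</p>

```python
# Illustrative only: a plain-Python stand-in for Avro-style schema resolution.
# Field names and the resolve() helper are hypothetical, not part of Avro.

# Each schema maps a field name to the default used when a record lacks it.
OLD_SCHEMA = {"user_id": None, "page": None}
NEW_SCHEMA = {"user_id": None, "page": None, "campaign_cookie": ""}

def resolve(record, reader_schema):
    """Project a stored record onto the fields a given reader knows about."""
    resolved = {}
    for field, default in reader_schema.items():
        # Take the stored value if present, otherwise the reader's default.
        resolved[field] = record.get(field, default)
    return resolved

old_record = {"user_id": 7, "page": "/home"}
new_record = {"user_id": 8, "page": "/pricing", "campaign_cookie": "spring"}

# A legacy pipeline ignores the new field; an updated one fills in a default.
legacy_view = [resolve(r, OLD_SCHEMA) for r in (old_record, new_record)]
current_view = [resolve(r, NEW_SCHEMA) for r in (old_record, new_record)]
```

<p>In this sketch the legacy pipeline sees the same two fields for both records, while the updated pipeline reads old records with an empty cookie value instead of failing.</p>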
<p>Its data-management methods and machine-learning libraries for capabilities such as content recommendation make WibiData ideal for web-user data, but Kimball points out it’s also a good fit for “mobile, online gaming, healthcare, finance, and several other industries.” However, WibiData is just one of many startups <a href="http://gigaom.com/cloud/5-low-profile-startups-that-could-change-the-face-of-big-data/">looking to parlay their founders’ Hadoop expertise into a higher-level analytics product</a> that does things Hadoop alone can’t, without requiring deep Hadoop or analytics expertise on the customer end. Good thing there’s plenty of data to go around.</p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2012/02/07/hadoop-startup-wibidata-raises-5m-to-power-web-analytics/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2012/02/env-ecosystem-web3.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2012/02/env-ecosystem-web3.jpg?w=150" medium="image">
			<media:title type="html">env-ecosystem-web</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2012/02/env-ecosystem-web3.jpg?w=300" medium="image">
			<media:title type="html">env-ecosystem-web</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2012/02/wibi-arch-before1.jpg?w=300" medium="image">
			<media:title type="html">wibi-arch-before1</media:title>
		</media:content>
	</item>
		<item>
		<title>What it really means when someone says &#8216;Hadoop&#8217;</title>
		<link>http://gigaom.com/2012/02/06/what-it-really-means-when-someone-says-hadoop/</link>
		<comments>http://gigaom.com/2012/02/06/what-it-really-means-when-someone-says-hadoop/#comments</comments>
		<pubDate>Mon, 06 Feb 2012 20:12:12 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[big data]]></category>
		<category><![CDATA[Cloudera]]></category>
		<category><![CDATA[Datameer]]></category>
		<category><![CDATA[EMC]]></category>
		<category><![CDATA[Hadapt]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Hbase]]></category>
		<category><![CDATA[HDFS]]></category>
		<category><![CDATA[Hortonworks]]></category>
		<category><![CDATA[IBM]]></category>
		<category><![CDATA[karmasphere]]></category>
		<category><![CDATA[mapreduce]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[Platfora]]></category>
		<category><![CDATA[WibiData]]></category>
		<category><![CDATA[zettaset]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=481182</guid>
		<description><![CDATA[Hadoop features front and center in the discussion of how to implement a big data strategy, one of the biggest trends in IT. There’s just one problem that keeps cropping up: many people don’t seem to know exactly what it means when somebody says “Hadoop.”]]></description>
				<content:encoded><![CDATA[<p><a href="http://gigaom2.files.wordpress.com/2011/10/hadoop1.jpg"><img title="hadoop" src="http://gigaom2.files.wordpress.com/2011/10/hadoop1.jpg?w=708" alt=""   class="alignleft size-full wp-image-426524"></a>Big data is among the hottest trends in IT right now, and Hadoop stands front and center in the discussion of how to implement a big data strategy. There’s just one problem that keeps cropping up: many people don’t seem to know exactly what it means when somebody says “Hadoop.”</p>
<p>The problem surfaced again Monday in the form of complaints over Forrester’s new report titled <a href="http://www.forrester.com/rb/Research/wave%26trade%3B_enterprise_hadoop_solutions%2C_q1_2012/q/id/60755/t/2?src=RSS_2&amp;cm_mmc=Forrester-_-RSS-_-Document-_-6">“Enterprise Hadoop Solutions, Q1 2012.”</a> <em>InformationWeek</em> <a href="http://informationweek.com/news/software/info_management/232600283">spoke with a few vendors</a> that didn’t like how their products were assessed, and database industry analyst Curt Monash <a href="http://www.dbms2.com/2012/02/06/comments-on-the-2012-forrester-wave-enterprise-hadoop-solutions">says the report “compares apples, peaches, almonds, and peanuts.”</a> I thought the same thing when I saw a copy of the report last week. The vendors it covers all focus on Hadoop, but Hortonworks is not Datameer is not HStreaming.</p>
<p>Allow me to explain. Hopefully, this provides a foundation for parsing what people talk about when they talk about Hadoop, and for differentiating one type of product from another. (And you can learn even more about Hadoop and how it’s used at our <a href="http://event.gigaom.com/structuredata/?utm_source=cloud&amp;utm_medium=editorial&amp;utm_campaign=intext&amp;utm_term=481182+what-it-really-means-when-someone-says-hadoop&amp;utm_content=dharrisstructure">Structure: Data</a> conference taking place next month in New York City.)</p>
<h2>What Hadoop is</h2>
<p>I went into this in more detail in a <a href="http://pro.gigaom.com/2011/03/defining-hadoop-the-players-technologies-and-challenges-of-2011/?utm_source=cloud&amp;utm_medium=editorial&amp;utm_campaign=intext&amp;utm_term=481182+what-it-really-means-when-someone-says-hadoop&amp;utm_content=dharrisstructure">GigaOM Pro report published last March</a> (<strong>sub req’d</strong>), but the long and short is that Hadoop is, at its core, an <a href="http://hadoop.apache.org/">Apache Software Foundation project</a> consisting of two primary subprojects — <a href="http://hadoop.apache.org/mapreduce/">Hadoop MapReduce</a> and the <a href="http://hadoop.apache.org/hdfs/">Hadoop Distributed File System</a>. MapReduce is the parallel-processing engine that allows Hadoop to churn through large data sets in relatively short order. HDFS is the distributed file system that lets Hadoop scale across commodity servers and, importantly, store data on the compute nodes in order to boost performance (and potentially save money). These are the two must-have components for any Hadoop distribution.</p>
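<p>For readers who have never seen the MapReduce model, here is a minimal single-machine sketch of its three steps in Python. It is an illustration under simplifying assumptions: the function names and sample lines are hypothetical, nothing here uses Hadoop’s own APIs, and a real job would run the map and reduce steps in parallel across the nodes that hold the data in HDFS.</p>

```python
from collections import defaultdict

def map_phase(line):
    # Emit a (word, 1) pair for every word in one line of input.
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    # Group values by key, as the framework does between map and reduce.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    # Collapse all values for one key into a single total.
    return key, sum(values)

# Hypothetical input; in Hadoop each line would come from a block in HDFS.
lines = ["big data big clusters", "data on commodity servers"]

mapped = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
```

<p>Because each call to <code>map_phase</code> and <code>reduce_phase</code> depends only on its own input, the work can be split across many servers, which is what lets the model scale.</p>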
<p>There are also a number of Apache projects related to Hadoop, often built atop either Hadoop MapReduce or HDFS. These include, but are not limited to, <a href="http://hive.apache.org/">Hive</a>, which offers a SQL-like query language, and <a href="http://pig.apache.org/">Pig</a>, a higher-level data-flow language, both of which bring data-warehouse-like capabilities to a Hadoop cluster, and <a href="http://hbase.apache.org/">HBase</a>, a NoSQL database that leverages HDFS as its distributed storage engine.</p>
<p><a href="http://gigaom2.files.wordpress.com/2012/02/hadoop-projects.jpg"><img title="hadoop projects" src="http://gigaom2.files.wordpress.com/2012/02/hadoop-projects.jpg?w=604&#038;h=198" alt="" width="604" height="198" class="aligncenter size-large wp-image-481309"></a></p>
<h2>Hadoop distributions</h2>
<p>These are packaged software products that aim to ease deployment and management of Hadoop clusters compared with simply downloading the various Apache code bases and trying to cobble together a system. Presently, <a href="http://gigaom.com/cloud/why-cloudera-isnt-sweating-the-hadoop-competition/">Cloudera</a>, <a href="http://gigaom.com/cloud/yahoo-spinoff-shakes-up-hadoop-market-with-new-distro/">Hortonworks</a>, <a href="http://gigaom.com/cloud/battle-on-mapr-cloudera-pimp-their-version-of-hadoop/">MapR</a> and <a href="http://gigaom.com/cloud/emc-throws-lots-of-hardware-at-hadoop/">EMC</a> all offer their own Hadoop distributions. Although each is different, sometimes markedly so, as with MapR’s proprietary file system, they all package a set of Hadoop projects (MapReduce, Hive, Sqoop, Pig, etc.) in a way that, in theory, lets them integrate more naturally and run both smoothly and securely.</p>
<p>Many Hadoop distributions integrate with various data warehouses, databases and other data-management products, with the goal of moving data between Hadoop clusters and other environments so each might process or query data stored in the other.</p>
<h2>Hadoop management software</h2>
<p>Just as the wording implies, Hadoop management software is designed to make it easier to manage and troubleshoot a Hadoop cluster. Such products are usually sold or offered by companies peddling Hadoop distributions, because even when commercially packaged, Hadoop is still a complex architecture and somewhat foreign to most IT personnel and products. However, third parties such as <a href="http://gigaom.com/cloud/platform-computing-extends-hpc-reach-into-mapreduce/">Platform Computing</a> (now <a href="http://gigaom.com/cloud/ibm-eyes-big-data-at-big-banks-with-platform-buy/">part of IBM</a>) and <a href="http://gigaom.com/cloud/zettaset-raises-3m-for-the-consumerization-of-big-data/">Zettaset</a> also sell software for managing Hadoop clusters, and their products are typically agnostic as to what distributions they support.</p>
<p>But distributions and management software are all about the infrastructure and the platform. Anyone actually wanting to use Hadoop still needs to know how to write applications that leverage the underlying architecture.</p>
<h2>Hadoop application software (or, products that use Hadoop)</h2>
<p>The Hadoop ecosystem gets really complex when we start looking at products that exist to help developers write Hadoop applications or otherwise analyze data stored within Hadoop in a manner other than writing traditional MapReduce jobs. These range from abstraction layers such as <a href="http://karmasphere.com/index.php">Karmasphere Analyst</a> or <a href="http://gigaom.com/cloud/ibms-hadoop-effort-grows-from-project-to-product/">IBM Infosphere BigInsights</a>, to <a href="http://gigaom.com/cloud/hadapt-raises-9-5m-for-hadoop-data-warehouse/">Hadapt</a>, which offers a single-platform product fusing a SQL data warehouse with a Hadoop cluster, to <a href="http://www.hstreaming.com/">HStreaming</a>, which promises real-time processing and analytics.</p>
<p>The one common thing among all these products, however, is that they are not Hadoop distributions, but sit atop platform software from Hortonworks, EMC or whomever. Some products that get thrown into the Hadoop fray, such as <a href="http://outerthought.org/site/products/lily.html">Outerthought Lily</a> or <a href="http://drawntoscale.com/how_it_works.html">Drawn to Scale Spire</a>, are essentially scale-out databases built atop HBase (which itself is a separate project built atop HDFS). The image below, from Karmasphere, gives a particularly clear map of how a Hadoop environment might look.</p>
<p><a href="http://gigaom2.files.wordpress.com/2011/06/hadoopdatafabric-ks.jpeg"><img title="HadoopDataFabric-KS" src="http://gigaom2.files.wordpress.com/2011/06/hadoopdatafabric-ks.jpeg?w=604&#038;h=379" alt="" width="604" height="379" class="aligncenter size-large wp-image-369496"></a></p>
<p>The applications and analytics space is probably <a href="http://gigaom.com/cloud/5-low-profile-startups-that-could-change-the-face-of-big-data/">where we’ll see the biggest influx of new companies</a>, because writing Hadoop applications is still tough, yet applications are how companies will actually start experiencing direct business benefits. In fact, it’s these types of higher-level products that are the focal point of <a href="http://gigaom.com/cloud/accel-forms-100m-fund-to-feed-big-data-apps/">Accel Partners’ new big data fund</a>.</p>
<br />  <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=gigaom.com&#038;blog=14960843&#038;post=481182&#038;subd=gigaom2&#038;ref=&#038;feed=1" width="1" height="1" /><p><a href="http://pubads.g.doubleclick.net/gampad/jump?iu=/1008864/GigaOM_RSS_300x250&#038;sz=300x250&#038;c=939435"><img src="http://pubads.g.doubleclick.net/gampad/ad?iu=/1008864/GigaOM_RSS_300x250&#038;sz=300x250&#038;c=939435" /></a></p><p><strong>Related research and analysis from GigaOM Pro:</strong><br />Subscriber content. <a href="http://pro.gigaom.com/?utm_source=cloud&utm_medium=editorial&utm_campaign=auto3&utm_term=481182+what-it-really-means-when-someone-says-hadoop&utm_content=dharrisstructure">Sign up for a free trial</a>.</p><ul><li><a href="http://pro.gigaom.com/2011/03/defining-hadoop-the-players-technologies-and-challenges-of-2011/?utm_source=cloud&utm_medium=editorial&utm_campaign=auto3&utm_term=481182+what-it-really-means-when-someone-says-hadoop&utm_content=dharrisstructure">Defining Hadoop: the Players, Technologies and Challenges of 2011</a></li><li><a href="http://pro.gigaom.com/2012/03/a-near-term-outlook-for-big-data/?utm_source=cloud&utm_medium=editorial&utm_campaign=auto3&utm_term=481182+what-it-really-means-when-someone-says-hadoop&utm_content=dharrisstructure">A near-term outlook for big data</a></li><li><a href="http://pro.gigaom.com/report/sql-on-hadoop-roadmap-2013/?utm_source=cloud&utm_medium=editorial&utm_campaign=auto3&utm_term=481182+what-it-really-means-when-someone-says-hadoop&utm_content=dharrisstructure">Sector RoadMap: SQL-on-Hadoop platforms in 2013</a></li></ul>]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2012/02/06/what-it-really-means-when-someone-says-hadoop/feed/</wfw:commentRss>
		<slash:comments>12</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2011/10/hadoop-e1319488918182.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2011/10/hadoop-e1319488918182.jpg?w=150" medium="image">
			<media:title type="html">hadoop</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2011/10/hadoop1.jpg" medium="image">
			<media:title type="html">hadoop</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2012/02/hadoop-projects.jpg?w=604" medium="image">
			<media:title type="html">hadoop projects</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2011/06/hadoopdatafabric-ks.jpeg?w=604" medium="image">
			<media:title type="html">HadoopDataFabric-KS</media:title>
		</media:content>
	</item>
		<item>
		<title>5 low-profile startups that could change the face of big data</title>
		<link>http://gigaom.com/2012/01/28/5-low-profile-startups-that-could-change-the-face-of-big-data/</link>
		<comments>http://gigaom.com/2012/01/28/5-low-profile-startups-that-could-change-the-face-of-big-data/#comments</comments>
		<pubDate>Sat, 28 Jan 2012 23:00:37 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[Aaron Kimball]]></category>
		<category><![CDATA[analytics]]></category>
		<category><![CDATA[Ben Werther]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[BloomReach]]></category>
		<category><![CDATA[Christophe Bisciglia]]></category>
		<category><![CDATA[cloud-infrastructure]]></category>
		<category><![CDATA[Continuuity]]></category>
		<category><![CDATA[Greenplum]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[machine-learning]]></category>
		<category><![CDATA[Odiago]]></category>
		<category><![CDATA[Platfora]]></category>
		<category><![CDATA[Skytree]]></category>
		<category><![CDATA[Todd Papaioannou]]></category>
		<category><![CDATA[web analytics]]></category>
		<category><![CDATA[WibiData]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=477011</guid>
		<description><![CDATA[The great thing about big data is that there's still plenty of room for new blood, especially for companies that want to leave infrastructure in the rearview mirror. At this point, the data-infrastructure space, including Hadoop, is well-funded and nearly saturated, but it also needs help.]]></description>
				<content:encoded><![CDATA[<p><a href="http://gigaom2.files.wordpress.com/2012/01/visual1.jpg"><img title="visual" src="http://gigaom2.files.wordpress.com/2012/01/visual1.jpg?w=300&#038;h=225" alt="" width="300" height="225" class="alignleft size-medium wp-image-477233"></a></p>
<p>Big data is hot, but infrastructure-level platforms such as Hadoop, which focus on storage and processing, still need help to take them into the mainstream. They need a killer app or two that will let companies analyze, visualize and act on all that data without hiring a team of Stanford Ph.D.s, or that will let developers write big-data apps without having to reinvent the wheel.</p>
<p>Here are five startups (in alphabetical order) either in stealth mode or just out of it that could help take Hadoop and its ilk to the promised land.</p>
<p><strong><a href="http://gigaom2.files.wordpress.com/2012/01/logo-1.jpg"><img title="logo (1)" src="http://gigaom2.files.wordpress.com/2012/01/logo-1.jpg?w=708" alt=""   class="alignleft size-full wp-image-477216"></a>1. BloomReach</strong></p>
<p>The stealth-mode <a href="http://www.bloomreach.com/">BloomReach</a> is taking a very targeted, very hands-free approach to big data for its customers. It’s offering a SaaS-based product that <a href="http://startupers.com/search/node/bloomreach">job listings</a> say is for “helping leading online businesses uncover the highest quality, most relevant content sought by their consumers, when and where they want it.” Founded by a team with roots at Google, Cisco, Facebook and Yahoo, among other companies, BloomReach has, <a href="http://searchquant.blogspot.com/2011/11/seo-platform-wars-bloomreach-brightedge.html">according to one estimate</a>, about 160 customers — all of them among the top 10,000 websites, and most of them in the retail space. Among its core technologies and methods are Hadoop, Lucene, Monte Carlo simulations and large-scale image processing.</p>
<p><strong><a href="http://gigaom2.files.wordpress.com/2012/01/continuuity1.jpg"><img title="continuuity" src="http://gigaom2.files.wordpress.com/2012/01/continuuity1.jpg?w=210&#038;h=43" alt="" width="210" height="43" class="alignleft size-thumbnail wp-image-477218"></a>2. Continuuity</strong></p>
<p><a href="http://continuuity.com">Continuuity</a>, the <a href="http://gigaom.com/cloud/ex-yahoo-cloud-chief-gets-2-5m-for-stealthy-data-startup/">just-launched stealth-mode startup</a> by former Yahoo VP and chief cloud architect Todd Papaioannou, wants to make it easier to build applications that can leverage both cloud computing and big data technologies. As Papaioannou told me recently, most developers shouldn’t have to go through what Yahoo, Facebook and others did in order to write large-scale, data-driven applications. He also said “the data fabric is the next middleware” and noted that the company name is a play on “continuum.” You figure out what it’s up to.</p>
<p><strong><a href="http://gigaom2.files.wordpress.com/2012/01/odiago.jpg"><img title="odiago" src="http://gigaom2.files.wordpress.com/2012/01/odiago.jpg?w=210&#038;h=70" alt="" width="210" height="70" class="alignleft size-thumbnail wp-image-477219"></a>3. Odiago</strong></p>
<p><a href="http://odiago.com">Odiago</a> is the brainchild of Hadoop and analytics experts Christophe Bisciglia and Aaron Kimball, and <a href="http://gigaom.com/cloud/below-the-surface-of-cloudera-founders-new-project/">aims to improve the state of web analytics</a>. Its first product, <a href="http://wibidata.com">Wibidata</a>, which is in private beta, lets websites better analyze their user data to build more-targeted features. It’s built atop Hadoop and HBase, but also plugs into companies’ existing data-management and BI tools. Current customers include Wikipedia, RichRelevance, FoneDoktor and Atlassian (with whom it shares office space).</p>
<p><strong><a href="http://gigaom2.files.wordpress.com/2012/01/new-logo.jpg"><img title="new-logo" src="http://gigaom2.files.wordpress.com/2012/01/new-logo.jpg?w=708" alt=""   class="alignleft size-full wp-image-477220"></a>4. Platfora</strong></p>
<p><a href="http://platfora.com">Platfora</a>, which <a href="http://gigaom.com/cloud/platfora-gets-5-7m-to-make-hadoop-mainstream/">launched in September with $5.7 million in funding</a>, wants to make big data analytics accessible to the masses. Founder and CEO Ben Werther, formerly of Greenplum and NoSQL startup DataStax, told me when Platfora launched that its intuitive, visually stunning interface will make Hadoop-based analytics so easy even a history major could use it. Platfora’s product isn’t available yet, but <a href="http://startupers.com/search/node/platfora">the company is currently hiring</a>, with an emphasis on frontend and user-experience skills.</p>
<p><strong><a href="http://gigaom2.files.wordpress.com/2012/01/skytree.jpg"><img title="skytree" src="http://gigaom2.files.wordpress.com/2012/01/skytree.jpg?w=210&#038;h=42" alt="" width="210" height="42" class="alignleft size-thumbnail wp-image-477222"></a>5. SkyTree</strong></p>
<p><a href="http://skytreecorp.com">Skytree</a> is probably the stealthiest of the group, but it’s also is one of the more ambitious — because it’s <a href="http://www.linkedin.com/company/skytree-inc-">trying to bring high-performance machine learning</a> to mainstream companies. Machine learning is an impressive technique in which the system itself gets smarter as it digests more data, but it usually doesn’t find its way out of research environments or cutting-edge analytics teams. Skytree is putting together an impressive team, including co-founder Alexander Gray, who also teaches machine learning at Georgia Tech and spent six years at NASA’s Jet Propulsion Laboratory. The company will officially launch later this quarter.</p>
<p>We’ll be addressing many of the issues these companies are trying to resolve at our <a href="http://event.gigaom.com/structuredata/?utm_source=cloud&amp;utm_medium=editorial&amp;utm_campaign=intext&amp;utm_term=477011+5-low-profile-startups-that-could-change-the-face-of-big-data&amp;utm_content=dharrisstructure">Structure: Data</a> event that takes place March 21-22 in New York City. Founders from Continuuity, Odiago and Skytree will be speaking at the event, as will dozens of other data visionaries from companies such as IBM, Google, @WalmartLabs and Hortonworks.</p>
<p><em>Feature image courtesy of <a href="http://www.flickr.com/photos/jurvetson/916142/">Flickr user jurvetson</a>.</em></p>
<br />  <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=gigaom.com&#038;blog=14960843&#038;post=477011&#038;subd=gigaom2&#038;ref=&#038;feed=1" width="1" height="1" /><p><a href="http://pubads.g.doubleclick.net/gampad/jump?iu=/1008864/GigaOM_RSS_300x250&#038;sz=300x250&#038;c=798525"><img src="http://pubads.g.doubleclick.net/gampad/ad?iu=/1008864/GigaOM_RSS_300x250&#038;sz=300x250&#038;c=798525" /></a></p><p><strong>Related research and analysis from GigaOM Pro:</strong><br />Subscriber content. <a href="http://pro.gigaom.com/?utm_source=cloud&utm_medium=editorial&utm_campaign=auto3&utm_term=477011+5-low-profile-startups-that-could-change-the-face-of-big-data&utm_content=dharrisstructure">Sign up for a free trial</a>.</p><ul><li><a href="http://pro.gigaom.com/2012/03/why-service-providers-matter-for-the-future-of-big-data/?utm_source=cloud&utm_medium=editorial&utm_campaign=auto3&utm_term=477011+5-low-profile-startups-that-could-change-the-face-of-big-data&utm_content=dharrisstructure">Why service providers matter for the future of big data</a></li><li><a href="http://pro.gigaom.com/2012/03/a-near-term-outlook-for-big-data/?utm_source=cloud&utm_medium=editorial&utm_campaign=auto3&utm_term=477011+5-low-profile-startups-that-could-change-the-face-of-big-data&utm_content=dharrisstructure">A near-term outlook for big data</a></li><li><a href="http://pro.gigaom.com/report/the-new-economics-of-enterprise-data-warehousing/?utm_source=cloud&utm_medium=editorial&utm_campaign=auto3&utm_term=477011+5-low-profile-startups-that-could-change-the-face-of-big-data&utm_content=dharrisstructure">How data warehousing is now a cost-effective solution for businesses</a></li></ul>]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2012/01/28/5-low-profile-startups-that-could-change-the-face-of-big-data/feed/</wfw:commentRss>
		<slash:comments>37</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2012/01/visual1.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2012/01/visual1.jpg?w=150" medium="image">
			<media:title type="html">visual</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2012/01/visual1.jpg?w=300" medium="image">
			<media:title type="html">visual</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2012/01/logo-1.jpg" medium="image">
			<media:title type="html">logo (1)</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2012/01/continuuity1.jpg?w=210" medium="image">
			<media:title type="html">continuuity</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2012/01/odiago.jpg?w=210" medium="image">
			<media:title type="html">odiago</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2012/01/new-logo.jpg" medium="image">
			<media:title type="html">new-logo</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2012/01/skytree.jpg?w=210" medium="image">
			<media:title type="html">skytree</media:title>
		</media:content>
	</item>
	</channel>
</rss>
