<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>GigaOM &#187; HDFS</title>
	<atom:link href="http://gigaom.com/tag/hdfs/feed/" rel="self" type="application/rss+xml" />
	<link>http://gigaom.com</link>
	<description></description>
	<lastBuildDate>Wed, 22 May 2013 00:48:36 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='gigaom.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://0.gravatar.com/blavatar/0db8f6557d022075dbbf010c54d46d93?s=96&#038;d=http%3A%2F%2Fs2.wp.com%2Fi%2Fbuttonw-com.png</url>
		<title>GigaOM &#187; HDFS</title>
		<link>http://gigaom.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://gigaom.com/osd.xml" title="GigaOM" />
	<atom:link rel='hub' href='http://gigaom.com/?pushpress=hub'/>
		<item>
		<title>Quantcast releases bigger, faster, stronger Hadoop file system</title>
		<link>http://gigaom.com/2012/09/27/quantcast-releases-bigger-faster-stronger-hadoop-file-system/</link>
		<comments>http://gigaom.com/2012/09/27/quantcast-releases-bigger-faster-stronger-hadoop-file-system/#comments</comments>
		<pubDate>Thu, 27 Sep 2012 13:38:34 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[big data]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[HDFS]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[Quantcast]]></category>
		<category><![CDATA[Web Infrastructure]]></category>
		<category><![CDATA[webscale]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=567220</guid>
		<description><![CDATA[It's not for everyone, but if you're storing petabytes of data in Hadoop, Quantcast thinks it has the cure to your woes. Its newly open sourced Quantcast File System promises smaller clusters and better performance, and it has proven itself over exabytes of data inside Quantcast.]]></description>
				<content:encoded><![CDATA[<p>The Quantcast File System is like the Six-Million Dollar Man of distributed data stores for Hadoop. QFS is an implementation of the <a href="http://en.wikipedia.org/wiki/CloudStore">Kosmix Distributed File System (aka CloudStore)</a>, a project that had largely been written off and forgotten, and <a href="http://quantcast.com">Quantcast</a> has built it to be bigger, faster and stronger than the Hadoop Distributed File System most commonly associated with the popular big data platform. Now, QFS <a href="http://quantcast.github.com/qfs">is open source and ready for use</a> in the webscale world.</p>
<p>According to Quantcast VP of Research and Development Jim Kelly, the web-audience measurement specialist began working with Hadoop in 2006 and experienced problems almost from the start. However, while the early problems with HDFS might have been symptoms of its immaturity, the problems soon began centering around the two things Hadoop is supposed to be best at &#8212; size and speed. So, in 2008, Quantcast began experimenting with, and actually sponsoring, the Kosmix project.</p>
<p>It turns out that wasn&#8217;t a moment too soon. By 2010, after Quantcast began integrating with ad networks, its data flow really began picking up into the tens of terabytes a day range. It turned on QFS as its production Hadoop file system in 2011 and now receives about 40TB a day and processes a whopping 20 petabytes. Kelly said Quantcast has pushed 4 exabytes &#8212; or 4 billion gigabytes &#8212; through QFS since turning it on.</p>
<p><a href="http://gigaom2.files.wordpress.com/2012/09/qfs.jpg"><img  title="qfs" src="http://gigaom2.files.wordpress.com/2012/09/qfs.jpg?w=604&#038;h=296" alt="" width="604" height="296" class="aligncenter size-large wp-image-567289" /></a></p>
<h2>Faster, yes. Bigger, not so much.</h2>
<p>At Quantcast&#8217;s scale, the problem with HDFS wasn&#8217;t so much its scalability, but the sheer size of the cluster required to handle petabyte-scale data stores. HDFS stores three copies of each piece of data to ensure they&#8217;re always available, although it tries to make up for the size issue with data locality (i.e., putting data directly on the computing nodes so it doesn&#8217;t have to traverse the network in order to be processed). Kelly thinks those techniques are relics of a bygone era.</p>
<p>&#8220;When HDFS [was created], disk drives and networks were tied for being the slowest things in the cluster,&#8221; he said.</p>
<p><a href="http://gigaom2.files.wordpress.com/2012/09/qfs-2.jpg"><img  title="QFS 2" src="http://gigaom2.files.wordpress.com/2012/09/qfs-2.jpg?w=300&#038;h=206" alt="" width="300" height="206" class="alignright size-medium wp-image-567292" /></a>Enter <a href="http://en.wikipedia.org/wiki/Reed%E2%80%93Solomon_error_correction">Reed-Solomon error correction</a>, QFS&#8217;s chosen method for assuring reliable access to data that Kelly says actually ends up shrinking the size of Hadoop clusters while improving their performance. (It&#8217;s actually the same method used on CDs and DVDs.) Rather than storing three full versions of each file like HDFS, resulting in the need for three times more storage, QFS only needs 1.5x the raw capacity because it stripes data across nine different disk drives. Quantcast believes smaller cluster size, combined with today&#8217;s 10-gigabit networks and the ability to read and write data in parallel, makes QFS significantly faster than HDFS at large scale.</p>
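<p>The storage arithmetic above can be sketched in a few lines of Python. This is only an illustration, not QFS code: it assumes the nine drives hold six data stripes plus three parity stripes (a split the 1.5x figure implies but the article does not spell out), and all function names are hypothetical.</p>

```python
from fractions import Fraction


def replication_overhead(copies: int) -> Fraction:
    """Raw-to-logical storage ratio when every block is stored `copies` times."""
    return Fraction(copies, 1)


def erasure_overhead(data_stripes: int, parity_stripes: int) -> Fraction:
    """Raw-to-logical ratio when data is split into `data_stripes` pieces
    and `parity_stripes` extra parity pieces are stored alongside them."""
    return Fraction(data_stripes + parity_stripes, data_stripes)


def raw_capacity_pb(logical_pb: int, overhead: Fraction) -> float:
    """Raw disk capacity (in petabytes) needed to hold `logical_pb` of data."""
    return float(logical_pb * overhead)


# Assumed layout: 6 data + 3 parity stripes = 9 drives per stripe group.
hdfs_style = replication_overhead(3)
qfs_style = erasure_overhead(6, 3)

print(f"replication: {hdfs_style}x, striped with parity: {float(qfs_style)}x")
print(f"20 PB of data needs {raw_capacity_pb(20, hdfs_style)} PB raw "
      f"vs {raw_capacity_pb(20, qfs_style)} PB raw")
```

<p>Under these assumptions the same data set needs half the raw disk of triple replication, which is where the smaller-cluster claim comes from; the 20 petabytes here is just an example figure.</p>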
<p>QFS also comes equipped with other features that Quantcast had to implement to make it production-ready. Among them: it is written in C++ and has fixed-footprint memory management; it has access control based on users and groups; and it intelligently detects node failures, as opposed to planned maintenance, and invokes data recovery accordingly.</p>
<h2>It&#8217;s not for everyone, though</h2>
<p>Despite its claimed improvements over HDFS, Kelly is quick to point out that QFS is probably not the best choice for everyone. It&#8217;s really designed for Hadoop users operating at petabyte scale, who have the technical prowess to handle a migration away from HDFS, and for whom data-processing costs are hitting the six-to-seven-figure range monthly once things such as energy bills are accounted for.</p>
<p>&#8220;If your cluster only has 10 disk drives,&#8221; Kelly said, &#8220;[QFS] will save you $500, which is nice but &#8230;&#8221;</p>
<p>Likewise, if high availability is very important, the <a href="http://hadoop.apache.org/releases.html#17+September%2C+2012%3A+Release+0.23.3+available">latest version of HDFS</a> might be preferable. &#8220;There&#8217;s a standby [in QFS]; it&#8217;s not quite as hot as theirs,&#8221; Kelly said. But availability isn&#8217;t super important to Quantcast, he said, and it hasn&#8217;t had any real problems with QFS going down anyhow. When it does go down, it actually recovers pretty fast.</p>
<p>As for the <a href="http://gigaom.com/cloud/because-hadoop-isnt-perfect-8-ways-to-replace-hdfs/">rest of the file systems touting themselves as better alternatives for HDFS</a>, Kelly didn&#8217;t have much to say. Quantcast&#8217;s efforts are focused on mega-scale Hadoop deployments, and he doesn&#8217;t see anything better for that use case. Although, he noted, Hadoop vendors probably shouldn&#8217;t get too upset over all the competition.</p>
<p>&#8220;I think some diversity in the ecosystem is probably not a bad thing,&#8221; he said, &#8220;and is probably a sign of healthy evolution.&#8221;</p>
<p><em>Feature image courtesy of <a href="http://www.shutterstock.com/gallery-244369p1.html">Shutterstock user Lobke Peers</a>.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2012/09/27/quantcast-releases-bigger-faster-stronger-hadoop-file-system/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2012/09/shutterstock_55825522.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2012/09/shutterstock_55825522.jpg?w=150" medium="image">
			<media:title type="html">cyborg</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2012/09/qfs.jpg?w=604" medium="image">
			<media:title type="html">qfs</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2012/09/qfs-2.jpg?w=300" medium="image">
			<media:title type="html">QFS 2</media:title>
		</media:content>
	</item>
		<item>
		<title>How 0xdata wants to help everyone become data scientists</title>
		<link>http://gigaom.com/2012/08/14/how-0xdata-wants-to-help-everyone-become-data-scientists/</link>
		<comments>http://gigaom.com/2012/08/14/how-0xdata-wants-to-help-everyone-become-data-scientists/#comments</comments>
		<pubDate>Tue, 14 Aug 2012 19:45:04 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[0xdata]]></category>
		<category><![CDATA[analytics]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[data science]]></category>
		<category><![CDATA[database]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[HDFS]]></category>
		<category><![CDATA[machine-learning]]></category>
		<category><![CDATA[Platfora]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[statistics]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=552457</guid>
		<description><![CDATA[Although it's still a work in progress, 0xdata thinks it has the answer to the problem of doing advanced statistical analysis at scale: Build on HDFS for scale, use the widely known R programming language and hide it all under a simple interface.]]></description>
				<content:encoded><![CDATA[<p>There&#8217;s a trend afoot in the big data space <a href="http://gigaom.com/cloud/want-to-ditch-your-data-scientists-heres-are-7-startups-that-can-help/">to turn data science from black magic into child&#8217;s play</a>, and one of the newest companies trying to pull off this technological alchemy is <a href="http://www.0xdata.com/index.html">0xdata</a>. The bootstrapped startup, pronounced &#8220;hexadata,&#8221; is the brainchild of former DataStax engineer, and Platfora co-founder, SriSatish Ambati, and it&#8217;s trying to blend Hadoop, R and Google BigQuery into the ultimate tool for statistical analysis. Scientists, data analysts or whoever ultimately uses the product only need to be experts in their domains, not in statistics.</p>
<p>At its core, <a href="http://www.0xdata.com/faq.html">0xdata&#8217;s flagship product, called H2O</a>, is a statistical analysis engine that uses the Hadoop Distributed File System (HDFS) as its storage platform, but the goal is to make it <a href="http://gigaom.com/cloud/google-opens-up-its-biq-query-data-analytics-service-to-all/">as simple as using a Google service such as BigQuery</a>. Users will interact with H2O via a simple web-search-like bar and standard <a href="http://www.r-project.org/">R statistical-analysis</a> syntax, but H2O will run machine-learning algorithms behind the scenes. Alternatively, users can call out to H2O from Microsoft Excel or the <a href="http://rstudio.org/">RStudio</a> integrated development environment using a REST API.</p>
<div id="attachment_552941" class="wp-caption alignleft" style="width: 310px"><a href="http://gigaom2.files.wordpress.com/2012/08/big_banner-copy.jpg"><img  title="big_banner copy" src="http://gigaom2.files.wordpress.com/2012/08/big_banner-copy.jpg?w=300&#038;h=114" alt="" width="300" height="114" class="size-medium wp-image-552941" /></a><p class="wp-caption-text">Although BigQuery is a SQL service hosted by Google, 0xdata follows a similar theory on simplicity.</p></div>
<p>However they choose to leverage the product, Ambati said, the scale of the underlying data and the complexity of running advanced analysis are details that need to be hidden. It&#8217;s the same theory that underlies Platfora, the company Ambati co-founded last year with his former DataStax colleague Ben Werther, although their approaches appear to be different. Whereas Platfora is <a href="http://gigaom.com/cloud/platfora-gets-5-7m-to-make-hadoop-mainstream/">trying to disrupt the data warehouse market</a> by building a next-generation user experience atop Hadoop, 0xdata is trying to change the way users interact with popular statistical software such as R.</p>
<p>But either way, Ambati says of new data-analysis products, &#8220;[There are] no bragging rights for making it simple. If you don&#8217;t do that, you won&#8217;t be able to go forward.&#8221;</p>
<p>0xdata is also putting a focus on speed, both in terms of how fast it processes data and how fast it lets users react. Google search changed our thinking around how many questions people can ask successively, Ambati explained, and data analysts should have the same experience. That&#8217;s why H2O provides approximate results at every step in the analysis process. Rather than wait for the entire job to run and the exact results to be computed, users can get a general idea of results, and if those are completely outside the expected range, kill the job and start over sooner.</p>
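<p>To make the idea of approximate results at every step concrete, here is a minimal Python sketch of a running estimate that is refined as each chunk of data is processed. It is not H2O&#8217;s implementation, which the article does not describe; the function names, the chunked input and the expected range are all assumptions for illustration.</p>

```python
from typing import Iterable, Iterator, Optional, Sequence


def progressive_mean(chunks: Iterable[Sequence[float]]) -> Iterator[float]:
    """Yield the mean of all values seen so far after each chunk is processed."""
    total = 0.0
    count = 0
    for chunk in chunks:
        total += sum(chunk)
        count += len(chunk)
        if count:  # skip the estimate while no values have been seen yet
            yield total / count


def first_out_of_range(chunks: Iterable[Sequence[float]],
                       low: float, high: float) -> Optional[float]:
    """Return the first running estimate outside [low, high], stopping early,
    or None if every intermediate estimate stayed inside the range."""
    for estimate in progressive_mean(chunks):
        if not (low <= estimate <= high):
            return estimate  # the caller can abandon the job here
    return None


# Hypothetical data arriving in three chunks.
data = [[1, 2, 3], [4, 5], [6]]
print(list(progressive_mean(data)))
print(first_out_of_range(data, low=0.0, high=2.5))
```

<p>Because the estimates come from a generator, the second function stops reading data as soon as an estimate leaves the expected range, which mirrors the kill-and-restart workflow described above.</p>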
<p>But it will be a while before the public gets a chance to see whether H2O lives up to its promises. Ambati said the product is just four months into development and won&#8217;t have its first set of algorithms available for another few months. His team of eight engineers has &#8220;built a lot of cool stuff,&#8221; but now it needs to round out the process and turn its code for H2O into an actual product.</p>
<p>Still, having decided to tackle data as a system, Ambati and his team are having a lot of fun. &#8220;We are live-and-die-with-infrastructure people,&#8221; he said, but for a bunch of folks who spent a lot of time learning math, it&#8217;s like going back to their days as computer science students.</p>
<p><em>Feature image courtesy of <a href="http://www.shutterstock.com/gallery-11418p1.html">Shutterstock user Bruce Rolff</a>.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2012/08/14/how-0xdata-wants-to-help-everyone-become-data-scientists/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2012/08/shutterstock_107081264-e1344971009541.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2012/08/shutterstock_107081264-e1344971009541.jpg?w=150" medium="image">
			<media:title type="html">shutterstock_107081264</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2012/08/big_banner-copy.jpg?w=300" medium="image">
			<media:title type="html">big_banner copy</media:title>
		</media:content>
	</item>
		<item>
		<title>Troll sues Facebook, Amazon and others for using Hadoop</title>
		<link>http://gigaom.com/2012/07/13/troll-sues-facebook-amazon-and-others-for-using-hadoop/</link>
		<comments>http://gigaom.com/2012/07/13/troll-sues-facebook-amazon-and-others-for-using-hadoop/#comments</comments>
		<pubDate>Fri, 13 Jul 2012 21:21:08 +0000</pubDate>
		<dc:creator>Jeff John Roberts</dc:creator>
				<category><![CDATA[big data]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Hadoop Distributed File System]]></category>
		<category><![CDATA[HDFS]]></category>
		<category><![CDATA[parallel iron]]></category>
		<category><![CDATA[parallel iron llc]]></category>
		<category><![CDATA[patent trolls]]></category>
		<category><![CDATA[troll]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=542570</guid>
		<description><![CDATA[Big data has become the latest front for the patent troll epidemic as a shell company is suing firms for using a common software framework known as the Hadoop Distributed File System (HDFS).]]></description>
				<content:encoded><![CDATA[<p><a href="http://gigaom.com/mobile/were-all-trolls-now-why-the-patent-rats-nest-is-worse-than-you-think/troll-2/" rel="attachment wp-att-528156"><img  title="Troll" src="http://gigaom2.files.wordpress.com/2012/06/troll.jpg?w=140&#038;h=140" alt="" width="140" height="140" class="alignleft size-thumbnail wp-image-528156" /></a>Big data has become the latest front for the patent troll epidemic as a shell company is suing firms for using a common open-source storage framework known as the Hadoop Distributed File System (HDFS).</p>
<p>In complaints filed this week, a Delaware-based shell called Parallel Iron claims Facebook and LinkedIn violated its patents by using HDFS. Parallel Iron has already filed suits against Amazon, Oracle and other firms for using the HDFS technology that lets users store huge quantities of data on clusters of commodity servers.</p>
<p>Hadoop has been built by a large network of contributors, including individual developers and large companies like Yahoo, and is an Apache Software Foundation project. HDFS, its storage component, was based on Google&#8217;s Google File System. Parallel Iron&#8217;s patent complaints, however, say the whole system was made possible by four men:</p>
<blockquote><p>In this technological age, we take for granted the ability to access tremendous amounts of data through our computers and the Internet, a process that seems effortless and unremarkable. <strong>But this apparent effortlessness is an illusion, made possible only by technological wizardry. &#8230;  It was made possible by the innovations of technological pioneers</strong> like Melvin James Bullen, Steven Louis Dodd, William Thomas Lynch, and David James Herbison.</p></blockquote>
<p>The four men obtained three patents for &#8220;methods and systems for a storage system&#8221; in <a href="http://www.google.com/patents/US7197662">2007</a>, <a href="http://www.google.com/patents/US7543177">2009</a> and <a href="http://www.google.com/patents/US7958388?dq=7,958,388&amp;ei=UIsAUNvAF-io0AHR15zJBw">2011</a> (click the dates to see them). They assigned the patents to an LLC called Ring Technology Enterprises, which appears to have been a predecessor shell company to Parallel Iron that <a href="http://www.tgdaily.com/business-and-law-features/45063-industry-giants-sued-over-memory-patent">filed lawsuits</a> in the Eastern District of Texas.</p>
<p>Such companies, known as patent trolls, have come under fire as critics accuse them of gaming the patent system in order to extort money from companies that create real products and services. Recently, trolls have begun stalking promising start-ups (like travel site <a href="http://gigaom.com/2012/07/05/patent-troll-stalks-travel-site-hipmunk/">Hipmunk</a> and handcraft marketplace Etsy) and suing them as soon as they receive funding.</p>
<p>The toll of patent trolls on innovation has received special attention this month as two academics <a href="http://arstechnica.com/tech-policy/2012/07/new-study-same-authors-patent-trolls-cost-economy-29-billion-yearly/">released a study</a> concluding that trolls cost the economy $29 billion in direct costs every year. Meanwhile, a famous judge called the <a href="http://gigaom.com/mobile/famous-judge-spikes-apple-google-case-calls-patent-system-dysfunctional/">patent system &#8220;dysfunctional&#8221;</a> and threw out a long-awaited smartphone trial between Google and Apple. This week, the same judge wrote an <a href="http://www.theatlantic.com/business/archive/2012/07/why-there-are-too-many-patents-in-america/259725/">editorial in the <em>Atlantic</em></a> saying many industries don&#8217;t need patents in the first place and suggesting that trolls should have to actually use the patents that are the basis of their lawsuits.</p>
<p>As Congress has been slow in fixing the patent troll problem, some companies like Twitter are exploring their own solutions like pledging <a href="http://paidcontent.org/2012/04/17/twitter-promotes-patent-peace-with-innovators-agreement/">not to use their patents</a> in an offensive manner.</p>
<p>Parallel Iron&#8217;s attorney, <a href="http://www.bayardlaw.com/richard-kirk">Richard Kirk</a>, did not immediately return a request for comment.</p>
<p>Here&#8217;s a copy of the complaint against Facebook:</p>
<p><a style="margin: 12px auto 6px auto; font-family: Helvetica,Arial,Sans-serif; font-style: normal; font-variant: normal; font-weight: normal; font-size: 14px; line-height: normal; font-size-adjust: none; font-stretch: normal; -x-system-font: none; display: block; text-decoration: underline;" title="View Parallel Iron v Facebook on Scribd" href="http://www.scribd.com/doc/100026211/Parallel-Iron-v-Facebook">Parallel Iron v Facebook</a><iframe id="doc_83323" src="http://www.scribd.com/embeds/100026211/content?start_page=1&amp;view_mode=list&amp;access_key=key-33jw2hf2qb5hj6y278d" frameborder="0" scrolling="no" width="100%" height="600" data-auto-height="true" data-aspect-ratio="0.772727272727273"></iframe></p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2012/07/13/troll-sues-facebook-amazon-and-others-for-using-hadoop/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2012/06/troll.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2012/06/troll.jpg?w=150" medium="image">
			<media:title type="html">Troll</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/05dfcf765f1554b08954bb9e1ee63363?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">jeffjohnroberts</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2012/06/troll.jpg?w=140" medium="image">
			<media:title type="html">Troll</media:title>
		</media:content>
	</item>
		<item>
		<title>Because Hadoop isn&#8217;t perfect: 8 ways to replace HDFS</title>
		<link>http://gigaom.com/2012/07/11/because-hadoop-isnt-perfect-8-ways-to-replace-hdfs/</link>
		<comments>http://gigaom.com/2012/07/11/because-hadoop-isnt-perfect-8-ways-to-replace-hdfs/#comments</comments>
		<pubDate>Wed, 11 Jul 2012 21:50:13 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[appistry]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[Cassandra]]></category>
		<category><![CDATA[Ceph]]></category>
		<category><![CDATA[CleverSafe]]></category>
		<category><![CDATA[DataStax]]></category>
		<category><![CDATA[EMC]]></category>
		<category><![CDATA[file systems]]></category>
		<category><![CDATA[GPFS]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[HDFS]]></category>
		<category><![CDATA[IBM]]></category>
		<category><![CDATA[Isilon]]></category>
		<category><![CDATA[Lustre]]></category>
		<category><![CDATA[Mapr]]></category>
		<category><![CDATA[mapreduce]]></category>
		<category><![CDATA[NetApp]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[scalability]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=541225</guid>
		<description><![CDATA[Hadoop is on its way to becoming the de facto platform for the next-generation of data-based applications, but it's not without some flaws. Ironically, one of Hadoop's biggest shortcomings right now is also one of its biggest strengths going forward -- the Hadoop Distributed File System.]]></description>
				<content:encoded><![CDATA[<p><a href="http://gigaom2.files.wordpress.com/2012/07/achilles_heel.jpg"><img  title="achilles heel" src="http://gigaom2.files.wordpress.com/2012/07/shutterstock_16533076.jpg?w=300&#038;h=200" alt="" width="300" height="200" class="alignleft size-medium wp-image-541764" /></a>Hadoop is <a href="http://gigaom.com/cloud/the-state-of-hadoop-strong-and-poised-to-explode/">on its way to becoming the de facto platform</a> for the next-generation of data-based applications, but it&#8217;s not without flaws. Ironically, one of Hadoop&#8217;s biggest shortcomings now is also one of its biggest strengths going forward &#8212; the Hadoop Distributed File System.</p>
<p>Within the Apache Software Foundation, HDFS is always improving in terms of performance and availability. Honestly, it&#8217;s probably fine for the majority of Hadoop workloads that are running in pilot projects, skunkworks projects or generally non-demanding environments. And technologies such as HBase that are built atop HDFS speak to its versatility <a href="http://gigaom.com/cloud/drawn-to-scale-raises-money-to-make-sql-big-data-ready/">as a storage system even for non-MapReduce applications</a>.</p>
<p>But if the growing number of options for replacing HDFS signifies anything, it&#8217;s that HDFS isn&#8217;t quite where it needs to be. Some Hadoop users have strict demands around performance, availability and enterprise-grade features, while others aren&#8217;t keen on its direct-attached storage (DAS) architecture. Concerns around availability might be especially valid for anyone (read &#8220;almost everyone&#8221;) who&#8217;s using an older version of Hadoop without the <a href="http://www.cloudera.com/blog/2012/03/high-availability-for-the-hadoop-distributed-file-system-hdfs/">High Availability NameNode</a>. Here are eight products and projects whose proprietors argue they can deliver what HDFS can&#8217;t:</p>
<p><strong>Cassandra (DataStax)<br />
</strong></p>
<p><a href="http://gigaom2.files.wordpress.com/2012/07/datastax_marketecture_a1-copy.jpg"><img  title="datastax_marketecture_A1 copy" src="http://gigaom2.files.wordpress.com/2012/07/datastax_marketecture_a1-copy.jpg?w=300&#038;h=263" alt="" width="300" height="263" class="alignright size-medium wp-image-541752" /></a>Not a file system at all but an open source, NoSQL key-value store, Cassandra has become a viable alternative to HDFS for web applications that rely on fast data access. <a href="http://www.datastax.com">DataStax</a>, a startup commercializing the Cassandra database, has <a href="http://gigaom.com/cloud/datastax-gets-11m-fuses-nosql-and-hadoop/">fused Hadoop atop Cassandra</a> to provide web applications fast access to data processed by Hadoop, and Hadoop fast access to data streaming into Cassandra from web users.</p>
<p><strong>Ceph<br />
</strong></p>
<p><a href="http://gigaom2.files.wordpress.com/2012/07/stack-copy.jpg"><img  title="stack copy" src="http://gigaom2.files.wordpress.com/2012/07/stack-copy.jpg?w=300&#038;h=279" alt="" width="300" height="279" class="alignright size-medium wp-image-541758" /></a>Ceph is an open source, multi-pronged storage system that was recently <a href="http://gigaom.com/cloud/inktank-launches-to-change-the-face-of-open-source-storage/">commercialized by a startup called Inktank</a>. Among its features is a high-performance parallel file system that <a href="http://www.itworld.com/big-datahadoop/262612/ceph-extends-storage-open-scalability">some think makes it a candidate for replacing HDFS</a> (and then some) in Hadoop environments. Indeed, some researchers started <a href="http://www.soe.ucsc.edu/~carlosm/Papers/eestolan-nsdi10-abstract.pdf">looking at this possibility as far back as 2010</a>.</p>
<p><strong>Dispersed Storage Network (Cleversafe)<br />
</strong></p>
<p><a href="http://gigaom2.files.wordpress.com/2012/07/object-based-access-methods.gif"><img  title="object-based-access-methods" src="http://gigaom2.files.wordpress.com/2012/07/object-based-access-methods.gif?w=300&#038;h=208" alt="" width="300" height="208" class="alignright size-medium wp-image-541757" /></a>Cleversafe <a href="http://www.cleversafe.com/press-releases/cleversafe-first-to-deliver-breakthrough-capabilities-for-combined-storage-and-massive-computation">got into the HDFS-replacement business on Monday</a>, announcing a product that will fuse Hadoop MapReduce with the company&#8217;s Dispersed Storage Network system. By fully distributing metadata across the cluster (instead of relying on a single NameNode) and not relying on replication, Cleversafe says it&#8217;s much faster, more reliable and scalable than HDFS.</p>
<p><strong>GPFS (IBM)<br />
</strong></p>
<p><a href="http://gigaom2.files.wordpress.com/2012/07/gpfs.jpg"><img  title="gpfs" src="http://gigaom2.files.wordpress.com/2012/07/gpfs.jpg?w=300&#038;h=135" alt="" width="300" height="135" class="alignright size-medium wp-image-541756" /></a>IBM has been selling its General Parallel File System to high-performance computing customers for years (including within some of the world&#8217;s fastest supercomputers), and in 2010 it <a href="http://database-diary.com/2011/11/30/comparing-hdfs-and-gpfs-for-hadoop/">tuned GPFS for Hadoop</a>. IBM claims the GPFS-SNC (Shared Nothing Cluster) edition is so much faster than HDFS in part because it runs at the kernel level as opposed to atop the OS like HDFS.</p>
<p><strong>Isilon (EMC)<br />
</strong></p>
<p><a href="http://gigaom2.files.wordpress.com/2012/07/isilon-hadoop.jpg"><img  title="isilon hadoop" src="http://gigaom2.files.wordpress.com/2012/07/isilon-hadoop.jpg?w=300&#038;h=199" alt="" width="300" height="199" class="alignright size-medium wp-image-541753" /></a>EMC has offered its own Hadoop distributions for more than a year, but in January 2012 it unveiled a new method for making HDFS enterprise-class &#8212; <a href="http://gigaom.com/cloud/emc-delivers-on-isilon-hadoop-bundle/">replace it with EMC Isilon&#8217;s OneFS file system</a>. Technically, as EMC&#8217;s Chuck Hollis <a href="http://chucksblog.emc.com/chucks_blog/2012/01/hdfs-coming-to-an-array-near-you.html">explained at the time</a>, because Isilon can read NFS, CIFS and HDFS protocols, a single Isilon NAS system can serve to intake, process and analyze data.</p>
<p><strong>Lustre</strong></p>
<p><a href="http://gigaom2.files.wordpress.com/2012/07/lustre.jpg"><img  title="lustre" src="http://gigaom2.files.wordpress.com/2012/07/lustre.jpg?w=300&#038;h=205" alt="" width="300" height="205" class="alignright size-medium wp-image-541761" /></a><a href="http://wiki.lustre.org/index.php/Main_Page">Lustre</a> is an open source high-performance file system that some claim can make for an HDFS alternative where performance is a major concern. Truth be told, I haven&#8217;t heard of this combination running anywhere in the wild, but HPC storage provider Xyratex <a href="http://www.xyratex.com/pdfs/whitepapers/Xyratex_white_paper_MapReduce_1-4.pdf">wrote a paper on the combination in 2011</a>, claiming a Lustre-based cluster (even with InfiniBand) will be faster and cheaper than an HDFS-based cluster.</p>
<p><strong>MapR File System<br />
</strong></p>
<p><a href="http://gigaom2.files.wordpress.com/2012/07/compsol-diag3-1.jpg"><img  title="compsol-diag3-1" src="http://gigaom2.files.wordpress.com/2012/07/compsol-diag3-1.jpg?w=300&#038;h=266" alt="" width="300" height="266" class="alignright size-medium wp-image-541754" /></a>The MapR File System is probably the best-known HDFS alternative, as it&#8217;s the basis of MapR&#8217;s increasingly popular &#8212; <a href="http://gigaom.com/cloud/investors-make-20m-bet-on-mapr-to-win-hadoop-war/">and well-funded</a> &#8212; Hadoop distribution. Not only does MapR claim its file system is two to five times faster than HDFS on average (although, <a href="http://www.mapr.com/products/only-with-mapr/scalable">really, up to 20 times faster</a>), but it has features such as mirroring, snapshots and high availability that enterprise customers love.</p>
<p><strong>NetApp Open Solution for Hadoop</strong></p>
<p><a href="http://gigaom2.files.wordpress.com/2012/07/netapp.jpg"><img  title="netapp" src="http://gigaom2.files.wordpress.com/2012/07/netapp.jpg?w=300&#038;h=279" alt="" width="300" height="279" class="alignright size-medium wp-image-541755" /></a>OK, the <a href="http://www.netapp.com/us/solutions/infrastructure/hadoop.html">NetApp Open Solution for Hadoop</a> isn&#8217;t so much an HDFS replacement as it is an HDFS <em>improvement</em>, <a href="http://gigaom.com/cloud/netapp-does-network-attached-hadoop/">according to NetApp and early partner Cloudera</a>. The offering still relies on HDFS, but it reenvisions the physical Hadoop architecture by putting HDFS on a RAID array. This, NetApp claims, means faster, more reliable and more secure Hadoop jobs.</p>
<p>This might be a good place to say rest in peace to two other HDFS alternatives that are effectively no longer with us &#8212; <a href="http://code.google.com/p/kosmosfs/">KosmosFS</a> (aka CloudStore) and <a href="http://gigaom.com/2010/03/15/appistry-joins-cloudscale-storage-fray-and-brings-hadoop-with-it/">Appistry CloudIQ Storage</a>. The former was created by Kosmix (<a href="http://gigaom.com/2011/09/14/what-media-companies-can-learn-from-walmart/">since bought by @WalmartLabs</a>) and released to the open source world in 2007, but no longer has an active community. The latter was an attempt by Appistry in 2010 to get a piece of the Hadoop pie with its computational storage technology, but the company has since switched its focus from selling the technology to <a href="http://gigaom.com/2012/03/22/appistry-structure-data-2012/">providing high-performance computing services based on it</a>.</p>
<p><em>Feature image courtesy of <a href="http://www.shutterstock.com/gallery-177808p1.html">Shutterstock user Panos Karapanagiotis</a>.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2012/07/11/because-hadoop-isnt-perfect-8-ways-to-replace-hdfs/feed/</wfw:commentRss>
		<slash:comments>14</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2012/07/shutterstock_16533076.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2012/07/shutterstock_16533076.jpg?w=150" medium="image">
			<media:title type="html">achilles heel</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2012/07/shutterstock_16533076.jpg?w=300" medium="image">
			<media:title type="html">achilles heel</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2012/07/datastax_marketecture_a1-copy.jpg?w=300" medium="image">
			<media:title type="html">datastax_marketecture_A1 copy</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2012/07/stack-copy.jpg?w=300" medium="image">
			<media:title type="html">stack copy</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2012/07/object-based-access-methods.gif?w=300" medium="image">
			<media:title type="html">object-based-access-methods</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2012/07/gpfs.jpg?w=300" medium="image">
			<media:title type="html">gpfs</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2012/07/isilon-hadoop.jpg?w=300" medium="image">
			<media:title type="html">isilon hadoop</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2012/07/lustre.jpg?w=300" medium="image">
			<media:title type="html">lustre</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2012/07/compsol-diag3-1.jpg?w=300" medium="image">
			<media:title type="html">compsol-diag3-1</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2012/07/netapp.jpg?w=300" medium="image">
			<media:title type="html">netapp</media:title>
		</media:content>
	</item>
		<item>
		<title>Cloud computing infrastructure: 2012 and beyond</title>
		<link>http://pro.gigaom.com/2012/06/cloud-computing-infrastructure-2012-and-beyond/</link>
		<comments>http://pro.gigaom.com/2012/06/cloud-computing-infrastructure-2012-and-beyond/#comments</comments>
		<pubDate>Wed, 20 Jun 2012 06:55:39 +0000</pubDate>
		<dc:creator><a href="http://pro.gigaom.com/members/derrickharris/" rel="author">Derrick Harris</a></dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[1010data]]></category>
		<category><![CDATA[ActiveState Software]]></category>
		<category><![CDATA[Alcatel Lucent]]></category>
		<category><![CDATA[Amazon]]></category>
		<category><![CDATA[Amazon Web Services]]></category>
		<category><![CDATA[AMCC]]></category>
		<category><![CDATA[amd]]></category>
		<category><![CDATA[apache]]></category>
		<category><![CDATA[Apache Tomcat]]></category>
		<category><![CDATA[appfabric]]></category>
		<category><![CDATA[Applied Micro]]></category>
		<category><![CDATA[arista]]></category>
		<category><![CDATA[arista-networks]]></category>
		<category><![CDATA[ARM]]></category>
		<category><![CDATA[ARMv8]]></category>
		<category><![CDATA[AT&T]]></category>
		<category><![CDATA[Atheros]]></category>
		<category><![CDATA[Avaya]]></category>
		<category><![CDATA[aws]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[Big Switch Networks]]></category>
		<category><![CDATA[BigML]]></category>
		<category><![CDATA[BigQuery]]></category>
		<category><![CDATA[BloomReach]]></category>
		<category><![CDATA[British Telecom]]></category>
		<category><![CDATA[Broadcom]]></category>
		<category><![CDATA[Brocade]]></category>
		<category><![CDATA[Bungee Connect]]></category>
		<category><![CDATA[Bungee Labs]]></category>
		<category><![CDATA[BYOD]]></category>
		<category><![CDATA[Calxeda]]></category>
		<category><![CDATA[Carriers]]></category>
		<category><![CDATA[Cassandra]]></category>
		<category><![CDATA[Cavium]]></category>
		<category><![CDATA[Center for Internet Security]]></category>
		<category><![CDATA[CenturyLink]]></category>
		<category><![CDATA[Cetas]]></category>
		<category><![CDATA[Chunghwa Telecom]]></category>
		<category><![CDATA[CIS]]></category>
		<category><![CDATA[Cisco]]></category>
		<category><![CDATA[Clickable]]></category>
		<category><![CDATA[Cloud Computing]]></category>
		<category><![CDATA[Cloud Foundry]]></category>
		<category><![CDATA[CloudBand]]></category>
		<category><![CDATA[CloudBand Management System]]></category>
		<category><![CDATA[CloudBand Node]]></category>
		<category><![CDATA[CloudBlocks]]></category>
		<category><![CDATA[Cloudscaling]]></category>
		<category><![CDATA[communication service provider]]></category>
		<category><![CDATA[CSP]]></category>
		<category><![CDATA[Data Integrator]]></category>
		<category><![CDATA[Datahero]]></category>
		<category><![CDATA[DataPop]]></category>
		<category><![CDATA[DataRush]]></category>
		<category><![CDATA[Defense Information Systems Agency]]></category>
		<category><![CDATA[Dell]]></category>
		<category><![CDATA[disa]]></category>
		<category><![CDATA[Distributed Virtual Switch]]></category>
		<category><![CDATA[dreamhost]]></category>
		<category><![CDATA[Dropbox]]></category>
		<category><![CDATA[DVS]]></category>
		<category><![CDATA[Easy Virtual Network]]></category>
		<category><![CDATA[ebay]]></category>
		<category><![CDATA[EC2]]></category>
		<category><![CDATA[Elastic Beanstalk]]></category>
		<category><![CDATA[Elastic Cloud Compute]]></category>
		<category><![CDATA[embrane]]></category>
		<category><![CDATA[Engine Yard]]></category>
		<category><![CDATA[extreme-networks]]></category>
		<category><![CDATA[F5]]></category>
		<category><![CDATA[Fidelity]]></category>
		<category><![CDATA[FinFET]]></category>
		<category><![CDATA[FISMA]]></category>
		<category><![CDATA[Floodlight]]></category>
		<category><![CDATA[force-com]]></category>
		<category><![CDATA[Freescale]]></category>
		<category><![CDATA[Fulcrum]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[google app engine]]></category>
		<category><![CDATA[Google BigQuery]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[hardware blueprints]]></category>
		<category><![CDATA[HDFS]]></category>
		<category><![CDATA[Heroku]]></category>
		<category><![CDATA[Hewlett-Packard]]></category>
		<category><![CDATA[HP]]></category>
		<category><![CDATA[iaas]]></category>
		<category><![CDATA[IBM]]></category>
		<category><![CDATA[IBM PureSystems]]></category>
		<category><![CDATA[Identity Management]]></category>
		<category><![CDATA[Imperva]]></category>
		<category><![CDATA[Infogr.am]]></category>
		<category><![CDATA[infrastructure as a service]]></category>
		<category><![CDATA[Insieme]]></category>
		<category><![CDATA[Intel]]></category>
		<category><![CDATA[Intel Atom]]></category>
		<category><![CDATA[Internap]]></category>
		<category><![CDATA[ISO]]></category>
		<category><![CDATA[Itanium]]></category>
		<category><![CDATA[ITAR]]></category>
		<category><![CDATA[juniper]]></category>
		<category><![CDATA[KDDI CORPORATION]]></category>
		<category><![CDATA[KT Corporation]]></category>
		<category><![CDATA[KVM]]></category>
		<category><![CDATA[LAMP]]></category>
		<category><![CDATA[Limelight Networks]]></category>
		<category><![CDATA[LineRate]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[Loggly]]></category>
		<category><![CDATA[LSI]]></category>
		<category><![CDATA[mapreduce]]></category>
		<category><![CDATA[Marvell]]></category>
		<category><![CDATA[Mercedes-Benz]]></category>
		<category><![CDATA[Microchip]]></category>
		<category><![CDATA[microprocessor chips]]></category>
		<category><![CDATA[microprocessors]]></category>
		<category><![CDATA[Microsoft]]></category>
		<category><![CDATA[Microsoft Windows Azure]]></category>
		<category><![CDATA[microsoft-windows]]></category>
		<category><![CDATA[MIPS]]></category>
		<category><![CDATA[MIPS Technologies]]></category>
		<category><![CDATA[MPLS]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[NAT]]></category>
		<category><![CDATA[national institute of standards and technology]]></category>
		<category><![CDATA[NEC]]></category>
		<category><![CDATA[Net]]></category>
		<category><![CDATA[Netflix]]></category>
		<category><![CDATA[Nicera]]></category>
		<category><![CDATA[nicira]]></category>
		<category><![CDATA[NIST]]></category>
		<category><![CDATA[NTT]]></category>
		<category><![CDATA[NVGRE]]></category>
		<category><![CDATA[ONE]]></category>
		<category><![CDATA[Open Cloud OS]]></category>
		<category><![CDATA[open compute project]]></category>
		<category><![CDATA[open data center alliance]]></category>
		<category><![CDATA[Open Network Environment]]></category>
		<category><![CDATA[OpenFlow]]></category>
		<category><![CDATA[OpenShift]]></category>
		<category><![CDATA[OpenStack]]></category>
		<category><![CDATA[opex]]></category>
		<category><![CDATA[opteron]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[orange]]></category>
		<category><![CDATA[OTT]]></category>
		<category><![CDATA[over the top]]></category>
		<category><![CDATA[PaaS]]></category>
		<category><![CDATA[Papertrail]]></category>
		<category><![CDATA[parallel architectures]]></category>
		<category><![CDATA[Parse.ly]]></category>
		<category><![CDATA[Pervasive Software]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[Platform as a Service]]></category>
		<category><![CDATA[Plexxi]]></category>
		<category><![CDATA[PMC-Sierra]]></category>
		<category><![CDATA[PowerPC]]></category>
		<category><![CDATA[profitero]]></category>
		<category><![CDATA[ProgrammableFlow Controller]]></category>
		<category><![CDATA[PureSystems]]></category>
		<category><![CDATA[QFabric]]></category>
		<category><![CDATA[Qualcomm]]></category>
		<category><![CDATA[Quantum]]></category>
		<category><![CDATA[Rackspace]]></category>
		<category><![CDATA[Radware]]></category>
		<category><![CDATA[RC3]]></category>
		<category><![CDATA[Red Hat]]></category>
		<category><![CDATA[Red Hat Enterprise Linux]]></category>
		<category><![CDATA[Regulatory Compliant Cloud Computing]]></category>
		<category><![CDATA[RushAnalyzer]]></category>
		<category><![CDATA[s3]]></category>
		<category><![CDATA[saas]]></category>
		<category><![CDATA[Salesforce.com]]></category>
		<category><![CDATA[SAP]]></category>
		<category><![CDATA[sas-70]]></category>
		<category><![CDATA[Savvis]]></category>
		<category><![CDATA[SCAP]]></category>
		<category><![CDATA[SDN]]></category>
		<category><![CDATA[SeaMicro]]></category>
		<category><![CDATA[Security Content Automation Protocol]]></category>
		<category><![CDATA[security information and event management]]></category>
		<category><![CDATA[semantic analysis]]></category>
		<category><![CDATA[service-level-agreement]]></category>
		<category><![CDATA[SFR.]]></category>
		<category><![CDATA[SIEM]]></category>
		<category><![CDATA[SingTel]]></category>
		<category><![CDATA[SLA]]></category>
		<category><![CDATA[SmartCamp]]></category>
		<category><![CDATA[SNA]]></category>
		<category><![CDATA[software as a service]]></category>
		<category><![CDATA[software defined networking]]></category>
		<category><![CDATA[SPARC]]></category>
		<category><![CDATA[splunk]]></category>
		<category><![CDATA[Splunk Storm]]></category>
		<category><![CDATA[sql azure]]></category>
		<category><![CDATA[StrongAuth]]></category>
		<category><![CDATA[Sumo Logic]]></category>
		<category><![CDATA[Sun Microsystems]]></category>
		<category><![CDATA[Talari Networks]]></category>
		<category><![CDATA[Telcos]]></category>
		<category><![CDATA[Telstra]]></category>
		<category><![CDATA[Terremark]]></category>
		<category><![CDATA[Texas Instruments]]></category>
		<category><![CDATA[Tilera]]></category>
		<category><![CDATA[Transtelco]]></category>
		<category><![CDATA[Tri-Gate]]></category>
		<category><![CDATA[Trinity Ventures]]></category>
		<category><![CDATA[vCenter Configuration Manager]]></category>
		<category><![CDATA[VCM]]></category>
		<category><![CDATA[Verizon]]></category>
		<category><![CDATA[virtual machine]]></category>
		<category><![CDATA[virtual network]]></category>
		<category><![CDATA[virtualbox]]></category>
		<category><![CDATA[Visual.ly]]></category>
		<category><![CDATA[Visual.ly Create]]></category>
		<category><![CDATA[VLANs]]></category>
		<category><![CDATA[vm]]></category>
		<category><![CDATA[VMWare]]></category>
		<category><![CDATA[VPN]]></category>
		<category><![CDATA[VRF-Lite]]></category>
		<category><![CDATA[vShield]]></category>
		<category><![CDATA[vShield App with Data Security]]></category>
		<category><![CDATA[vShield Edge]]></category>
		<category><![CDATA[vsphere]]></category>
		<category><![CDATA[Windows]]></category>
		<category><![CDATA[Windows Azure]]></category>
		<category><![CDATA[Windows Azure AppFabric]]></category>
		<category><![CDATA[WinMagic]]></category>
		<category><![CDATA[x86]]></category>
		<category><![CDATA[Xen]]></category>
		<category><![CDATA[Xeon chips]]></category>
		<category><![CDATA[Yahoo]]></category>

		<guid isPermaLink="false">http://pro.gigaom.com/?p=111141</guid>
		<description><![CDATA[Discussions about the cloud now involve more than just the IT department. New developments in hardware architectures, more-energy-efficient data centers, regulatory concerns and simplifying analytics are all discussions currently circling through the industry. Here's what to consider when thinking about your business in the cloud. <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=gigaom.com&#038;blog=14960843&#038;post=534343&#038;subd=gigaom2&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p>Cloud computing continues to change and shape the technology industry, and these days discussions are about more than simply reorganizing the IT department. New developments in chip and hardware architectures, finding greener data centers, regulatory concerns and simplifying data analytics are all discussions currently circling through the industry. For this report, GigaOM Pro has gathered six of its analysts to discuss these topics and others in the current cloud market. Here we present several areas to consider when thinking about your business in the cloud. </p>
]]></content:encoded>
			<wfw:commentRss>http://pro.gigaom.com/2012/06/cloud-computing-infrastructure-2012-and-beyond/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:thumbnail url="http://pro.gigaom.com/files/2009/04/gigaompromasterimagecloud.jpg?w=150" />
		<media:content url="http://pro.gigaom.com/files/2009/04/gigaompromasterimagecloud.jpg?w=150" medium="image">
			<media:title type="html">gigaompromasterimagecloud</media:title>
		</media:content>

		<media:content url="http://1.gravatar.com/avatar/4f3860069d181dbeeb398304f5940a9e?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">gigaedit</media:title>
		</media:content>
	</item>
		<item>
		<title>How Facebook keeps 100 petabytes of Hadoop data online</title>
		<link>http://gigaom.com/2012/06/13/how-facebook-keeps-100-petabytes-of-hadoop-data-online/</link>
		<comments>http://gigaom.com/2012/06/13/how-facebook-keeps-100-petabytes-of-hadoop-data-online/#comments</comments>
		<pubDate>Wed, 13 Jun 2012 18:14:28 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[big data]]></category>
		<category><![CDATA[Facebook]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[HDFS]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=532105</guid>
		<description><![CDATA[It's no secret that Facebook stores a lot of data in Hadoop, but how it keeps that data available whenever it needs it isn't necessarily common knowledge. Today at the Hadoop Summit Facebook Engineer Andrew Ryan highlighted that solution, which Facebook calls AvatarNode. <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=gigaom.com&#038;blog=14960843&#038;post=532105&#038;subd=gigaom2&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p>It&#8217;s no secret that Facebook stores a lot of data &#8212; 100 petabytes, in fact &#8212; in Hadoop, but how it keeps that data available whenever it needs it isn&#8217;t necessarily common knowledge. Today at the Hadoop Summit, however, Facebook Engineer Andrew Ryan highlighted that solution, which Facebook calls AvatarNode. (I&#8217;m at Hadoop Summit, but didn&#8217;t attend Ryan&#8217;s talk; thankfully, he also <a href="https://www.facebook.com/notes/facebook-engineering/under-the-hood-hadoop-distributed-filesystem-reliability-with-namenode-and-avata/10150888759153920">summarized it in a blog post</a>.)</p>
<p>For those unfamiliar with the availability problem Facebook solved with AvatarNode, here&#8217;s the 10,000-foot explanation: The NameNode service in Hadoop&#8217;s architecture handles all metadata operations with the Hadoop Distributed File System, but it also just runs on a single node. If that node goes down, so does, for all intents and purposes, Hadoop because nothing that relies on HDFS will run properly.</p>
<p><a href="http://gigaom2.files.wordpress.com/2012/06/avatarnode.jpg"><img  title="avatarnode" src="http://gigaom2.files.wordpress.com/2012/06/avatarnode.jpg?w=300&#038;h=222" alt="" width="300" height="222" class="alignleft size-medium wp-image-532134" /></a>As Ryan explains, Facebook began building AvatarNode about two years ago (hence its James Cameron-inspired name) and it&#8217;s now in production. Put simply, AvatarNode replaces the NameNode with a two-node architecture in which one acts as a standby version if the other goes down. Currently, the failover process is manual but, Ryan writes, &#8220;we&#8217;re working to improve AvatarNode further and integrate it with a general high-availability framework that will permit unattended, automated, and safe failover.&#8221;</p>
<p>AvatarNode isn&#8217;t a panacea for Hadoop availability, however. Ryan notes that only 10 percent of Facebook&#8217;s unplanned downtime would have been preventable with AvatarNode in place, but the architecture will allow Facebook to eliminate an estimated 50 percent of future planned downtime.</p>
<p>Facebook isn&#8217;t the only company to solve this problem, by the way. Appistry (which has since changed its business focus) <a href="http://gigaom.com/2010/03/15/appistry-joins-cloudscale-storage-fray-and-brings-hadoop-with-it/">released a fully distributed file system</a> a couple years ago, and MapR&#8217;s Hadoop distribution also <a href="http://gigaom.com/cloud/startup-mapr-underpins-emcs-hadoop-effort/">provides a highly available file system</a>. In Apache Hadoop version 2.0, which <a href="http://www.cloudera.com/company/press-center/releases/cloudera-introduces-fourth-generation-of-its-big-data-platform-to-drive-ease-of-use-integration-and-adoption-of-apache-hadoop-for-the-enterprise/">underpins the latest version of Cloudera&#8217;s distribution</a>, the NameNode is also eliminated as a single point of failure.</p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2012/06/13/how-facebook-keeps-100-petabytes-of-hadoop-data-online/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2012/06/avatarnode.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2012/06/avatarnode.jpg?w=150" medium="image">
			<media:title type="html">avatarnode</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2012/06/avatarnode.jpg?w=300" medium="image">
			<media:title type="html">avatarnode</media:title>
		</media:content>
	</item>
		<item>
		<title>2012: The Hadoop infrastructure market booms</title>
		<link>http://pro.gigaom.com/2012/04/sector-roadmap-hadoop-platforms-2012/</link>
		<comments>http://pro.gigaom.com/2012/04/sector-roadmap-hadoop-platforms-2012/#comments</comments>
		<pubDate>Thu, 26 Apr 2012 19:22:32 +0000</pubDate>
		<dc:creator>Jo Maitland</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Adaptive Computing]]></category>
		<category><![CDATA[Amazon]]></category>
		<category><![CDATA[apnatek]]></category>
		<category><![CDATA[Aster Data]]></category>
		<category><![CDATA[axceleon]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[bioteam]]></category>
		<category><![CDATA[BusinessObjects]]></category>
		<category><![CDATA[cascadeo]]></category>
		<category><![CDATA[Cloud Computing]]></category>
		<category><![CDATA[Cloudera]]></category>
		<category><![CDATA[clustercorp]]></category>
		<category><![CDATA[Couchbase]]></category>
		<category><![CDATA[Cycle Computing]]></category>
		<category><![CDATA[data privacy]]></category>
		<category><![CDATA[data scientists]]></category>
		<category><![CDATA[data storage]]></category>
		<category><![CDATA[data visualization]]></category>
		<category><![CDATA[data-analytics]]></category>
		<category><![CDATA[data-security]]></category>
		<category><![CDATA[Datameer]]></category>
		<category><![CDATA[DataStax]]></category>
		<category><![CDATA[db2]]></category>
		<category><![CDATA[elastic-mapreduce]]></category>
		<category><![CDATA[enterprise IT]]></category>
		<category><![CDATA[Foursquare]]></category>
		<category><![CDATA[Fujitsu]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[hadoop-stack]]></category>
		<category><![CDATA[Hbase]]></category>
		<category><![CDATA[HDFS]]></category>
		<category><![CDATA[hive]]></category>
		<category><![CDATA[Hortonworks]]></category>
		<category><![CDATA[hp-vertica]]></category>
		<category><![CDATA[IBM]]></category>
		<category><![CDATA[informatica]]></category>
		<category><![CDATA[Intel]]></category>
		<category><![CDATA[jaspersoft]]></category>
		<category><![CDATA[karmasphere]]></category>
		<category><![CDATA[legacy-systems]]></category>
		<category><![CDATA[lexisnexis]]></category>
		<category><![CDATA[Microsoft]]></category>
		<category><![CDATA[microstrategy]]></category>
		<category><![CDATA[namenode-file-system]]></category>
		<category><![CDATA[Netezza]]></category>
		<category><![CDATA[nube-technologies]]></category>
		<category><![CDATA[oozie]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[pentaho]]></category>
		<category><![CDATA[pig]]></category>
		<category><![CDATA[Platfora]]></category>
		<category><![CDATA[platform-computing]]></category>
		<category><![CDATA[Quantivo]]></category>
		<category><![CDATA[quest]]></category>
		<category><![CDATA[Rackspace]]></category>
		<category><![CDATA[RainStor]]></category>
		<category><![CDATA[razorfish]]></category>
		<category><![CDATA[SAP]]></category>
		<category><![CDATA[splunk]]></category>
		<category><![CDATA[SQL]]></category>
		<category><![CDATA[stack-iq]]></category>
		<category><![CDATA[tableau]]></category>
		<category><![CDATA[tco]]></category>
		<category><![CDATA[Teradata]]></category>
		<category><![CDATA[the-apache-foundation]]></category>
		<category><![CDATA[the-mathworks]]></category>
		<category><![CDATA[think-big-analytics]]></category>
		<category><![CDATA[TicketMaster]]></category>
		<category><![CDATA[total-cost-of-ownership]]></category>
		<category><![CDATA[univa-ud]]></category>
		<category><![CDATA[unstructured data]]></category>
		<category><![CDATA[VoltDB]]></category>
		<category><![CDATA[Wipro]]></category>
		<category><![CDATA[Yahoo]]></category>
		<category><![CDATA[yelp]]></category>
		<category><![CDATA[zettaset]]></category>
		<category><![CDATA[zookeeper]]></category>

		<guid isPermaLink="false">http://pro.gigaom.com/?p=105677</guid>
		<description><![CDATA[There are now more than half a dozen commercial Hadoop distributions in the market, and almost every enterprise with big data challenges is tinkering with the Apache Foundation-licensed software. A new report examines the key disruptive trends shaping the Hadoop platform market.]]></description>
				<content:encoded><![CDATA[<p>For years, technologists have been promising software that will make it easier and cheaper to analyze vast amounts of data in order to revolutionize business. More than one solution exists, but today Hadoop is fast becoming the most talked-about name in enterprises. There are now more than half a dozen commercial Hadoop distributions in the market, and almost every enterprise with big data challenges is tinkering with the Apache Foundation–licensed software. This report examines the key disruptive trends shaping the Hadoop platform market, from integration with legacy systems to ensuring data security, and where companies like Cloudera, IBM, Hortonworks and others will position themselves to gain share and increase revenue.</p>
]]></content:encoded>
			<wfw:commentRss>http://pro.gigaom.com/2012/04/sector-roadmap-hadoop-platforms-2012/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:thumbnail url="https://gigaom-pro-files.s3.amazonaws.com/files/2012/04/elephant.jpg?w=150" />
		<media:content url="https://gigaom-pro-files.s3.amazonaws.com/files/2012/04/elephant.jpg?w=150" medium="image">
			<media:title type="html">elephant</media:title>
		</media:content>

		<media:content url="http://1.gravatar.com/avatar/4f3860069d181dbeeb398304f5940a9e?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">gigaedit</media:title>
		</media:content>
	</item>
		<item>
		<title>A near-term outlook for big data</title>
		<link>http://pro.gigaom.com/2012/03/a-near-term-outlook-for-big-data/</link>
		<comments>http://pro.gigaom.com/2012/03/a-near-term-outlook-for-big-data/#comments</comments>
		<pubDate>Wed, 21 Mar 2012 06:55:20 +0000</pubDate>
		<dc:creator>Krish</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[33across]]></category>
		<category><![CDATA[Amazon]]></category>
		<category><![CDATA[AOL]]></category>
		<category><![CDATA[Apache Foundation]]></category>
		<category><![CDATA[apache-hadoop]]></category>
		<category><![CDATA[apixio]]></category>
		<category><![CDATA[AppFog]]></category>
		<category><![CDATA[AstraZeneca]]></category>
		<category><![CDATA[AT&T]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[big-data-outsourcing]]></category>
		<category><![CDATA[BloomReach]]></category>
		<category><![CDATA[Blue Button]]></category>
		<category><![CDATA[Bristol-Myers Squibb]]></category>
		<category><![CDATA[BYD]]></category>
		<category><![CDATA[Cassandra]]></category>
		<category><![CDATA[CBRE Group]]></category>
		<category><![CDATA[cdata-quality]]></category>
		<category><![CDATA[Cetas]]></category>
		<category><![CDATA[Cisco]]></category>
		<category><![CDATA[Cloud Computing]]></category>
		<category><![CDATA[Cloudant]]></category>
		<category><![CDATA[Cloudera]]></category>
		<category><![CDATA[Comcast]]></category>
		<category><![CDATA[connected devices]]></category>
		<category><![CDATA[Consert]]></category>
		<category><![CDATA[data privacy]]></category>
		<category><![CDATA[data processing]]></category>
		<category><![CDATA[data science]]></category>
		<category><![CDATA[data scientists]]></category>
		<category><![CDATA[data storage]]></category>
		<category><![CDATA[data-analytics]]></category>
		<category><![CDATA[data-as-a-service]]></category>
		<category><![CDATA[data-governance]]></category>
		<category><![CDATA[data-markets]]></category>
		<category><![CDATA[data-obesity]]></category>
		<category><![CDATA[data-quality]]></category>
		<category><![CDATA[data-quality-dimensions]]></category>
		<category><![CDATA[data-security]]></category>
		<category><![CDATA[DataFlux]]></category>
		<category><![CDATA[DataStax]]></category>
		<category><![CDATA[Dell]]></category>
		<category><![CDATA[DuPont]]></category>
		<category><![CDATA[E-ZPass]]></category>
		<category><![CDATA[EcoFactor]]></category>
		<category><![CDATA[Ecologic Analytics]]></category>
		<category><![CDATA[Electronic Medical Records]]></category>
		<category><![CDATA[EMC]]></category>
		<category><![CDATA[eMeter]]></category>
		<category><![CDATA[emrs]]></category>
		<category><![CDATA[ENBALA Power Networks]]></category>
		<category><![CDATA[energy-internet]]></category>
		<category><![CDATA[Enterprise Mobility]]></category>
		<category><![CDATA[enterprise-control-language]]></category>
		<category><![CDATA[enterprises]]></category>
		<category><![CDATA[Explorys]]></category>
		<category><![CDATA[Facebook]]></category>
		<category><![CDATA[Forbes]]></category>
		<category><![CDATA[Geisinger Health Systems]]></category>
		<category><![CDATA[ginger-io]]></category>
		<category><![CDATA[Global Pulse]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[Greenplum]]></category>
		<category><![CDATA[grid storage]]></category>
		<category><![CDATA[GridMobility]]></category>
		<category><![CDATA[GroundedPower]]></category>
		<category><![CDATA[Group Health Cooperative]]></category>
		<category><![CDATA[Hadapt]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[hadoop-stack]]></category>
		<category><![CDATA[Hbase]]></category>
		<category><![CDATA[HDFS]]></category>
		<category><![CDATA[health care]]></category>
		<category><![CDATA[Hewlett-Packard]]></category>
		<category><![CDATA[hive]]></category>
		<category><![CDATA[Honeywell]]></category>
		<category><![CDATA[Hortonworks]]></category>
		<category><![CDATA[hpcc]]></category>
		<category><![CDATA[Humedica]]></category>
		<category><![CDATA[IBM]]></category>
		<category><![CDATA[ibm-watson]]></category>
		<category><![CDATA[IDC]]></category>
		<category><![CDATA[impetus]]></category>
		<category><![CDATA[infochimps]]></category>
		<category><![CDATA[informatica]]></category>
		<category><![CDATA[infrastructure as a service]]></category>
		<category><![CDATA[intelligent-applications]]></category>
		<category><![CDATA[Intermountain Healthcare]]></category>
		<category><![CDATA[jeopardy]]></category>
		<category><![CDATA[kaiser-permanente]]></category>
		<category><![CDATA[Landis+Gyr]]></category>
		<category><![CDATA[lexisnexis]]></category>
		<category><![CDATA[LinkedIn]]></category>
		<category><![CDATA[logicworks]]></category>
		<category><![CDATA[M2M]]></category>
		<category><![CDATA[machine-to-machine]]></category>
		<category><![CDATA[MapR Technologies]]></category>
		<category><![CDATA[mapreduce]]></category>
		<category><![CDATA[mayo clinic]]></category>
		<category><![CDATA[McKinsey]]></category>
		<category><![CDATA[metascale]]></category>
		<category><![CDATA[meter-data-management-systems]]></category>
		<category><![CDATA[Microsoft]]></category>
		<category><![CDATA[mobile carriers]]></category>
		<category><![CDATA[mobile health]]></category>
		<category><![CDATA[mu-sigma]]></category>
		<category><![CDATA[National Cancer Institute]]></category>
		<category><![CDATA[Netflix]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[Nuance Communications]]></category>
		<category><![CDATA[nuevora]]></category>
		<category><![CDATA[oozie]]></category>
		<category><![CDATA[Opera Solutions]]></category>
		<category><![CDATA[OPower]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[Parse.ly]]></category>
		<category><![CDATA[patientslikeme]]></category>
		<category><![CDATA[Pervasive]]></category>
		<category><![CDATA[Pfizer]]></category>
		<category><![CDATA[pig]]></category>
		<category><![CDATA[Platfora]]></category>
		<category><![CDATA[Platform as a Service]]></category>
		<category><![CDATA[private clouds]]></category>
		<category><![CDATA[profitero]]></category>
		<category><![CDATA[Public Clouds]]></category>
		<category><![CDATA[Rackspace]]></category>
		<category><![CDATA[Recurve]]></category>
		<category><![CDATA[Red Hat]]></category>
		<category><![CDATA[redgiant-analytics]]></category>
		<category><![CDATA[Regulated Industries]]></category>
		<category><![CDATA[Salesforce.com]]></category>
		<category><![CDATA[SAP]]></category>
		<category><![CDATA[scale-unlimited]]></category>
		<category><![CDATA[scienergy]]></category>
		<category><![CDATA[Sears Holding Corporation]]></category>
		<category><![CDATA[service providers]]></category>
		<category><![CDATA[SGI]]></category>
		<category><![CDATA[Siemens]]></category>
		<category><![CDATA[Silver Spring Networks]]></category>
		<category><![CDATA[Skytree]]></category>
		<category><![CDATA[Smart Grid]]></category>
		<category><![CDATA[smart meters]]></category>
		<category><![CDATA[software as a service]]></category>
		<category><![CDATA[Sourcefire]]></category>
		<category><![CDATA[Sprint]]></category>
		<category><![CDATA[SQL]]></category>
		<category><![CDATA[structured data]]></category>
		<category><![CDATA[Systemcon]]></category>
		<category><![CDATA[T-Mobile]]></category>
		<category><![CDATA[talend]]></category>
		<category><![CDATA[Target]]></category>
		<category><![CDATA[targeted-advertising]]></category>
		<category><![CDATA[Tendril]]></category>
		<category><![CDATA[Teradata]]></category>
		<category><![CDATA[The Internet of Things]]></category>
		<category><![CDATA[think-big-analytics]]></category>
		<category><![CDATA[Toshiba]]></category>
		<category><![CDATA[Trillium]]></category>
		<category><![CDATA[Twitter]]></category>
		<category><![CDATA[unstructured data]]></category>
		<category><![CDATA[utilities]]></category>
		<category><![CDATA[Verizon Wireless]]></category>
		<category><![CDATA[VoltDB]]></category>
		<category><![CDATA[Wal-Mart]]></category>
		<category><![CDATA[WellPoint]]></category>
		<category><![CDATA[whirlpool]]></category>
		<category><![CDATA[WibiData]]></category>
		<category><![CDATA[Yahoo]]></category>
		<category><![CDATA[yelp]]></category>
		<category><![CDATA[zettaset]]></category>
		<category><![CDATA[zookeeper]]></category>

		<guid isPermaLink="false">http://pro.gigaom.com/?p=101786</guid>
		<description><![CDATA[Big data now touches everything from enterprises to smart-meter startups, while Hadoop is fast becoming the leading tool to analyze that data, and debates around privacy abound. GigaOM Pro analysts offer insights on what to consider when it comes to big data decisions for your business.]]></description>
				<content:encoded><![CDATA[<p>Big data now touches everything from enterprises and hospitals to smart-meter startups and connected devices in the home. Hadoop, meanwhile, is fast becoming the leading tool to analyze that data, and there is the ever-lingering question of privacy and how we, the technology industry, are responsible for teaching ethical ways to collect and regulate our data. This report, composed of eight different sections each written by a GigaOM Pro analyst, offers insights on what to consider when it comes to big data decisions for your business.</p>
]]></content:encoded>
			<wfw:commentRss>http://pro.gigaom.com/2012/03/a-near-term-outlook-for-big-data/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:thumbnail url="https://gigaom-pro-files.s3.amazonaws.com/files/2012/03/datacenter.jpg?w=150" />
		<media:content url="https://gigaom-pro-files.s3.amazonaws.com/files/2012/03/datacenter.jpg?w=150" medium="image">
			<media:title type="html">datacenter</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/04f327f032df043846baa7474b8e6aff?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">Krish</media:title>
		</media:content>
	</item>
		<item>
		<title>What it really means when someone says &#8216;Hadoop&#8217;</title>
		<link>http://gigaom.com/2012/02/06/what-it-really-means-when-someone-says-hadoop/</link>
		<comments>http://gigaom.com/2012/02/06/what-it-really-means-when-someone-says-hadoop/#comments</comments>
		<pubDate>Mon, 06 Feb 2012 20:12:12 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[big data]]></category>
		<category><![CDATA[Cloudera]]></category>
		<category><![CDATA[Datameer]]></category>
		<category><![CDATA[EMC]]></category>
		<category><![CDATA[Hadapt]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Hbase]]></category>
		<category><![CDATA[HDFS]]></category>
		<category><![CDATA[Hortonworks]]></category>
		<category><![CDATA[IBM]]></category>
		<category><![CDATA[karmasphere]]></category>
		<category><![CDATA[mapreduce]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[Platfora]]></category>
		<category><![CDATA[WibiData]]></category>
		<category><![CDATA[zettaset]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=481182</guid>
		<description><![CDATA[Hadoop features front and center in the discussion of how to implement a big data strategy, one of the biggest trends in IT. There’s just one problem that keeps cropping up: many people don’t seem to know exactly what it means when somebody says “Hadoop.”]]></description>
				<content:encoded><![CDATA[<p><a href="http://gigaom2.files.wordpress.com/2011/10/hadoop1.jpg"><img title="hadoop" src="http://gigaom2.files.wordpress.com/2011/10/hadoop1.jpg?w=708" alt=""   class="alignleft size-full wp-image-426524"></a>Big data is among the hottest trends in IT right now, and Hadoop stands front and center in the discussion of how to implement a big data strategy. There’s just one problem that keeps cropping up: many people don’t seem to know exactly what it means when somebody says “Hadoop.”</p>
<p>The problem surfaced again Monday in the form of complaints over Forrester’s new report titled <a href="http://www.forrester.com/rb/Research/wave%26trade%3B_enterprise_hadoop_solutions%2C_q1_2012/q/id/60755/t/2?src=RSS_2&amp;cm_mmc=Forrester-_-RSS-_-Document-_-6">“Enterprise Hadoop Solutions, Q1 2012.”</a> <em>InformationWeek</em> <a href="http://informationweek.com/news/software/info_management/232600283">spoke with a few vendors</a> that didn’t like how their products were assessed, and database industry analyst Curt Monash <a href="http://www.dbms2.com/2012/02/06/comments-on-the-2012-forrester-wave-enterprise-hadoop-solutions">says the report “compares apples, peaches, almonds, and peanuts.”</a> I thought the same thing when I saw a copy of the report last week. The vendors it covers all focus on Hadoop, but Hortonworks is not Datameer is not HStreaming.</p>
<p>Allow me to explain. Hopefully, this provides a foundation for parsing what people talk about when they talk about Hadoop, and for differentiating one type of product from another. (And you can learn even more about Hadoop and how it’s used at our <a href="http://event.gigaom.com/structuredata/?utm_source=cloud&amp;utm_medium=editorial&amp;utm_campaign=intext&amp;utm_term=481182+what-it-really-means-when-someone-says-hadoop&amp;utm_content=dharrisstructure">Structure: Data</a> conference taking place next month in New York City.)</p>
<h2>What Hadoop is</h2>
<p>I went into this in more detail in a <a href="http://pro.gigaom.com/2011/03/defining-hadoop-the-players-technologies-and-challenges-of-2011/?utm_source=cloud&amp;utm_medium=editorial&amp;utm_campaign=intext&amp;utm_term=481182+what-it-really-means-when-someone-says-hadoop&amp;utm_content=dharrisstructure">GigaOM Pro report published last March</a> (<strong>sub req’d</strong>), but the long and short is that Hadoop is, at its core, an <a href="http://hadoop.apache.org/">Apache Software Foundation project</a> consisting of two primary subprojects — <a href="http://hadoop.apache.org/mapreduce/">Hadoop MapReduce</a> and the <a href="http://hadoop.apache.org/hdfs/">Hadoop Distributed File System</a>. MapReduce is the parallel-processing engine that allows Hadoop to churn through large data sets in relatively short order. HDFS is the distributed file system that lets Hadoop scale across commodity servers and, importantly, store data on the compute nodes in order to boost performance (and potentially save money). These are the two must-have components for any Hadoop distribution.</p>
<p>There are also a number of Apache projects related to Hadoop, often built atop either Hadoop MapReduce or HDFS. These include, but are not limited to, <a href="http://hive.apache.org/">Hive</a> and <a href="http://pig.apache.org/">Pig</a>, two higher-level languages (Hive’s is SQL-like; Pig’s is a data-flow scripting language) that bring data-warehouse-like capabilities to a Hadoop cluster, and <a href="http://hbase.apache.org/">HBase</a>, a NoSQL database that uses HDFS as its distributed storage layer.</p>
<p><a href="http://gigaom2.files.wordpress.com/2012/02/hadoop-projects.jpg"><img title="hadoop projects" src="http://gigaom2.files.wordpress.com/2012/02/hadoop-projects.jpg?w=604&#038;h=198" alt="" width="604" height="198" class="aligncenter size-large wp-image-481309"></a></p>
<h2>Hadoop distributions</h2>
<p>These are packaged software products that aim to ease deployment and management of Hadoop clusters compared with simply downloading the various Apache code bases and trying to cobble together a system. Presently, <a href="http://gigaom.com/cloud/why-cloudera-isnt-sweating-the-hadoop-competition/">Cloudera</a>, <a href="http://gigaom.com/cloud/yahoo-spinoff-shakes-up-hadoop-market-with-new-distro/">Hortonworks</a>, <a href="http://gigaom.com/cloud/battle-on-mapr-cloudera-pimp-their-version-of-hadoop/">MapR</a> and <a href="http://gigaom.com/cloud/emc-throws-lots-of-hardware-at-hadoop/">EMC</a> all offer their own Hadoop distributions. Although each is different (sometimes markedly so, as with MapR’s proprietary file system), they all package a set of Hadoop projects (MapReduce, Hive, Sqoop, Pig, etc.) in a way that, in theory, makes them integrate more naturally and run both smoothly and securely.</p>
<p>Many Hadoop distributions integrate with various data warehouses, databases and other data-management products, with the goal of moving data between Hadoop clusters and other environments so each might process or query data stored in the other.</p>
<h2>Hadoop management software</h2>
<p>Just as the wording implies, Hadoop management software is designed to make it easier to manage and troubleshoot a Hadoop cluster. Such products are usually sold or offered by companies peddling Hadoop distributions, because even when commercially packaged, Hadoop is still a complex architecture and somewhat foreign to most IT personnel and products. However, third parties such as <a href="http://gigaom.com/cloud/platform-computing-extends-hpc-reach-into-mapreduce/">Platform Computing</a> (now <a href="http://gigaom.com/cloud/ibm-eyes-big-data-at-big-banks-with-platform-buy/">part of IBM</a>) and <a href="http://gigaom.com/cloud/zettaset-raises-3m-for-the-consumerization-of-big-data/">Zettaset</a> also sell software for managing Hadoop clusters, and their products are typically agnostic as to what distributions they support.</p>
<p>But distributions and management software are all about the infrastructure and the platform. Anyone actually wanting to use Hadoop still needs to know how to write applications that leverage the underlying architecture.</p>
<h2>Hadoop application software (or, products that use Hadoop)</h2>
<p>The Hadoop ecosystem gets really complex when we start looking at products that exist to help developers write Hadoop applications or otherwise analyze data stored within Hadoop in a manner other than writing traditional MapReduce jobs. These range from abstraction layers such as <a href="http://karmasphere.com/index.php">Karmasphere Analyst</a> or <a href="http://gigaom.com/cloud/ibms-hadoop-effort-grows-from-project-to-product/">IBM InfoSphere BigInsights</a>, to <a href="http://gigaom.com/cloud/hadapt-raises-9-5m-for-hadoop-data-warehouse/">Hadapt</a>, which offers a single-platform product fusing a SQL data warehouse with a Hadoop cluster, to <a href="http://www.hstreaming.com/">HStreaming</a>, which promises real-time processing and analytics.</p>
<p>The one common thing among all these products, however, is that they are not Hadoop distributions, but sit atop platform software from Hortonworks, EMC or whomever. Some products that get thrown into the Hadoop fray, such as <a href="http://outerthought.org/site/products/lily.html">Outerthought Lily</a> or <a href="http://drawntoscale.com/how_it_works.html">Drawn to Scale Spire</a>, are essentially scale-out databases built atop HBase (which itself is a separate project built atop HDFS). The image below, from Karmasphere, gives a particularly clear map of how a Hadoop environment might look.</p>
<p><a href="http://gigaom2.files.wordpress.com/2011/06/hadoopdatafabric-ks.jpeg"><img title="HadoopDataFabric-KS" src="http://gigaom2.files.wordpress.com/2011/06/hadoopdatafabric-ks.jpeg?w=604&#038;h=379" alt="" width="604" height="379" class="aligncenter size-large wp-image-369496"></a></p>
<p>The applications and analytics space is probably <a href="http://gigaom.com/cloud/5-low-profile-startups-that-could-change-the-face-of-big-data/">where we’ll see the biggest influx of new companies</a>, as writing Hadoop applications is still tough, but it’s also how companies will actually start experiencing direct business benefits. In fact, it’s these types of higher-level products that are the focal point of <a href="http://gigaom.com/cloud/accel-forms-100m-fund-to-feed-big-data-apps/">Accel Partners’ new big data fund</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2012/02/06/what-it-really-means-when-someone-says-hadoop/feed/</wfw:commentRss>
		<slash:comments>12</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2011/10/hadoop-e1319488918182.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2011/10/hadoop-e1319488918182.jpg?w=150" medium="image">
			<media:title type="html">hadoop</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2011/10/hadoop1.jpg" medium="image">
			<media:title type="html">hadoop</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2012/02/hadoop-projects.jpg?w=604" medium="image">
			<media:title type="html">hadoop projects</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2011/06/hadoopdatafabric-ks.jpeg?w=604" medium="image">
			<media:title type="html">HadoopDataFabric-KS</media:title>
		</media:content>
	</item>
		<item>
		<title>Amazon’s DynamoDB: rattling the cloud market</title>
		<link>http://pro.gigaom.com/2012/01/how-amazons-dynamodb-is-rattling-the-big-data-and-cloud-markets/</link>
		<comments>http://pro.gigaom.com/2012/01/how-amazons-dynamodb-is-rattling-the-big-data-and-cloud-markets/#comments</comments>
		<pubDate>Fri, 20 Jan 2012 17:33:13 +0000</pubDate>
		<dc:creator>Jo Maitland</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Amazon]]></category>
		<category><![CDATA[Amazon Web Services]]></category>
		<category><![CDATA[apache]]></category>
		<category><![CDATA[apache-hadoop]]></category>
		<category><![CDATA[aws]]></category>
		<category><![CDATA[Azure]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[Big Data Appliance]]></category>
		<category><![CDATA[Cassandra]]></category>
		<category><![CDATA[Cloud Computing]]></category>
		<category><![CDATA[Cloud Foundry]]></category>
		<category><![CDATA[Cloudant]]></category>
		<category><![CDATA[Cloudera]]></category>
		<category><![CDATA[daily deals]]></category>
		<category><![CDATA[database.com]]></category>
		<category><![CDATA[Dryad]]></category>
		<category><![CDATA[DynamoDB]]></category>
		<category><![CDATA[Enterprise]]></category>
		<category><![CDATA[enterprise IT]]></category>
		<category><![CDATA[etl]]></category>
		<category><![CDATA[eventual-consistency]]></category>
		<category><![CDATA[extract-transform-load]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[Groupon]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Hadoop Distributed File System]]></category>
		<category><![CDATA[HDFS]]></category>
		<category><![CDATA[Heroku]]></category>
		<category><![CDATA[Hortonworks]]></category>
		<category><![CDATA[iaas]]></category>
		<category><![CDATA[IBM]]></category>
		<category><![CDATA[infrastructure]]></category>
		<category><![CDATA[infrastructure as a service]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[live streaming]]></category>
		<category><![CDATA[Mapr]]></category>
		<category><![CDATA[mapreduce]]></category>
		<category><![CDATA[Microsoft]]></category>
		<category><![CDATA[microsoft-azure]]></category>
		<category><![CDATA[mongohq]]></category>
		<category><![CDATA[Netflix]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[OpenShift]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[PaaS]]></category>
		<category><![CDATA[Platform as a Service]]></category>
		<category><![CDATA[PostgreSQL]]></category>
		<category><![CDATA[Rackspace]]></category>
		<category><![CDATA[rdbms]]></category>
		<category><![CDATA[Red Hat]]></category>
		<category><![CDATA[relational-database-management-system]]></category>
		<category><![CDATA[s3]]></category>
		<category><![CDATA[Salesforce.com]]></category>
		<category><![CDATA[SimpleDB]]></category>
		<category><![CDATA[Social Commerce]]></category>
		<category><![CDATA[solid-state drives]]></category>
		<category><![CDATA[SolidFire]]></category>
		<category><![CDATA[SSD]]></category>
		<category><![CDATA[ssds]]></category>
		<category><![CDATA[streaming music]]></category>
		<category><![CDATA[streaming video]]></category>
		<category><![CDATA[Violin Memory]]></category>
		<category><![CDATA[VMWare]]></category>
		<category><![CDATA[zcloud]]></category>
		<category><![CDATA[Zynga]]></category>

		<guid isPermaLink="false">http://pro.gigaom.com/?p=94832</guid>
		<description><![CDATA[DynamoDB, AWS' latest effort to rock the technology establishment, has many implications for other players in the big data and cloud computing markets. A new GigaOM Pro research note examines just who is affected, and how.]]></description>
				<content:encoded><![CDATA[<p>The latest AWS offering to rock the technology establishment is DynamoDB, a NoSQL database service that puts the power of NoSQL in the hands of every developer. This research note analyzes the ways in which Amazon’s announcement has disrupted the big data and cloud computing markets, and what that means for other companies and offerings in the space, from the startups selling Hadoop distributions to public cloud providers like Rackspace and Microsoft, which will have to scramble to keep up and differentiate themselves. Additional companies mentioned in this report include Cloudant, Heroku and VMware. For the full list of companies, and to read the full report, sign up for a free trial.</p>
]]></content:encoded>
			<wfw:commentRss>http://pro.gigaom.com/2012/01/how-amazons-dynamodb-is-rattling-the-big-data-and-cloud-markets/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:thumbnail url="http://pro.gigaom.com/files/2012/01/boats.jpg?w=150" />
		<media:content url="http://pro.gigaom.com/files/2012/01/boats.jpg?w=150" medium="image">
			<media:title type="html">boats</media:title>
		</media:content>

		<media:content url="http://1.gravatar.com/avatar/4f3860069d181dbeeb398304f5940a9e?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">gigaedit</media:title>
		</media:content>
	</item>
	</channel>
</rss>
