<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>GigaOM &#187; scalability</title>
	<atom:link href="http://gigaom.com/tag/scalability/feed/" rel="self" type="application/rss+xml" />
	<link>http://gigaom.com</link>
	<description></description>
	<lastBuildDate>Tue, 21 May 2013 07:34:50 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='gigaom.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://0.gravatar.com/blavatar/0db8f6557d022075dbbf010c54d46d93?s=96&#038;d=http%3A%2F%2Fs2.wp.com%2Fi%2Fbuttonw-com.png</url>
		<title>GigaOM &#187; scalability</title>
		<link>http://gigaom.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://gigaom.com/osd.xml" title="GigaOM" />
	<atom:link rel='hub' href='http://gigaom.com/?pushpress=hub'/>
		<item>
		<title>Feedly survives the outages from the post-Google Reader rush, adding users, feeds and maybe revenue</title>
		<link>http://gigaom.com/2013/04/19/feedly-survives-the-outages-from-the-post-google-reader-rush-adding-users-feeds-and-maybe-revenue/</link>
		<comments>http://gigaom.com/2013/04/19/feedly-survives-the-outages-from-the-post-google-reader-rush-adding-users-feeds-and-maybe-revenue/#comments</comments>
		<pubDate>Fri, 19 Apr 2013 20:53:24 +0000</pubDate>
		<dc:creator>Jordan Novet</dc:creator>
				<category><![CDATA[Google]]></category>
		<category><![CDATA[RSS feeds]]></category>
		<category><![CDATA[scalability]]></category>
		<category><![CDATA[feedly]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=632779</guid>
		<description><![CDATA[Feedly has faced two outages since adding millions of users in the wake of the announcement that Google will retire its Google Reader service. Now Feedly is accelerating its monetization plans.]]></description>
				<content:encoded><![CDATA[<p>Feedly, which has emerged as one of the best replacements for Google Reader in the wake of the <a href="http://gigaom.com/2013/03/13/google-kills-google-reader-will-go-offline-on-july-1-2013/">announcement</a> that Google will abandon the RSS service, has been taking on millions of new users and at the same time steadily <a href="http://blog.feedly.com/2013/04/18/feedly-mobile-14-1-is-out-denser-and-cleaner-title-only-view-and-bug-fixes/">pushing out new features</a>. But the growth in users hasn&#8217;t been completely uneventful. </p>
<p><div id="attachment_632783" class="wp-caption alignleft" style="width: 410px"><a href="http://gigaom2.files.wordpress.com/2013/04/cyril-feedly.png"><img src="http://gigaom2.files.wordpress.com/2013/04/cyril-feedly.png?w=708" alt="Feedly Co-founder Cyril Moutran"    class="size-full wp-image-632783" /></a><p class="wp-caption-text">Feedly Co-founder Cyril Moutran</p></div> In the five weeks since Google said it would shutter Reader later this year, the Feedly site has gone down two times, co-founder Cyril Moutran told me in an interview this week. The first time came right when the <a href="http://gigaom.com/2013/03/13/chris-wetherll-google-reader/">Google Reader</a> announcement was made. There was a &#8220;huge load on our server,&#8221; Moutran said. &#8220;It just came, slammed us really, really fast. &#8230; What broke for us was really bandwidth. Basically, just having so many users coming in, the bandwidth was just everybody was coming in, and the servers were not responding.&#8221; </p>
<p>So engineers moved static content off the Feedly servers inside a data center and, somewhat ironically, onto <a href="http://gigaom.com/2012/10/26/google-puts-app-engine-back-online/">Google App Engine</a>, which scales very nicely, Moutran said. Dynamic content stayed put on the Feedly servers, which store terabytes of data, including indexed content from the feeds users subscribe to.</p>
<p>Less than a week later, Moutran said, &#8220;we saw another really, really crazy spike.&#8221; The site went down again. Developers took a look at the code that communicates between the client and Feedly servers, and tried to make the client more efficient, thereby reducing the load hitting the servers. &#8220;Then we had to order some more hardware,&#8221; Moutran said &#8212; load balancers, to be specific.</p>
<p>That second outage came on a Monday. As it turned out, Feedly gets more traffic on Monday than on any other day, and generally speaking traffic is higher on weekdays than on weekends. Desktop traffic picks up at around 8 a.m. local time and decreases around 6 p.m. Why? Many Feedly users look to the service &#8220;not so much in a casual context but more to catch up with what&#8217;s going on with the industry,&#8221; Moutran said. People use Feedly for work, in other words. Lawyers, designers, and <a href="http://paidcontent.org/2013/03/14/google-reader-please-dont-go-i-need-you-to-do-my-job/">writers</a> are typical business users.</p>
<p>As many more users get on board &#8212; more than 3 million had joined since the Google announcement <a href="http://blog.feedly.com/2013/04/02/announcing-the-new-feedly-mobile-and-welcoming-3-million-reader-refugees/?utm_source=feedly">as of April 2</a>, on top of 4 million users active before the announcement &#8212; more feeds pile up. The number of feeds is now up to 100 million, Moutran said.</p>
<p>With many more business users and a greater variety of content, monetization is a bigger question, and Feedly feels it must accelerate its efforts in that direction. The company, which is based in Palo Alto, Calif., and has 10 employees, is now looking at how it will introduce a premium or pro version later this year. Feedly could also add a way to generate revenue by providing streams of publishers&#8217; premium content inside the desktop and mobile versions of the application.</p>
<p>While plenty of people find Twitter handy for getting news, the migration of millions to Feedly shows the desire for a strong RSS reader still exists. If that desire keeps steady and if Feedly can keep adding features that interest users,  it could turn Google&#8217;s trash into Feedly&#8217;s treasure.</p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/04/19/feedly-survives-the-outages-from-the-post-google-reader-rush-adding-users-feeds-and-maybe-revenue/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/04/feedly-logo-green.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/04/feedly-logo-green.jpg?w=150" medium="image">
			<media:title type="html">Feedly logo green</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/c00ab753df107b639e76ed4c3ab07ba7?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">gigajordan</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/04/cyril-feedly.png" medium="image">
			<media:title type="html">Feedly Co-founder Cyril Moutran</media:title>
		</media:content>
	</item>
		<item>
		<title>Drawn to Scale wants to make MongoDB scale like Hadoop</title>
		<link>http://gigaom.com/2013/03/19/drawn-to-scale-wants-to-solve-your-mongodb-scalability-problems/</link>
		<comments>http://gigaom.com/2013/03/19/drawn-to-scale-wants-to-solve-your-mongodb-scalability-problems/#comments</comments>
		<pubDate>Tue, 19 Mar 2013 17:00:13 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[big data]]></category>
		<category><![CDATA[database]]></category>
		<category><![CDATA[Drawn to Scale]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Hbase]]></category>
		<category><![CDATA[MongoDB]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[scalability]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=621885</guid>
		<description><![CDATA[Database startup Drawn to Scale has extended its Spire distributed data platform from SQL to MongoDB. That means users can get high performance from the latter even across hundreds of terabytes.]]></description>
				<content:encoded><![CDATA[<p>If you love MongoDB but are tired of trying to scale it past a handful of machines and a few hundred gigabytes, database startup <a href="http://drawntoscale.com/">Drawn to Scale</a> says it has you covered. The company has <a href="http://drawntoscale.com/announcing-spire-for-mongo/">expanded the functionality of its distributed data platform from SQL to MongoDB</a>, meaning users of the popular NoSQL database can import their data to Spire and see high performance on hundreds of terabytes.</p>
<p>Drawn to Scale’s flagship product, called Spire, is a distributed data platform that’s built atop an optimized version of the Hadoop-based HBase database. HBase is what lets Spire scale cheaply and easily across many machines. Spire’s fully distributed index is what lets it read and write data at speeds that other approaches to scaling databases (e.g., sharding) can’t match while still supporting rich queries.</p>
<p>To date, <a href="http://gigaom.com/2012/07/24/how-one-startup-wants-to-inject-hadoop-into-your-sql/">the company has been focused on letting users run massive SQL databases</a>, but it has finally completed a lengthy process of rewriting parts of MongoDB to work with Spire, Founder and CEO Bradford Stephens (who’ll be participating in our <a href="http://event.gigaom.com/structuredata/?utm_source=data&amp;utm_medium=editorial&amp;utm_campaign=intext&amp;utm_term=621885+drawn-to-scale-wants-to-solve-your-mongodb-scalability-problems&amp;utm_content=dharrisstructure">Structure: Data event</a> this week in New York) told me. The company had been keeping the work under tight wraps “because we didn’t know how long it was going to take to build,” he added.</p>
<p><a href="http://gigaom2.files.wordpress.com/2013/03/spiremongo-230x300.png"><img alt="SpireMongo-230x300" src="http://gigaom2.files.wordpress.com/2013/03/spiremongo-230x300.png?w=708"   class="alignright size-full wp-image-621963"></a>“Our big vision for the market is providing people with a universal data platform,” Stephens said. After SQL, which accounts for the vast majority of databases in existence, MongoDB is a logical next step (although Spire also supports queries using Hadoop MapReduce). It’s the most widely used NoSQL database by a long shot, but although many users love its functionality and tooling, <a href="http://gigaom.com/2012/05/29/with-42m-more-10gen-wants-to-take-mongodb-mainstream/">the database is notoriously poor at scaling</a> to meet the demands of big data or high performance.</p>
<p>“You just sort of top out once you max out the memory,” Stephens explained, adding that MongoDB often starts getting inefficient as it’s forced to scale across 50 or 10 servers. “[T]hat’s where we <em>start</em> getting efficient.”</p>
<p>Now, without changing a single line of code, he claims, MongoDB users can import their data onto Spire and start handling 200-plus terabytes with ease. Of course, he noted, this doesn’t mean MongoDB users will abandon the database entirely. They might keep it for running applications that don’t require it to scale beyond a single server, and then use Spire to store big data for analytical purposes.</p>
<p>Initially, Spire will just support data importation and the basic CRUD (create, read, update, delete) functions of MongoDB, Stephens said. Later this year, assuming users want it, Drawn to Scale will implement MongoDB’s native MapReduce functionality as well as its management features.</p>
<p>As data volumes and data stores continue to proliferate, though, Drawn to Scale isn’t the only startup trying to provide a one-stop shop experience. At least for analytics, Citus Data is building a Postgres-based database <a href="http://gigaom.com/2013/02/19/citusdb-today-sql-on-hadoop-tomorrow-the-world/">capable of analyzing SQL, Hadoop and MongoDB data</a>, although each data store remains external. And there’s a <a href="http://gigaom.com/2013/03/05/the-hadoop-ecosystem-the-welcome-elephant-in-the-room-infographic/">whole group of companies merging SQL and Hadoop</a> for analytic workloads that might be wise to consider supporting operational data stores such as MongoDB, as well.</p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/03/19/drawn-to-scale-wants-to-solve-your-mongodb-scalability-problems/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2012/10/shutterstock_113600470.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2012/10/shutterstock_113600470.jpg?w=150" medium="image">
			<media:title type="html">Shiny database</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/03/spiremongo-230x300.png" medium="image">
			<media:title type="html">SpireMongo-230x300</media:title>
		</media:content>
	</item>
		<item>
		<title>A peek inside China&#8217;s internet giants and their massive scale</title>
		<link>http://gigaom.com/2013/01/09/a-peek-inside-chinas-internet-giants-and-their-massive-scale/</link>
		<comments>http://gigaom.com/2013/01/09/a-peek-inside-chinas-internet-giants-and-their-massive-scale/#comments</comments>
		<pubDate>Wed, 09 Jan 2013 23:12:40 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[alibaba]]></category>
		<category><![CDATA[Baidu]]></category>
		<category><![CDATA[Data Centers]]></category>
		<category><![CDATA[Facebook]]></category>
		<category><![CDATA[low-power servers]]></category>
		<category><![CDATA[open compute project]]></category>
		<category><![CDATA[scalability]]></category>
		<category><![CDATA[sina]]></category>
		<category><![CDATA[social media]]></category>
		<category><![CDATA[Taobao]]></category>
		<category><![CDATA[Tencent]]></category>
		<category><![CDATA[Web Infrastructure]]></category>
		<category><![CDATA[Weibo]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=600420</guid>
		<description><![CDATA[China's big four internet companies are big -- huge, in fact -- but they're not yet technological innovators like their American counterparts. However, scalability is an issue that knows no borders, which has spurred some cross-continental cooperation. Will it also inspire a Chinese tech awakening?]]></description>
				<content:encoded><![CDATA[<p>Here&#8217;s the thing about China: It&#8217;s very, very big. And although the Great Firewall cuts its citizens off from many popular U.S. web services, those citizens still exist. In fact, there are more of them than all the citizens of the United States and European Union combined. And they use social media and e-commerce just like the rest of us.</p>
<p>It should come as no surprise, then, that the companies serving the country&#8217;s 1.3 billion people with their social media, e-commerce and information-discovery needs are very, very big, too. Here are some statistics that demonstrate their scale.</p>
<h2 id="alibaba-group">Alibaba Group</h2>
<p><a href="http://www.taobao.com/index_global.php">Taobao,</a> the eBay-like e-commerce line of business from Chinese internet giant Alibaba Group, does a lot of business. On a single day &#8212; Nov. 11, 2011 &#8212; the company did a whopping 19 billion Yuan (or approximately $3.05 billion) in sales. According to Alibaba Group CTO and Alibaba Cloud Computing President Wang Jian, the company site surpassed the 1 trillion Yuan (about $160 billion) mark for 2012 revenue at the end of November. Alipay, the company&#8217;s version of PayPal, handles about 3 billion Yuan (about $480 million) in transactions every day.</p>
<p>By comparison, eBay posted $3.4 billion in revenue for the entire third quarter of 2012. Amazon.com, with which Taobao also competes (although Alibaba also has a business-to-consumer division called Tmall), closed its third quarter with $13.8 billion in revenue.</p>
<div id="attachment_600586" class="wp-caption aligncenter" style="width: 718px"><a href="http://gigaom2.files.wordpress.com/2013/01/taobao.jpg"><img  alt="The women's shoe department on Taobao" src="http://gigaom2.files.wordpress.com/2013/01/taobao.jpg?w=708&#038;h=409" width="708" height="409" class="size-large wp-image-600586" /></a><p class="wp-caption-text">The women&#8217;s shoe department on Taobao</p></div>
<p>Of course, Taobao and Alipay are just two of <a href="http://en.wikipedia.org/wiki/Alibaba_Group">Alibaba&#8217;s expansive portfolio of services</a>, which includes a much-publicized (<a href="http://online.wsj.com/article/SB10000872396390443816804578004290541336274.html">although recently reduced</a>) partnership with Yahoo.</p>
<p>That type of business means Alibaba needs a lot of servers. In a single year not too long ago, Jian told me, the company bought more servers than it had in the previous five years combined. If you charted Alibaba&#8217;s server count now versus five years ago, he added, the previous number would look like zero. How big is its database? Big enough to store data for more than 800 million items for sale.</p>
<h2 id="baidu">Baidu</h2>
<p>The Chinese search giant is <a href="http://www.alexa.com/topsites">ranked fifth in the Alexa internet rankings</a> (behind Facebook, Google, YouTube and Yahoo), which is evidence of its popularity. All those users, I&#8217;m told, result in annual server growth approximately equal to that of the previous three years combined. It has been reported that Baidu is <a href="http://slashdot.org/topic/datacenter/will-baidus-data-center-be-the-worlds-largest/">planning possibly the world&#8217;s largest data center</a> &#8212; spanning 120,000 square meters, costing $1.6 billion, housing 100,000 servers (totaling 700,000 CPUs and 3 million cores) and storing 4,000 petabytes of data.</p>
<h2 id="tencent">Tencent</h2>
<p>Sometimes compared with Facebook in the United States (although it&#8217;s actually quite different), <a href="http://www.tencent.com/en-us/index.shtml">Tencent</a> boasted more than 717 million users for its popular QQ messaging service as of September 2011. That number has surely grown. The company says its highest-ever number of concurrent users was more than 176 million, although there are often tens of millions of people (if not more than 100 million) using it <a href="http://im.qq.com/online/index.shtml">at any given time</a>. An individual with some knowledge of the company&#8217;s infrastructure told me Tencent adds about 100,000 servers per year.</p>
<div id="attachment_600571" class="wp-caption aligncenter" style="width: 718px"><a href="http://gigaom2.files.wordpress.com/2013/01/tencent1.jpg"><img  alt="Tencent usage at 5:49 local time on Jan. 10, 2012." src="http://gigaom2.files.wordpress.com/2013/01/tencent1.jpg?w=708&#038;h=452" width="708" height="452" class="size-large wp-image-600571" /></a><p class="wp-caption-text">Tencent usage at 5:51 local time on Jan. 10, 2012.</p></div>
<h2 id="weibo">Weibo</h2>
<p><a href="http://www.weibo.com/">Weibo</a>, the Twitter-like platform from new-school internet company Sina, had more than 400 million users as of April 2012. That&#8217;s about twice the number Twitter claims. And the Chinese use Weibo a lot, for everything from micro-blogging to self-publishing. It might actually be a more important tool in China than Twitter is in the United States, sources told me, because while the government can censor official news outlets, it can&#8217;t possibly control the stream of information coming off Weibo. And that will mean even more growth.</p>
<h2 id="not-yet-innovators">Not (yet) innovators</h2>
<p>However, despite their sheer scale, Chinese internet companies are, by most accounts, less technologically inclined than their American counterparts. The biggest reason &#8212; one I heard time and time again &#8212; is that these companies tend to view themselves as traditional businesses rather than technology companies, and that employees often strive to work up the management ladder rather than remain career engineers. This inevitably affects R&amp;D budgets, makes companies less willing to take risks and reduces the pool of employees that really, deeply understand complex systems.</p>
<p>As an example, one might look at the server situation within China&#8217;s big four internet companies. Alibaba&#8217;s Jian told me that although his company is running all white boxes in its data centers now, it had a lot of legacy IBM gear in its data centers five years ago. I heard the same thing about Baidu. Tencent, someone told me, had 10,000 webscale servers fail in six months last year and is considering a move back to traditional boxes.</p>
<p>However, maybe these companies are coming around on innovation beyond just buying more-efficient gear. Tencent, Baidu and Alibaba, for example, are all members of the <a href="http://gigaom.com/2012/11/08/facebook-and-open-compute-want-a-biodegradable-server-chassis/">Facebook-led Open Compute Project</a> for designing webscale hardware. Tencent and Baidu actually created their own rack-design specification, called Project Scorpio, that is <a href="http://opencompute.org/2012/05/02/enabling-innovation-where-it-matters/">being merged into Open Compute&#8217;s Open Rack design</a> in 2013. They still don&#8217;t build their own servers like Google and Facebook do, preferring instead to push their custom specs on server makers, but many innovative American companies, <a href="http://gigaom.com/2012/04/06/making-the-web-more-efficient-a-thousand-servers-at-a-time/">including eBay</a>, do the same thing.</p>
<div id="attachment_600585" class="wp-caption aligncenter" style="width: 654px"><a href="http://gigaom2.files.wordpress.com/2013/01/open-rack.jpg"><img  alt="Power specs of Open Rack" src="http://gigaom2.files.wordpress.com/2013/01/open-rack.jpg?w=708"   class="size-full wp-image-600585" /></a><p class="wp-caption-text">Power specs of Open Rack</p></div>
<p>One has to assume that a closer working relationship between engineers at American and Chinese internet companies will spur even more changes in the tech culture there. Although technical talent comes relatively cheap in China, perhaps they&#8217;ll realize that highly skilled, forward-thinking engineers (and data scientists, for that matter) are something worth hanging onto and rewarding with high salaries.</p>
<p>As Facebook VP Frank Frankovsky <a href="http://www.pcworld.com/article/259972/facebook_to_test_first_open_compute_racks.html">told PCWorld in July</a> as the Open Rack designs were unveiled, &#8220;We compete with those guys, but on the infrastructure side, if we can make our infrastructure more efficient, it makes everyone that much better. Where we differentiate our business is in the service we provide to our end users.&#8221;</p>
<p>That differentiation comes in large part from <a href="http://gigaom.com/2012/02/02/investors-and-users-beware-facebook-is-all-about-it/">an incredible investment in research and technology</a>. If they want to be considered thought leaders in their field &#8212; and if they want to expand significantly into cloud computing (as <a href="http://www.aliyun.com/">Alibaba</a> and <a href="http://sinacloud.com/">Sina</a> clearly want to do) &#8212; China&#8217;s internet companies will have to start matching their immense scale with demonstrated technological prowess.</p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/01/09/a-peek-inside-chinas-internet-giants-and-their-massive-scale/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/01/tencent.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/01/tencent.jpg?w=150" medium="image">
			<media:title type="html">tencent</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/01/taobao.jpg?w=708" medium="image">
			<media:title type="html">The women&#039;s shoe department on Taobao</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/01/tencent1.jpg?w=708" medium="image">
			<media:title type="html">Tencent usage at 5:49 local time on Jan. 10, 2012.</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/01/open-rack.jpg" medium="image">
			<media:title type="html">Power specs of Open Rack</media:title>
		</media:content>
	</item>
		<item>
		<title>How direct-access solutions can speed up cloud adoption</title>
		<link>http://pro.gigaom.com/2012/12/how-direct-access-solutions-can-speed-up-cloud-adoption/</link>
		<comments>http://pro.gigaom.com/2012/12/how-direct-access-solutions-can-speed-up-cloud-adoption/#comments</comments>
		<pubDate>Mon, 31 Dec 2012 07:55:19 +0000</pubDate>
		<dc:creator>Larry Carvalho</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Amazon Web Services]]></category>
		<category><![CDATA[AT&T]]></category>
		<category><![CDATA[aws]]></category>
		<category><![CDATA[bandwidth]]></category>
		<category><![CDATA[Cloud Computing]]></category>
		<category><![CDATA[cloud security]]></category>
		<category><![CDATA[colocation]]></category>
		<category><![CDATA[CRM]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[Data Centers]]></category>
		<category><![CDATA[direct access solutions]]></category>
		<category><![CDATA[enterprise IT]]></category>
		<category><![CDATA[Equinix]]></category>
		<category><![CDATA[Flipboard]]></category>
		<category><![CDATA[Hootsuite]]></category>
		<category><![CDATA[iaas]]></category>
		<category><![CDATA[IBM]]></category>
		<category><![CDATA[infrastructure as a service]]></category>
		<category><![CDATA[LivingSocial]]></category>
		<category><![CDATA[Microsoft]]></category>
		<category><![CDATA[PaaS]]></category>
		<category><![CDATA[Pinterest]]></category>
		<category><![CDATA[Platform as a Service]]></category>
		<category><![CDATA[Regulated Industries]]></category>
		<category><![CDATA[saas]]></category>
		<category><![CDATA[scalability]]></category>
		<category><![CDATA[service-level-agreements]]></category>
		<category><![CDATA[SLAs]]></category>
		<category><![CDATA[software as a service]]></category>
		<category><![CDATA[Startups]]></category>
		<category><![CDATA[Terremark]]></category>
		<category><![CDATA[Zynga]]></category>

		<guid isPermaLink="false">http://pro.gigaom.com/?p=164362</guid>
		<description><![CDATA[Startups and enterprises alike face barriers when it comes to cloud adoption. These include security concerns, slow access to cloud resources, and runaway network costs. However, multiple direct-access solutions are now available to address these issues for companies big and small.]]></description>
				<content:encoded><![CDATA[<br />  <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=gigaom.com&#038;blog=14960843&#038;post=597062&#038;subd=gigaom2&#038;ref=&#038;feed=1" width="1" height="1" /><p><a href="http://pubads.g.doubleclick.net/gampad/jump?iu=/1008864/GigaOM_RSS_300x250&#038;sz=300x250&#038;c=65937"><img src="http://pubads.g.doubleclick.net/gampad/ad?iu=/1008864/GigaOM_RSS_300x250&#038;sz=300x250&#038;c=65937" /></a></p><p><strong>Related research and analysis from GigaOM Pro:</strong><br />Subscriber content. <a href="http://pro.gigaom.com/?utm_source=pro&utm_medium=editorial&utm_campaign=auto3&utm_term=597062+how-direct-access-solutions-can-speed-up-cloud-adoption&utm_content=robustcloudlarry">Sign up for a free trial</a>.</p><ul><li><a href="http://pro.gigaom.com/2012/06/cloud-computing-infrastructure-2012-and-beyond/?utm_source=pro&utm_medium=editorial&utm_campaign=auto3&utm_term=597062+how-direct-access-solutions-can-speed-up-cloud-adoption&utm_content=robustcloudlarry">Cloud computing infrastructure: 2012 and beyond</a></li><li><a href="http://pro.gigaom.com/2011/12/migrating-media-applications-to-the-private-cloud-best-practices-for-businesses/?utm_source=pro&utm_medium=editorial&utm_campaign=auto3&utm_term=597062+how-direct-access-solutions-can-speed-up-cloud-adoption&utm_content=robustcloudlarry">Migrating media applications to the private cloud: best practices for businesses</a></li><li><a href="http://pro.gigaom.com/2011/04/infrastructure-q1-iaas-comes-down-to-earth-big-data-takes-flight/?utm_source=pro&utm_medium=editorial&utm_campaign=auto3&utm_term=597062+how-direct-access-solutions-can-speed-up-cloud-adoption&utm_content=robustcloudlarry">Infrastructure Q1: IaaS Comes Down to Earth; Big Data Takes Flight</a></li></ul>]]></content:encoded>
			<wfw:commentRss>http://pro.gigaom.com/2012/12/how-direct-access-solutions-can-speed-up-cloud-adoption/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:thumbnail url="https://gigaom-pro-files.s3.amazonaws.com/files/2012/06/clouds.jpg?w=150" />
		<media:content url="https://gigaom-pro-files.s3.amazonaws.com/files/2012/06/clouds.jpg?w=150" medium="image">
			<media:title type="html">clouds</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/6047bca2bdf1ce8a0938481074a8ed7c?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">robustcloudlarry</media:title>
		</media:content>
	</item>
		<item>
		<title>The promise of SDNs in the enterprise</title>
		<link>http://pro.gigaom.com/2012/11/an-overview-of-the-software-defined-networking-market/</link>
		<comments>http://pro.gigaom.com/2012/11/an-overview-of-the-software-defined-networking-market/#comments</comments>
		<pubDate>Fri, 09 Nov 2012 17:13:54 +0000</pubDate>
		<dc:creator>doyleresearch</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Adara Networks]]></category>
		<category><![CDATA[Amazon]]></category>
		<category><![CDATA[arista]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[big switch]]></category>
		<category><![CDATA[Brocade]]></category>
		<category><![CDATA[Cisco]]></category>
		<category><![CDATA[Cisco Systems]]></category>
		<category><![CDATA[Cloud Computing]]></category>
		<category><![CDATA[Data Centers]]></category>
		<category><![CDATA[data-security]]></category>
		<category><![CDATA[Dell]]></category>
		<category><![CDATA[embrane]]></category>
		<category><![CDATA[enterprise IT]]></category>
		<category><![CDATA[Enterprise Mobility]]></category>
		<category><![CDATA[Facebook]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[Hewlett-Packard]]></category>
		<category><![CDATA[HP]]></category>
		<category><![CDATA[IBM]]></category>
		<category><![CDATA[juniper]]></category>
		<category><![CDATA[Microsoft]]></category>
		<category><![CDATA[multilatencey]]></category>
		<category><![CDATA[NEC]]></category>
		<category><![CDATA[nicira]]></category>
		<category><![CDATA[open]]></category>
		<category><![CDATA[OpenFlow]]></category>
		<category><![CDATA[OpenStack]]></category>
		<category><![CDATA[Public Clouds]]></category>
		<category><![CDATA[Riverbed]]></category>
		<category><![CDATA[scalability]]></category>
		<category><![CDATA[SDN]]></category>
		<category><![CDATA[Selerity]]></category>
		<category><![CDATA[software defined networks]]></category>
		<category><![CDATA[Unified Communications]]></category>
		<category><![CDATA[VMWare]]></category>
		<category><![CDATA[Vyatta]]></category>
		<category><![CDATA[Zynga]]></category>

		<guid isPermaLink="false">http://pro.gigaom.com/?p=157212</guid>
		<description><![CDATA[The growth of public and private cloud services places new demands on the IT organization, particularly when it comes to the scale, agility and management of the data center. SDNs are a response to those demands, providing opportunities for IT managers to improve their network operations.

]]></description>
				<content:encoded><![CDATA[<br />  <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=gigaom.com&#038;blog=14960843&#038;post=582864&#038;subd=gigaom2&#038;ref=&#038;feed=1" width="1" height="1" /><p><a href="http://pubads.g.doubleclick.net/gampad/jump?iu=/1008864/GigaOM_RSS_300x250&#038;sz=300x250&#038;c=97690"><img src="http://pubads.g.doubleclick.net/gampad/ad?iu=/1008864/GigaOM_RSS_300x250&#038;sz=300x250&#038;c=97690" /></a></p><p><strong>Related research and analysis from GigaOM Pro:</strong><br />Subscriber content. <a href="http://pro.gigaom.com/?utm_source=pro&utm_medium=editorial&utm_campaign=auto3&utm_term=582864+an-overview-of-the-software-defined-networking-market&utm_content=doyleresearch">Sign up for a free trial</a>.</p><ul><li><a href="http://pro.gigaom.com/2012/06/cloud-computing-infrastructure-2012-and-beyond/?utm_source=pro&utm_medium=editorial&utm_campaign=auto3&utm_term=582864+an-overview-of-the-software-defined-networking-market&utm_content=doyleresearch">Cloud computing infrastructure: 2012 and beyond</a></li><li><a href="http://pro.gigaom.com/2011/07/infrastructure-q2-big-data-and-paas-gain-more-momentum/?utm_source=pro&utm_medium=editorial&utm_campaign=auto3&utm_term=582864+an-overview-of-the-software-defined-networking-market&utm_content=doyleresearch">Infrastructure Q2: Big data and PaaS gain more momentum</a></li><li><a href="http://pro.gigaom.com/2011/04/infrastructure-q1-iaas-comes-down-to-earth-big-data-takes-flight/?utm_source=pro&utm_medium=editorial&utm_campaign=auto3&utm_term=582864+an-overview-of-the-software-defined-networking-market&utm_content=doyleresearch">Infrastructure Q1: IaaS Comes Down to Earth; Big Data Takes Flight</a></li></ul>]]></content:encoded>
			<wfw:commentRss>http://pro.gigaom.com/2012/11/an-overview-of-the-software-defined-networking-market/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:thumbnail url="https://gigaom-pro-files.s3.amazonaws.com/files/2012/11/datacenter115.jpg?w=150" />
		<media:content url="https://gigaom-pro-files.s3.amazonaws.com/files/2012/11/datacenter115.jpg?w=150" medium="image">
			<media:title type="html">datacenter115</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/cbb7135ce2db58007dd75f38bb3d82a3?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">doyleresearch</media:title>
		</media:content>
	</item>
		<item>
		<title>Facebook open sources Corona &#8212; a better way to do webscale Hadoop</title>
		<link>http://gigaom.com/2012/11/08/facebook-open-sources-corona-a-better-way-to-do-webscale-hadoop/</link>
		<comments>http://gigaom.com/2012/11/08/facebook-open-sources-corona-a-better-way-to-do-webscale-hadoop/#comments</comments>
		<pubDate>Thu, 08 Nov 2012 20:01:15 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[Facebook]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[mapreduce]]></category>
		<category><![CDATA[scalability]]></category>
		<category><![CDATA[Web Infrastructure]]></category>
		<category><![CDATA[webscale]]></category>
		<category><![CDATA[Corona]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=582252</guid>
		<description><![CDATA[Facebook has open sourced a new system called Corona for scheduling and managing Hadoop jobs. Corona attempts to do away with many of the problems that come along with massive-scale Hadoop operations, and soon looks to take Facebook's Hadoop deployment beyond just MapReduce.]]></description>
				<content:encoded><![CDATA[<p>Facebook is at it again, building more software to make Hadoop a better way to do big data at web scale. Its latest creation, which the company <a href="https://github.com/facebook/hadoop-20/tree/master/src/contrib/corona">has also open sourced</a>, is called Corona and aims to make Hadoop more efficient, more scalable and more available by re-inventing how jobs are scheduled.</p>
<p>As with <a href="http://www.facebook.com/note.php?note_id=468211193919">most of its changes to Hadoop over the years</a> &#8212; including the <a href="http://gigaom.com/cloud/how-facebook-keeps-100-petabytes-of-hadoop-data-online/">recently unveiled AvatarNode</a> &#8212; Corona came to be because Hadoop simply wasn&#8217;t designed to handle Facebook&#8217;s scale or its broad usage of the platform. What kind of scale are we talking about? According to Facebook engineers Avery Ching, Ravi Murthy, Dmytro Molkov, Ramkumar Vadali, and Paul Yang <a href="https://www.facebook.com/notes/facebook-engineering/under-the-hood-scheduling-mapreduce-jobs-more-efficiently-with-corona/10151142560538920">in a blog post detailing Corona on Thursday</a>, the company&#8217;s largest cluster is more than 100 petabytes; it runs more than 60,000 Hive queries a day; and its data warehouse has grown 2,500x in four years.</p>
<p>Further, Ching and company note &#8212; echoing something Facebook VP of Infrastructure Engineering Jay Parikh told me in September when <a href="http://gigaom.com/data/for-the-future-of-big-data-startups-look-to-facebook/">discussing the future of big data startups</a> &#8212; Hadoop is responsible for a lot of how Facebook runs both its platform and its business:</p>
<blockquote><p>Almost every team at Facebook depends on our custom-built data infrastructure for warehousing and analytics, with roughly 1,000 people across the company &#8212; including both technical and non-technical personnel &#8212; using these technologies every day. Over half a petabyte of new data arrives in the warehouse every 24 hours, and ad-hoc queries, data pipelines, and custom MapReduce jobs process this raw data around the clock to generate more meaningful features and aggregations.</p></blockquote>
<h2>So, what is Corona?</h2>
<p>In a nutshell, Corona represents a new system for scheduling Hadoop jobs that makes better use of a cluster&#8217;s resources and also makes it more amenable to multitenant environments like the one Facebook operates. Ching et al. explain the problems and the solution in some detail; the short version is that Hadoop&#8217;s JobTracker node is responsible for both cluster management and job scheduling, and it has a hard time keeping up with both tasks as clusters grow and the number of jobs sent to them increases.</p>
<p>Further, job-scheduling in Hadoop involves an inherent delay, which is problematic for small jobs that need fast results. And a fixed configuration of &#8220;map&#8221; and &#8220;reduce&#8221; slots means Hadoop clusters run inefficiently when jobs don&#8217;t fit into the remaining slots or when they&#8217;re not MapReduce jobs at all.</p>
<p>Corona resolves some of these problems by creating individual job trackers for each job and a cluster manager focused solely on tracking nodes and the amount of available resources. Thanks to this simplified architecture and a few other changes, the latency to get a job started is reduced and the cluster manager can make fast scheduling decisions because it&#8217;s not also responsible for tracking the progress of running jobs. Corona also incorporates a feature that divvies a cluster into resource pools to ensure every group within the company gets its fair share of resources.</p>
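<p>To make the resource-pool idea concrete, here is a minimal sketch of one way a cluster manager could split slots among pools. It assumes max-min fairness, which is a common interpretation of "fair share" and not necessarily what Corona implements; the function name, pool names and numbers are all hypothetical.</p>

```python
def fair_share(total_slots, demands):
    """Split total_slots across pools so that no pool gets more than it asked
    for and leftover capacity flows to pools that still want more.

    Illustrative max-min fairness only; this is not Corona's actual scheduler.
    """
    allocation = {pool: 0 for pool in demands}
    remaining = total_slots
    unsatisfied = {pool for pool, want in demands.items() if want > 0}

    while remaining > 0 and unsatisfied:
        # Offer every still-hungry pool an equal share of what is left.
        share = max(remaining // len(unsatisfied), 1)
        for pool in sorted(unsatisfied):
            if remaining == 0:
                break
            grant = min(share, demands[pool] - allocation[pool], remaining)
            allocation[pool] += grant
            remaining -= grant
        # Pools whose demand is now met drop out of the next round.
        unsatisfied = {p for p in unsatisfied if demands[p] > allocation[p]}

    return allocation


pools = {"ads": 80, "search": 30, "growth": 10}  # hypothetical demands
print(fair_share(100, pools))
```

<p>With these made-up numbers, the two smaller pools get everything they asked for and the largest pool absorbs the remainder, so no single group can starve the others.</p>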
<p><a href="http://gigaom2.files.wordpress.com/2012/11/corona.jpg"><img  title="corona" alt="" src="http://gigaom2.files.wordpress.com/2012/11/corona.jpg?w=708"   class="aligncenter size-full wp-image-582351" /></a></p>
<p>The results have lived up to expectations since Corona went into full production in mid-2012: the average time to refill idle resources improved by 17 percent; resource utilization over regular MapReduce improved to 95 percent from 70 percent (in a simulation cluster); resource unfairness dropped to 3.6 percent with Corona versus 14.3 percent with traditional MapReduce; and latency on a test job Facebook runs every four minutes has been reduced.</p>
<p>Despite the hard work put into building and deploying Corona, though, the project still has a way to go. One of the biggest improvements currently being developed is to enable resource management based on CPU, memory and other job requirements rather than just the number of &#8220;map&#8221; and &#8220;reduce&#8221; slots needed. This will open Corona up to running non-MapReduce jobs, therefore making a Hadoop cluster more of a general-purpose parallel computing cluster.</p>
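<p>The difference between counting slots and counting actual resources can be shown with a small sketch. Everything here is an assumption made for illustration: the <code>Node</code> type, the function names and the hardware figures are hypothetical and do not come from Corona or Hadoop.</p>

```python
from dataclasses import dataclass


@dataclass
class Node:
    cpu_cores: int
    memory_gb: int


def tasks_by_slots(node_slots, pending_tasks):
    """Slot-based scheduling: every task costs one slot, whatever it needs."""
    return min(node_slots, pending_tasks)


def tasks_by_resources(node, cpu_per_task, memory_gb_per_task):
    """Resource-based scheduling: fit tasks by their declared CPU and memory."""
    return min(node.cpu_cores // cpu_per_task,
               node.memory_gb // memory_gb_per_task)


node = Node(cpu_cores=16, memory_gb=64)   # hypothetical worker
print(tasks_by_slots(8, 20))              # fixed 8-slot config, 20 tasks waiting
print(tasks_by_resources(node, 1, 2))     # light tasks: 1 core, 2 GB each
print(tasks_by_resources(node, 4, 32))    # heavy tasks: 4 cores, 32 GB each
```

<p>In this example a fixed eight-slot configuration would leave half the machine idle for light tasks and oversubscribe memory for heavy ones, while the resource-based count adapts to each kind of task.</p>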
<p>Facebook is also trying to incorporate online upgrades, which would mean a cluster doesn&#8217;t have to come down every time part of the management layer undergoes an update.</p>
<h2>Why Facebook sometimes must re-invent the wheel</h2>
<p>Anyone deeply familiar with the Hadoop space might be thinking that a lot of what Facebook has done with Corona sounds familiar &#8212; and that&#8217;s because it kind of is. The <a href="http://hortonworks.com/blog/apache-hadoop-yarn-concepts-and-applications/">Apache YARN project</a> that has been integrated into the latest version of Apache Hadoop similarly splits the JobTracker into separate cluster-management and job-tracking components, and already allows for non-MapReduce workloads. Further, there <a href="http://gigaom.com/cloud/the-unsexy-side-of-big-data-6-tools-to-manage-your-hadoop-cluster/">is a whole class of commercial and open source cluster-management tools</a> that have their own solutions to the problems Corona tries to solve, including <a href="http://incubator.apache.org/mesos/index.html">Apache Mesos</a>, which is <a href="http://gigaom.com/cloud/twitter-backs-fave-big-data-projects-with-apache-sponsorship/">Twitter&#8217;s tool of choice</a>.<br />
However, anyone who&#8217;s familiar with Facebook knows the company isn&#8217;t likely to buy software from anyone. It also has reached a point of customization with its Hadoop environment where even open-source projects from Apache won&#8217;t be easy to adapt to Facebook&#8217;s unique architecture. From the blog post:</p>
<blockquote><p>It’s worth noting that we considered Apache YARN as a possible alternative to Corona. However, after investigating the use of YARN on top of our version of HDFS (a strong requirement due to our many petabytes of archived data) we found numerous incompatibilities that would be time-prohibitive and risky to fix. Also, it is unknown when YARN would be ready to work at Facebook-scale workloads.</p></blockquote>
<p>So, Facebook plods forward, a Hadoop user without equal (save for maybe Yahoo) left building its own tools in isolation. What will be interesting to watch as Hadoop adoption picks up and more companies begin building applications atop it is how many actually utilize the types of tools that companies like Facebook, <a href="http://gigaom.com/cloud/how-twitter-is-doing-its-part-to-democratize-big-data/">Twitter</a> and <a href="http://gigaom.com/data/quantcast-releases-bigger-faster-stronger-hadoop-file-system/">Quantcast</a> have created and open sourced. They might not have commercial backers behind them, but they&#8217;re certainly built to work well at scale.</p>
<p><em>Feature image courtesy of Shutterstock user <a href="http://www.shutterstock.com/gallery-10991p1.html">Johan Swanepoel</a>.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2012/11/08/facebook-open-sources-corona-a-better-way-to-do-webscale-hadoop/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2012/11/shutterstock_42996799.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2012/11/shutterstock_42996799.jpg?w=150" medium="image">
			<media:title type="html">herd of elephants</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2012/11/corona.jpg" medium="image">
			<media:title type="html">corona</media:title>
		</media:content>
	</item>
		<item>
		<title>How MemCachier went from a favor for a friend to cloud ubiquity</title>
		<link>http://gigaom.com/2012/09/05/how-memcachier-went-from-a-favor-for-a-friend-to-cloud-ubquity/</link>
		<comments>http://gigaom.com/2012/09/05/how-memcachier-went-from-a-favor-for-a-friend-to-cloud-ubquity/#comments</comments>
		<pubDate>Wed, 05 Sep 2012 17:00:06 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[Amazon EC2]]></category>
		<category><![CDATA[AppFog]]></category>
		<category><![CDATA[caching]]></category>
		<category><![CDATA[cloudbees]]></category>
		<category><![CDATA[database]]></category>
		<category><![CDATA[DotCloud]]></category>
		<category><![CDATA[facebok]]></category>
		<category><![CDATA[Heroku]]></category>
		<category><![CDATA[iaas]]></category>
		<category><![CDATA[memcached]]></category>
		<category><![CDATA[MemCachier]]></category>
		<category><![CDATA[PaaS]]></category>
		<category><![CDATA[scalability]]></category>
		<category><![CDATA[web apps]]></category>
		<category><![CDATA[web architecture]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=559464</guid>
		<description><![CDATA[Hosted memcached provider MemCachier is expanding like crazy, moving from its homebase on Heroku into the AppFog, CloudBees, DotCloud and Amazon EC2 platforms. It's impressive growth for a bootstrapped company that launched in April and was little more than an idea a year ago.]]></description>
				<content:encoded><![CDATA[<p>A funny thing happened with Amit Levy&#8217;s side project in 2011 to build a hosted memcached service &#8212; it became a company. Now that company, <a href="http://www.memcachier.com/">MemCachier</a>, is striving for omnipresence in the cloud, and extending its reach from the Heroku platform as a service onto a number of PaaS offerings and even Amazon EC2, where it will directly compete with Amazon Web Services&#8217; own ElastiCache service. It&#8217;s impressive growth for a young company that was never really meant to be.</p>
<p>According to co-founder Alex Loddengaard, Levy began building MemCachier as a side project in mid-2011, and he hosted a private beta version in the <a href="https://addons.heroku.com/">Heroku add-on market </a>so a friend could easily access the service. The team at Heroku saw the service, liked it and encouraged Levy to pursue it for real. Levy, who&#8217;s still in the middle of getting a Ph.D. from Stanford, called Loddengaard (who taught Levy while a teaching assistant at the University of Washington) and fellow Stanford Ph.D. candidate David Terei for help, and MemCachier launched in April 2012.</p>
<div id="attachment_559581" class="wp-caption alignleft" style="width: 160px"><a href="http://gigaom2.files.wordpress.com/2012/09/alex-150x150.jpg"><img  title="alex-150x150" src="http://gigaom2.files.wordpress.com/2012/09/alex-150x150.jpg?w=708" alt=""   class="size-full wp-image-559581" /></a><p class="wp-caption-text">Alex Loddengaard</p></div>
<p>Landing Loddengaard wasn&#8217;t too tough. He had quit his job at software-development firm Atlassian, after beginning his career at Google and then following his boss Christophe Bisciglia to Hadoop pioneer Cloudera, where Loddengaard was a pre-funding employee. (MemCachier, by the way, now shares office space with <a href="http://gigaom.com/cloud/hadoop-startup-wibidata-raises-5m-to-power-web-analytics/">Bisciglia&#8217;s new company, WibiData</a>, in the former Atlassian headquarters.) He was living off his savings, had &#8220;built a bunch of stupid web apps that you never heard of&#8221; and was trying to figure out what to do next, he told me. And then Levy called.</p>
<h2>Memcached, and MemCachier, are everywhere</h2>
<p><a href="http://memcached.org/">Memcached</a> is a popular open-source key-value system that speeds up web applications by caching certain data in the memory of distributed systems rather than on disk in the database itself. Facebook is widely cited as the largest user for the hundreds of terabytes it&#8217;s now storing in memcached, but, Loddengaard said, &#8220;Every company that needs to scale uses memcached.&#8221;</p>
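<p>For readers who have not used it, the usual way an application benefits from memcached is the cache-aside pattern: check the cache first, fall back to the database on a miss, then store the result for next time. The sketch below assumes a plain Python dictionary standing in for a memcached cluster; the class and key names are hypothetical, and a real client would talk to memcached over the network and usually set an expiry time.</p>

```python
class CacheAside:
    """Cache-aside lookups, with a dict standing in for memcached."""

    def __init__(self, load_from_database):
        self._load = load_from_database  # slow, disk-backed lookup
        self._cache = {}                 # placeholder for the in-memory cache
        self.database_reads = 0

    def get(self, key):
        # Cache hit: answer from memory without touching the database.
        if key in self._cache:
            return self._cache[key]
        # Cache miss: read from the database, then remember the answer.
        value = self._load(key)
        self.database_reads += 1
        self._cache[key] = value
        return value

    def invalidate(self, key):
        # Call this after a write so the next read refetches fresh data.
        self._cache.pop(key, None)


database = {"user:1": "Ada"}             # hypothetical backing store
store = CacheAside(database.get)
store.get("user:1")
store.get("user:1")
print(store.database_reads)              # only the first read hit the database
```

<p>The trade-off is staleness: if the database row changes, the cached copy stays in place until it is invalidated or expires.</p>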
<p>Aside from the core open source version, developers might choose Couchbase&#8217;s eponymous NoSQL database (into which <a href="http://gigaom.com/cloud/couchbase-2-0-unql-sql-nosql/">the popular memcached implementation Membase Server has been integrated</a>) or its hosted Membase service called <a href="https://addons.heroku.com/memcache">Memcache</a>, which is available on Heroku. Another hosted option is AWS&#8217;s <a href="http://aws.amazon.com/elasticache/">ElastiCache</a>, a memcached-compliant service <a href="http://gigaom.com/cloud/amazon-elasticache/">available to developers building web applications on the Amazon EC2 cloud</a>.</p>
<p>Since starting off on Heroku, MemCachier has already expanded to the AppHarbor and Cloud Control platforms, but Wednesday&#8217;s expansion represents the company&#8217;s first real introduction to the public, Loddengaard said. Now, MemCachier is also available on <a href="http://gigaom.com/cloud/appfog-lets-you-pick-your-cloud-almost-any-cloud/">AppFog</a>, <a href="http://gigaom.com/cloud/cloudbees-puts-its-paas-anywhere/">CloudBees</a> and <a href="http://gigaom.com/2011/07/04/dotcloud/">DotCloud</a> &#8212; three popular PaaS offerings &#8212; as well as Amazon EC2.</p>
<h2>Growing isn&#8217;t always easy</h2>
<p>Moving to Amazon&#8217;s cloud, in particular, also meant a change in pricing to reflect a different class of user (e.g., AWS mega-user Netflix) than most PaaS offerings attract. Whereas MemCachier&#8217;s options on Heroku range from 100MB to 10GB in size, Amazon users can get up to a 100GB instance. Loddengaard said most Amazon EC2 users use more than a gigabyte of RAM for memcached, and ElastiCache actually starts out at 1.3GB.</p>
<div id="attachment_559584" class="wp-caption alignleft" style="width: 310px"><a href="http://gigaom2.files.wordpress.com/2012/09/architecture-diagram-cropped-300x198.jpg"><img  title="architecture-diagram-cropped-300x198" src="http://gigaom2.files.wordpress.com/2012/09/architecture-diagram-cropped-300x198.jpg?w=708" alt=""   class="size-full wp-image-559584" /></a><p class="wp-caption-text">MemCachier&#8217;s architecture, simplified.</p></div>
<p>Loddengaard acknowledges that trying to woo developers away from the ElastiCache service on Amazon&#8217;s own platform won&#8217;t necessarily be easy, but he thinks the difference in approach between the two services favors MemCachier for a particular class of developers &#8212; those who don&#8217;t want to manage their infrastructure too closely. Whereas ElastiCache still requires users to manage their instances, as is the norm with Amazon&#8217;s lower-level infrastructure-as-a-service platform, MemCachier is about &#8220;no operations whatsoever,&#8221; he said. &#8220;Developers shouldn&#8217;t spend any time operating servers over developing software.&#8221;</p>
<p>That mindset has proven effective so far. Thanks to word of mouth alone, the bootstrapped MemCachier has been growing steadily in terms of revenue and users, now claiming more than 1,500 developers, but its broader footprint and some proactive marketing should mean sharp upticks in both areas. However, a jump in users &#8212; especially the larger ones that might come from Amazon EC2 &#8212; will probably require MemCachier to grow beyond its current three-person team. Of course, there are worse problems to have.</p>
<p><em>Feature image courtesy of <a href="http://www.shutterstock.com/gallery-120493p1.html">Shutterstock user optimarc</a>.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2012/09/05/how-memcachier-went-from-a-favor-for-a-friend-to-cloud-ubquity/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2012/09/shutterstock_94000018.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2012/09/shutterstock_94000018.jpg?w=150" medium="image">
			<media:title type="html">Expanding spiral</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2012/09/alex-150x150.jpg" medium="image">
			<media:title type="html">alex-150x150</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2012/09/architecture-diagram-cropped-300x198.jpg" medium="image">
			<media:title type="html">architecture-diagram-cropped-300x198</media:title>
		</media:content>
	</item>
		<item>
		<title>Why crowdsourced computing benchmarks are the future</title>
		<link>http://gigaom.com/2012/07/26/why-crowdsourced-computing-benchmarks-are-the-future/</link>
		<comments>http://gigaom.com/2012/07/26/why-crowdsourced-computing-benchmarks-are-the-future/#comments</comments>
		<pubDate>Thu, 26 Jul 2012 19:19:05 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[Benchmarks]]></category>
		<category><![CDATA[Cloud Computing]]></category>
		<category><![CDATA[crowdsourcing]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[scalability]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=546988</guid>
		<description><![CDATA[Cloud computing and open source software have freed IT practitioners from so much legacy vendor baggage over the past few years. Isn't it time to free them from inane benchmark boasting, too? A crowdsourced platform where users share their real-world performance experiences could help.<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=gigaom.com&#038;blog=14960843&#038;post=546988&#038;subd=gigaom2&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p>Performance benchmarks for computing systems might make for good television — there’s always plenty of heated debate, talk of world records and  sometimes <a href="http://www.wired.com/wiredenterprise/2012/07/oracle-advertising-slapped/">even a little ethical drama</a> — but in the end there’s not a lot of substance.</p>
<p>More often than not, they’re conducted on highly optimized systems <a href="http://en.wikipedia.org/wiki/LINPACK_benchmarks#Criticism">running workloads that don’t necessarily mirror</a> what anyone actually runs in production. If the results in question come from vendors rather than third parties, there’s a good chance they’ve only been published because the vendor was able to achieve the desired result. Thinking your experience will be just as fast is like watching a fishing show on television and then hitting the water expecting bite after bite.</p>
<p>However, cloud computing and the advent of popular open source software such as Hadoop and NoSQL databases could change the way we do benchmarks. With relatively little cost and effort, anyone can conduct their own tests to see how their specific applications and configurations run on their specific infrastructure. Throw in a platform to share these results, and you have crowdsourced performance benchmarks free from vendor hype and the vacuum-like conditions of standardized tests.</p>
<p>Ideally, it ends up working a lot like the crowdsourced medical platforms I’ve come across lately, <a href="http://gigaom.com/cloud/better-medicine-brought-to-you-by-big-data/">PatientsLikeMe</a> and the forthcoming <a href="http://gigaom.com/2012/07/19/5-las-vegas-startups-you-need-to-know/">Lucine Biotechnology</a>. Rather than rely on claims from drug companies or even doctors whose knowledge is limited to published research, users share their own real-world experiences with drugs, symptoms and side effects, and learn from others like them what they might expect.</p>
<h2>It might get worse before it gets better</h2>
<p>However, with so many Hadoop distributions in the market now, and so much money at play, I’d prepare to hear a lot more chest-beating in the months to come about whose implementation is actually fastest. It has actually been going on for a while (in December, I <a href="http://gigaom.com/cloud/my-hadoop-is-bigger-than-yours/">detailed a series of purported and contested records</a> from SGI, MapR and HPCC Systems on the Terasort benchmark), and not all the voices have been heard yet. Like SGI before them, hardware partners such as Dell, HP and Cisco probably want to prove their reference architectures are the best.</p>
<p><a href="http://gigaom2.files.wordpress.com/2012/07/vmware-study.jpg"><img title="vmware study" src="http://gigaom2.files.wordpress.com/2012/07/vmware-study.jpg?w=300&#038;h=195" alt="" width="300" height="195" class="alignright size-medium wp-image-547101"></a>And VMware has already <a href="http://www.vmware.com/files/pdf/VMW-Hadoop-Performance-vSphere5.pdf">published a study claiming that Hadoop actually performs faster</a> on its vSphere hypervisor than on bare metal. If Hadoop workloads <a href="http://gigaom.com/cloud/vmware-aims-for-hadoop-on-vms-with-serengeti-project/">really do move to virtual machines</a>, Hadoop vendors are going to have to prove themselves there, too. VMware’s study ran CDH3 (Cloudera’s third-generation distribution), but Hortonworks has been <a href="http://gigaom.com/cloud/hortonworks-teams-with-vmware-to-keep-hadoop-running/">working closely with VMware lately</a> and might have something to say. Of course, EMC Greenplum is actually under the same corporate umbrella as VMware and can’t afford to be seen as slower than the competition on virtualized servers.</p>
<p>In cloud computing, too, providers have spent the past few years arguing against the idea of cloud servers as commodities by claiming their systems offer the best performance. There have been plenty of boasts (<a href="http://blog.cloudharmony.com/2011/11/many-are-skeptical-of-claims-that.html">sometimes, perhaps, misleading</a>) and <a href="http://pro.gigaom.com/2011/06/benchmarking-the-cloud-your-mileage-may-vary/?utm_source=cloud&amp;utm_medium=editorial&amp;utm_campaign=intext&amp;utm_term=546988+why-crowdsourced-computing-benchmarks-are-the-future&amp;utm_content=dharrisstructure">quite a few attempts to benchmark cloud system and network performance</a> (<em>GigaOM Pro subscription req’d</em>). With that much of the cloud market still up for grabs, we’re not yet done hearing about whose cloud is the biggest and the fastest (see, for example, <a href="http://gigaom.com/cloud/why-google-compute-engine-may-be-attractive-to-amazon-web-services-users/">Google’s emphasis on performance</a> when it launched Compute Engine last month).</p>
<h2>But it should get better</h2>
<p>Despite all the effort by vendors and cloud providers to claim superiority, though, the truth is that it’s easier than ever for users of next-generation software and services to run their own tests. Hadoop or NoSQL databases aren’t expensive Oracle software that needs to run on scale-up, big-iron systems; they’re all free to download and can run on small clusters of commodity boxes. For applications that are going to run in the cloud, renting a few instances from a cloud provider might only cost a few bucks.</p>
<p>Sure, it might take a little time to configure everything (although software vendors might be willing to help), but isn’t that effort worth it in the end?</p>
<p><a href="http://gigaom2.files.wordpress.com/2012/07/gce-vs-ec2-copy.jpeg"><img title="gce-vs-ec2-copy" src="http://gigaom2.files.wordpress.com/2012/07/gce-vs-ec2-copy.jpeg?w=300&#038;h=245" alt="" width="300" height="245" class="alignleft size-medium wp-image-547102"></a>Use cases and test results that demonstrate the value of crowdsourcing in-the-wild performance metrics are everywhere on corporate technology blogs across the web. Earlier this month, for example, Medialets discussed how it tested its Hadoop workload (on a cluster of rented physical machines) and found that <a href="http://allthingshadoop.com/2012/07/10/hadoop-distribution-bake-off-my-experience-with-cloudera-and-mapr">Cloudera actually outperformed the supposedly faster MapR</a> for that job. This week, video-transcoding service Zencoder <a href="http://gigaom.com/cloud/for-some-workloads-googles-cloud-cant-yet-hang-with-aws/">shared some interesting (if not always surprising) results</a> when it compared Amazon Web Services’ highest-powered cloud instances to Google Compute Engine’s best.</p>
<p>Aggregated and indexed on a single platform, these types of experiences could help quiet the boasts from vendors and industry organizations touting their latest benchmark results. I’d argue it matters a lot more to a systems architect to know the production throughput of someone running a similar application on similar resources than to know how fast a generic workload ran in a lab on a setup he doesn’t have. Add some analytics to this information, and ideal configurations for different application types and data volumes might begin to emerge.</p>
<p>Cloud computing and open source software have freed IT practitioners from so much legacy vendor baggage over the past few years. Isn’t it time to free them from inane benchmark boasting, too?</p>
<p><em>Feature image courtesy of <a href="http://www.shutterstock.com/gallery-11733p1.html">Shutterstock user Suzanne Tucker</a>.</em></p>
<br />  <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=gigaom.com&#038;blog=14960843&#038;post=546988&#038;subd=gigaom2&#038;ref=&#038;feed=1" width="1" height="1" /><p><a href="http://pubads.g.doubleclick.net/gampad/jump?iu=/1008864/GigaOM_RSS_300x250&#038;sz=300x250&#038;c=384368"><img src="http://pubads.g.doubleclick.net/gampad/ad?iu=/1008864/GigaOM_RSS_300x250&#038;sz=300x250&#038;c=384368" /></a></p><p><strong>Related research and analysis from GigaOM Pro:</strong><br />Subscriber content. <a href="http://pro.gigaom.com/?utm_source=cloud&utm_medium=editorial&utm_campaign=auto3&utm_term=546988+why-crowdsourced-computing-benchmarks-are-the-future&utm_content=dharrisstructure">Sign up for a free trial</a>.</p><ul><li><a href="http://pro.gigaom.com/2011/06/benchmarking-the-cloud-your-mileage-may-vary/?utm_source=cloud&utm_medium=editorial&utm_campaign=auto3&utm_term=546988+why-crowdsourced-computing-benchmarks-are-the-future&utm_content=dharrisstructure">Benchmarking the Cloud: Your Mileage May Vary</a></li><li><a href="http://pro.gigaom.com/2012/04/infrastructure-q1-cloud-and-big-data-woo-the-enterprise/?utm_source=cloud&utm_medium=editorial&utm_campaign=auto3&utm_term=546988+why-crowdsourced-computing-benchmarks-are-the-future&utm_content=dharrisstructure">Infrastructure Q1: Cloud and big data woo enterprises</a></li><li><a href="http://pro.gigaom.com/2012/03/a-near-term-outlook-for-big-data/?utm_source=cloud&utm_medium=editorial&utm_campaign=auto3&utm_term=546988+why-crowdsourced-computing-benchmarks-are-the-future&utm_content=dharrisstructure">A near-term outlook for big data</a></li></ul>]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2012/07/26/why-crowdsourced-computing-benchmarks-are-the-future/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2012/07/race-e1343328741292.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2012/07/race-e1343328741292.jpg?w=150" medium="image">
			<media:title type="html">race</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2012/07/vmware-study.jpg?w=300" medium="image">
			<media:title type="html">vmware study</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2012/07/gce-vs-ec2-copy.jpeg?w=300" medium="image">
			<media:title type="html">gce-vs-ec2-copy</media:title>
		</media:content>
	</item>
		<item>
		<title>Head to head: Amazon cloud beats Google on video benchmark</title>
		<link>http://gigaom.com/2012/07/24/for-some-workloads-googles-cloud-cant-yet-hang-with-aws/</link>
		<comments>http://gigaom.com/2012/07/24/for-some-workloads-googles-cloud-cant-yet-hang-with-aws/#comments</comments>
		<pubDate>Tue, 24 Jul 2012 19:45:25 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[Amazon Web Services]]></category>
		<category><![CDATA[Cloud Computing]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[high-performance computing]]></category>
		<category><![CDATA[iaas]]></category>
		<category><![CDATA[scalability]]></category>
		<category><![CDATA[video transcoding]]></category>
		<category><![CDATA[Zencoder]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=545923</guid>
		<description><![CDATA[Benchmarking results from Zencoder show that Amazon Web Services beats out Google's Compute Engine in a test of a specific CPU-intensive workload. Compute Engine's performance was hindered by a lack of HPC instances, which Google could one day add. But it's nice to see real-world comparisons.<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=gigaom.com&#038;blog=14960843&#038;post=545923&#038;subd=gigaom2&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p>According to benchmark tests by video-transcoding startup <a href="http://zencoder.com">Zencoder</a>, Google&#8217;s new Compute Engine infrastructure-as-a-service offering has some work to do if it wants to catch up with Amazon Web Services on the performance front. But the offering, still in &#8220;limited preview&#8221; mode and far from fully baked, should be able to make the necessary adjustments rather easily.</p>
<p>The results, detailed in <a href="http://blog.zencoder.com/2012/07/23/first-look-at-google-compute-engine-for-video-transcoding/">a blog post on Tuesday</a>, suggest that Google Compute Engine&#8217;s real problem right now might just be a lack of high-performance instances. Its current workhorse &#8212; an 8-core Intel Sandy Bridge instance with 30GB of memory and 22 compute units &#8212; can&#8217;t hang with the Amazon Cluster Compute Instances that Zencoder <a href="http://gigaom.com/cloud/zencoder-raises-2m-for-cloud-based-video-encoding/">uses for its transcoding workloads</a>. The largest of those is a 16-core dual-CPU Intel Xeon instance providing 60.5GB of memory and 88 compute units running atop a 10 Gigabit Ethernet platform.</p>
<p>As Zencoder ramped up the workloads, the performance differences became clear:</p>
<p><a href="http://gigaom2.files.wordpress.com/2012/07/gce-vs-ec2-copy.jpg"><img  title="GCE-vs-EC2 copy" src="http://gigaom2.files.wordpress.com/2012/07/gce-vs-ec2-copy.jpg?w=708" alt=""   class="aligncenter size-full wp-image-545947" /></a></p>
<p>Compute Engine didn&#8217;t fare any better when Zencoder tested transfer speeds between the cloud storage platform and the cloud computing platform. Whereas rates between Amazon S3 and Amazon EC2 topped out at 1,458.32 Mbps, the rate between Google Cloud Storage and Google Compute Engine peaked at 202.6 Mbps. In fact, the post&#8217;s author writes, &#8220;it appears that GCS is slower than S3, and GCE transfer is slower than EC2, such that even if you’re using Google for compute, you may be better off using S3 for storage.&#8221;</p>
<p>While the results are interesting because they&#8217;re the first real apples-to-apples comparison I&#8217;ve seen between Compute Engine and EC2 (BuildFax cloud architect Joe Emison&#8217;s pre-release benchmarks were pulled from <a href="http://www.informationweek.com/news/cloud-computing/infrastructure/240002899?pgno=1">his Compute Engine review on InformationWeek</a>), they need to be taken for what they are. They are, as Zencoder points out, tests of a specific CPU-bound workload &#8212; the performance of which Google could improve by adding higher-powered instances &#8212; and don&#8217;t take into account the difficulties of running at massive scale &#8212; a capability Google touted <a href="http://gigaom.com/cloud/taking-on-amazon-google-launches-compute-on-demand-rival-to-ec2/">when it launched Compute Engine in June</a>.</p>
<p>And, the author notes, Compute Engine is generally a quality platform, &#8220;especially [with regard to] disk I/O, boot times, and consistency, which historically haven’t been EC2’s strong suit.&#8221;</p>
<p>This might actually be the more-important measure for most potential Compute Engine users. As <a href="http://gigaom.com/cloud/why-google-compute-engine-may-be-attractive-to-amazon-web-services-users/">GigaOM contributor James Urquhart wrote recently</a>, &#8220;If Google can deliver a service that eliminates most of the I/O and network performance inconsistencies that AWS customers currently experience, I can guarantee you there are many major compute customers of AWS that will want to give Compute Engine a test run.&#8221;</p>
<br />  <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=gigaom.com&#038;blog=14960843&#038;post=545923&#038;subd=gigaom2&#038;ref=&#038;feed=1" width="1" height="1" /><p><a href="http://pubads.g.doubleclick.net/gampad/jump?iu=/1008864/GigaOM_RSS_300x250&#038;sz=300x250&#038;c=312259"><img src="http://pubads.g.doubleclick.net/gampad/ad?iu=/1008864/GigaOM_RSS_300x250&#038;sz=300x250&#038;c=312259" /></a></p><p><strong>Related research and analysis from GigaOM Pro:</strong><br />Subscriber content. <a href="http://pro.gigaom.com/?utm_source=cloud&utm_medium=editorial&utm_campaign=auto3&utm_term=545923+for-some-workloads-googles-cloud-cant-yet-hang-with-aws&utm_content=dharrisstructure">Sign up for a free trial</a>.</p><ul><li><a href="http://pro.gigaom.com/2013/01/cloud-and-data-fourth-quarter-2012-analysis/?utm_source=cloud&utm_medium=editorial&utm_campaign=auto3&utm_term=545923+for-some-workloads-googles-cloud-cant-yet-hang-with-aws&utm_content=dharrisstructure">The fourth quarter of 2012 in cloud</a></li><li><a href="http://pro.gigaom.com/2012/12/how-direct-access-solutions-can-speed-up-cloud-adoption/?utm_source=cloud&utm_medium=editorial&utm_campaign=auto3&utm_term=545923+for-some-workloads-googles-cloud-cant-yet-hang-with-aws&utm_content=dharrisstructure">How direct-access solutions can speed up cloud adoption</a></li><li><a href="http://pro.gigaom.com/2012/12/cloud-computing-2013-how-to-navigate-without-a-map/?utm_source=cloud&utm_medium=editorial&utm_campaign=auto3&utm_term=545923+for-some-workloads-googles-cloud-cant-yet-hang-with-aws&utm_content=dharrisstructure">Cloud computing 2013: how to navigate without a map</a></li></ul>]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2012/07/24/for-some-workloads-googles-cloud-cant-yet-hang-with-aws/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2012/07/gce-vs-ec2-copy.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2012/07/gce-vs-ec2-copy.jpg?w=150" medium="image">
			<media:title type="html">GCE-vs-EC2 copy</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2012/07/gce-vs-ec2-copy.jpg" medium="image">
			<media:title type="html">GCE-vs-EC2 copy</media:title>
		</media:content>
	</item>
		<item>
		<title>How one startup wants to inject Hadoop into your SQL</title>
		<link>http://gigaom.com/2012/07/24/how-one-startup-wants-to-inject-hadoop-into-your-sql/</link>
		<comments>http://gigaom.com/2012/07/24/how-one-startup-wants-to-inject-hadoop-into-your-sql/#comments</comments>
		<pubDate>Tue, 24 Jul 2012 18:40:52 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[database]]></category>
		<category><![CDATA[Drawn to Scale]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Hbase]]></category>
		<category><![CDATA[Mapr]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[scalability]]></category>
		<category><![CDATA[webscale]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=545849</guid>
		<description><![CDATA[Drawn to Scale's Spire database is meant to be all things to all people -- it combines Hadoop, HBase and SQL to provide a fast, scalable, robust experience -- and now it has integrated with MapR's Hadoop distribution. It's no surprise the young company already claims big customers.<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=gigaom.com&#038;blog=14960843&#038;post=545849&#038;subd=gigaom2&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p><a href="http://gigaom2.files.wordpress.com/2012/07/hard_drives.jpg"><img  title="hard drives" src="http://gigaom2.files.wordpress.com/2012/07/shutterstock_108064520.jpg?w=300&#038;h=200" alt="" width="300" height="200" class="alignleft size-medium wp-image-545921" /></a>Forget about standalone Hadoop clusters running batch workloads to churn through massive datasets a few times per day or per week. San Francisco-based startup <a href="http://drawntoscale.com">Drawn to Scale</a> thinks Hadoop&#8217;s real home within the enterprise will be in your database. No, it&#8217;s not trying to sell you some limited-use NoSQL data store &#8212; it wants to turn your good, old-fashioned SQL database into a real-time analytic engine that can grow like a weed.</p>
<p>And it wants to make this transition as easy as possible for customers that don&#8217;t want to get their hands dirty working with Hadoop or trying to turn a relational database into something it isn&#8217;t.</p>
<p>Drawn to Scale&#8217;s product, called Spire, is built atop HBase (a distributed NoSQL database that uses the Hadoop Distributed File System), so it scales with ease. Because it uses SQL, users can write robust queries in a familiar manner. Because it uses a distributed index and knows exactly where to go to find the right data, it&#8217;s fast as heck. Because it leverages Chef to configure the database, there&#8217;s no need to learn the finer points of deploying and managing a Hadoop cluster.</p>
<p>&#8220;We&#8217;re sort of the only game in town when it comes to cloud-scale databases that run on top of Hadoop,&#8221; says Drawn to Scale Founder and CEO Bradford Stephens. There are plenty of companies that have lots of data &#8212; petabytes of it, in some cases &#8212; and want to find new ways to analyze it, but they don&#8217;t necessarily want to learn how to deploy Hadoop or write MapReduce workloads.</p>
<p>As of Tuesday, Drawn to Scale has another carrot to offer prospective customers because Spire now <a href="http://www.marketwire.com/press-release/drawn-scale-delivers-real-time-sql-applications-on-hadoop-with-mapr-partnership-1683188.htm">ships with MapR&#8217;s M3 distribution</a> pre-integrated as the product&#8217;s Hadoop platform. Although it&#8217;s the bane of open source devotees such as fellow Hadoop vendors Cloudera and Hortonworks because it uses a proprietary file system in place of HDFS, MapR is gaining quite a following because of the performance benefits its technology brings. In fact, Drawn to Scale is just the latest MapR partner after Amazon Web Services recently <a href="http://gigaom.com/cloud/amazon-taps-mapr-for-high-powered-elastic-mapreduce/">tagged it for inclusion in its Elastic MapReduce offering</a> and Google <a href="http://www.mapr.com/blog/google-mapr">did the same for its Compute Engine cloud</a>.</p>
<div id="attachment_545918" class="wp-caption alignright" style="width: 130px"><a href="http://gigaom2.files.wordpress.com/2012/07/bradford-hathead-copy.jpg"><img  title="bradford-hathead copy" src="http://gigaom2.files.wordpress.com/2012/07/bradford-hathead-copy.jpg?w=708" alt=""   class="size-full wp-image-545918" /></a><p class="wp-caption-text">Bradford Stephens</p></div>
<p>When you&#8217;re building a distributed database, you need a fast distributed file system, Stephens added, and MapR is the fastest Hadoop file system around. Because it&#8217;s fully API-compliant with HDFS, though, M3 still works just fine with Spire&#8217;s HBase underpinnings. &#8220;Despite our small size,&#8221; Stephens said, &#8220;we are more than willing to pick a horse and really advocate loudly for it.&#8221;</p>
<p>Although Drawn to Scale is relatively young &#8212; it&#8217;s still in private beta and <a href="http://gigaom.com/cloud/drawn-to-scale-raises-money-to-make-sql-big-data-ready/">just raised a $925,000 first round in March</a> &#8212; large companies are already buying into its message. Already, Stephens told me, it&#8217;s engaged in seven <em>paid</em> pilot programs with large credit card companies, telcos and other &#8220;traditional&#8221; types of enterprises. One of them is already trying to negotiate a deal for a 1,000-node production deployment, he added. The company&#8217;s skeleton crew of mostly engineers couldn&#8217;t keep up with the support workload if it opened the floodgates on all the inbound interest, Stephens said.</p>
<p>That kind of demand isn&#8217;t surprising when you consider Hadoop&#8217;s <a href="http://gigaom.com/cloud/the-state-of-hadoop-strong-and-poised-to-explode/">promise as the platform for a new generation</a> of data-driven platforms. Whether or not <a href="http://gigaom.com/cloud/why-the-days-are-numbered-for-hadoop-as-we-know-it/">anyone actually wants to use MapReduce</a> &#8212; the programming framework that made Hadoop famous &#8212; they do want to take advantage of its nearly limitless scale on cheap, commodity hardware. When companies such as Drawn to Scale bake Hadoop into an otherwise useful product as a technological component rather than as the product itself, customers should experience the best of both worlds.</p>
<p><em>Feature image courtesy of <a href="http://www.shutterstock.com/gallery-188515p1.html">Shutterstock user Jakub Pavlinec</a>.</em></p>
<br />  <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=gigaom.com&#038;blog=14960843&#038;post=545849&#038;subd=gigaom2&#038;ref=&#038;feed=1" width="1" height="1" /><p><a href="http://pubads.g.doubleclick.net/gampad/jump?iu=/1008864/GigaOM_RSS_300x250&#038;sz=300x250&#038;c=717203"><img src="http://pubads.g.doubleclick.net/gampad/ad?iu=/1008864/GigaOM_RSS_300x250&#038;sz=300x250&#038;c=717203" /></a></p><p><strong>Related research and analysis from GigaOM Pro:</strong><br />Subscriber content. <a href="http://pro.gigaom.com/?utm_source=cloud&utm_medium=editorial&utm_campaign=auto3&utm_term=545849+how-one-startup-wants-to-inject-hadoop-into-your-sql&utm_content=dharrisstructure">Sign up for a free trial</a>.</p><ul><li><a href="http://pro.gigaom.com/2012/03/a-near-term-outlook-for-big-data/?utm_source=cloud&utm_medium=editorial&utm_campaign=auto3&utm_term=545849+how-one-startup-wants-to-inject-hadoop-into-your-sql&utm_content=dharrisstructure">A near-term outlook for big data</a></li><li><a href="http://pro.gigaom.com/2010/10/with-scalable-data-stores-around-is-nosql-a-non-starter/?utm_source=cloud&utm_medium=editorial&utm_campaign=auto3&utm_term=545849+how-one-startup-wants-to-inject-hadoop-into-your-sql&utm_content=dharrisstructure">With Scalable Data Stores Around, Is NoSQL a Non-Starter?</a></li><li><a href="http://pro.gigaom.com/2012/12/big-data-2013-key-trends-and-companies-to-watch/?utm_source=cloud&utm_medium=editorial&utm_campaign=auto3&utm_term=545849+how-one-startup-wants-to-inject-hadoop-into-your-sql&utm_content=dharrisstructure">Big data 2013: key trends and companies to watch</a></li></ul>]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2012/07/24/how-one-startup-wants-to-inject-hadoop-into-your-sql/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2012/07/shutterstock_108064520.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2012/07/shutterstock_108064520.jpg?w=150" medium="image">
			<media:title type="html">hard drives</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2012/07/shutterstock_108064520.jpg?w=300" medium="image">
			<media:title type="html">hard drives</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2012/07/bradford-hathead-copy.jpg" medium="image">
			<media:title type="html">bradford-hathead copy</media:title>
		</media:content>
	</item>
	</channel>
</rss>
