<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>GigaOM &#187; Web Infrastructure</title>
	<atom:link href="http://gigaom.com/tag/web-infrastructure/feed/" rel="self" type="application/rss+xml" />
	<link>http://gigaom.com</link>
	<description></description>
	<lastBuildDate>Thu, 20 Jun 2013 04:17:29 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='gigaom.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://0.gravatar.com/blavatar/0db8f6557d022075dbbf010c54d46d93?s=96&#038;d=http%3A%2F%2Fs2.wp.com%2Fi%2Fbuttonw-com.png</url>
		<title>GigaOM &#187; Web Infrastructure</title>
		<link>http://gigaom.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://gigaom.com/osd.xml" title="GigaOM" />
	<atom:link rel='hub' href='http://gigaom.com/?pushpress=hub'/>
		<item>
		<title>A real-time bonanza: Facebook&#8217;s Wormhole and Yahoo&#8217;s streaming Hadoop</title>
		<link>http://gigaom.com/2013/06/14/a-real-time-bonanza-facebooks-wormhole-and-yahoos-streaming-hadoop/</link>
		<comments>http://gigaom.com/2013/06/14/a-real-time-bonanza-facebooks-wormhole-and-yahoos-streaming-hadoop/#comments</comments>
		<pubDate>Fri, 14 Jun 2013 16:57:54 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[big data]]></category>
		<category><![CDATA[Facebook]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[real-time]]></category>
		<category><![CDATA[Storm]]></category>
		<category><![CDATA[stream processing]]></category>
		<category><![CDATA[Web Infrastructure]]></category>
		<category><![CDATA[Yahoo]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=657636</guid>
		<description><![CDATA[This week, both Facebook and Yahoo detailed new efforts to manage real-time data flows within their myriad systems. Yahoo's work is an open source implementation of Storm designed to run on the same cluster as Hadoop and even share resources.]]></description>
				<content:encoded><![CDATA[<p>If you’re into systems that can share data among each other in real time, this has been a good week. On Tuesday, Yahoo <a href="http://developer.yahoo.com/blogs/ydn/storm-yarn-released-open-source-143745133.html">open sourced its version</a> of the popular Storm stream-processing software that’s able to run inside Hadoop clusters. Then, on Thursday, Facebook <a href="https://www.facebook.com/notes/facebook-engineering/wormhole-pubsub-system-moving-data-through-space-and-time/10151504075843920">detailed a system called Wormhole</a> that informs the platform’s myriad applications when changes have occurred in another, so that each one is working from the newest data possible.</p>
<p>The Yahoo work is actually pretty important. Among the features Hadoop users have been demanding from the platform is a transition from batch-processing-only mode <a href="http://gigaom.com/2013/03/07/5-reasons-why-the-future-of-hadoop-is-real-time-relatively-speaking/">into something that can actually deal with data in real time</a>. The reason for the demand is quite simple: Although being able to analyze or transform data minutes to hours after it’s generated is helpful for certain analytic tasks, it’s not too helpful if you want an application to be able to act on data as it hits the system.</p>
<p>A service like Twitter is a prime example of where Storm can be valuable. Twitter uses Storm to handle tweets so users’ Timelines are up to date and do things like real-time analytics and spotting emerging trends. In fact, <a href="http://gigaom.com/2011/08/04/twitter-to-open-source-hadoop-like-tool/">it was Twitter that open sourced Storm in 2011</a> after buying Storm creator Backtype in order to get access to the technology and its developers.</p>
<p>Among web companies, Storm has become quite popular as a stream-processing complement to Hadoop since then. And now Yahoo has made possible a much tighter integration between the two — even to the point that Storm can borrow cycles from batch-processing nodes if it needs some extra juice. That’s a valuable feature — just last week I heard Twitter engineer Krishna Gade bemoan Storm’s auto-scaling limitations during a talk at Facebook’s <a href="http://analyticswebscale.splashthat.com/">Analytics @ Web Scale</a> event.</p>
<div id="attachment_657687" class="wp-caption aligncenter" style="width: 718px"><a href="http://gigaom2.files.wordpress.com/2013/06/img_20130606_120037.jpg"><img alt="Krishna Gade talking Storm at the Facebook event." src="http://gigaom2.files.wordpress.com/2013/06/img_20130606_120037.jpg?w=708&#038;h=531" width="708" height="531" class="size-large wp-image-657687"></a><p class="wp-caption-text">Krishna Gade talking Storm at the Facebook event.</p></div>
<p>The Storm-on-Hadoop work is among the first of many promised improvements to come thanks to <a href="http://hortonworks.com/blog/apache-hadoop-yarn-concepts-and-applications/">YARN</a>, a major update to the Apache Hadoop 2.0 code that lets Hadoop clusters run multiple processing frameworks simultaneously. Twitter <a href="http://gigaom.com/2012/04/19/twitter-backs-fave-big-data-projects-with-apache-sponsorship/">has been using the open source Mesos resource manager</a> to achieve the same general capabilities, but Gade’s colleague Dmitriy Ryaboy said during the same talk that the company plans to begin using YARN for some big data workloads when it upgrades to Hadoop 2.0. He expects — probably correctly — much more community effort will go toward continuously improving its capabilities and building applications for YARN.</p>
<p>Facebook’s Wormhole project isn’t open source (as far as I can tell), but its lessons are still valuable (and LinkedIn has <a href="http://blog.linkedin.com/2011/01/11/open-source-linkedin-kafka/">open sourced similar technologies named Kafka</a> and <a href="http://data.linkedin.com/projects/databus">Databus</a>). It’s what’s called a publish-subscribe system, which is essentially a concise way of saying that it manages communications between applications that publish information (e.g., updates to a database) and applications that subscribe to the information their fellow applications are publishing. At Facebook, for example, Wormhole sends changes to Facebook’s master user database to Graph Search so that search results are as up to date as possible, or to its Hadoop environment so analytics jobs have the newest data.</p>
<p><a href="http://gigaom2.files.wordpress.com/2013/06/wormhole.png"><img alt="wormhole" src="http://gigaom2.files.wordpress.com/2013/06/wormhole.png?w=708&#038;h=584" width="708" height="584" class="aligncenter size-large wp-image-657677"></a></p>
<p>Of course, like all things Facebook (its <a href="http://gigaom.com/2013/06/06/facebook-unveils-presto-engine-for-querying-250-pb-data-warehouse/">new Presto interactive query engine</a> comes to mind), Wormhole is built to scale. Latency is in the low milliseconds and, as blog post author Laurent Demailly notes:</p>
<blockquote id="quote-wormhole-processes-o"><p>“Wormhole processes over <b>1 trillion</b> messages every day (significantly more than 10 million messages every second). Like any system at Facebook’s scale, Wormhole is engineered to deal with failure of individual components, integrate with monitoring systems, perform automatic remediation, enable capacity planning, automate provisioning and adapt to sudden changes in usage pattern.”</p></blockquote>
<p>Although they were developed within separate companies, there’s actually a tie that binds Yahoo’s Storm-in-Hadoop work and Facebook’s Wormhole. As web companies grow from their initial applications into sprawling businesses composed of numerous applications and services, so too do their infrastructures. To address the differing needs of their various systems at the data level, the companies have begun breaking them down by their latency requirements (i.e., real-time, near real-time and batch, however they choose to word them) and then building tools such as Storm and Wormhole to manage the flow of data between the systems.</p>
<p>We’ve previously explained in some detail <a href="http://gigaom.com/2013/03/03/how-and-why-linkedin-is-becoming-an-engineering-powerhouse/">how LinkedIn</a> and <a href="http://gigaom.com/2013/03/28/3-shades-of-latency-how-netflix-built-a-data-architecture-around-timeliness/">Netflix</a> have built their data architectures around these principles, and we’ll hear a lot more about how they and other web companies are tackling this situation at <a href="http://event.gigaom.com/structure/?utm_source=data&amp;utm_medium=editorial&amp;utm_campaign=intext&amp;utm_term=657636+a-real-time-bonanza-facebooks-wormhole-and-yahoos-streaming-hadoop&amp;utm_content=dharrisstructure">Structure next week</a>. Among the speakers are senior engineers and technology executives from Facebook, Google, LinkedIn, Box, Netflix and Amazon.</p>
<p><em><strong>Update: </strong>This post was updated at 1:46 p.m. to clarify that Twitter is not eliminating Mesos for all its workloads. </em></p>
<p><em>Feature image courtesy of <a href="http://www.shutterstock.com/gallery-553555p1.html">Shutterstock user agsandrew</a>.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/06/14/a-real-time-bonanza-facebooks-wormhole-and-yahoos-streaming-hadoop/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/06/shutterstock_122114275.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/06/shutterstock_122114275.jpg?w=150" medium="image">
			<media:title type="html">streaming real time</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/06/img_20130606_120037.jpg?w=708" medium="image">
			<media:title type="html">Krishna Gade talking Storm at the Facebook event.</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/06/wormhole.png?w=708" medium="image">
			<media:title type="html">wormhole</media:title>
		</media:content>
	</item>
		<item>
		<title>Why Google is the big data company that matters most</title>
		<link>http://gigaom.com/2013/06/12/why-google-is-the-big-data-company-that-matters-most/</link>
		<comments>http://gigaom.com/2013/06/12/why-google-is-the-big-data-company-that-matters-most/#comments</comments>
		<pubDate>Wed, 12 Jun 2013 22:37:24 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[big data]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[image recognition]]></category>
		<category><![CDATA[Jeff Dean]]></category>
		<category><![CDATA[machine-learning]]></category>
		<category><![CDATA[mapreduce]]></category>
		<category><![CDATA[Structure 2013]]></category>
		<category><![CDATA[Web Infrastructure]]></category>
		<category><![CDATA[webscale-computing]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=657257</guid>
		<description><![CDATA[Google Image Search just got a whole lot better, and the company's purpose-built machine learning system infrastructure is a big reason why. No surprise, Jeff Dean helped build it.]]></description>
				<content:encoded><![CDATA[<p>Every now and then, someone asks, “Who’ll be the Google of big data?” The only acceptable answer, it seems, is that Google is the Google of big data. Yeah, it’s a web company on the surface, but Google has been at the forefront of using data to build compelling products for more than a decade, and it’s not showing any signs of slowing down.</p>
<p>Search, advertising, Translate, Play Music, Goggles, Trends and the list goes on — they’re all products that couldn’t exist without lots of data. But data alone doesn’t make products great — they also need to perform fast and reliably, and they eventually need to get more intelligent. Infrastructure and systems engineering make that possible, and that’s where Google really shines.</p>
<p>On Wednesday, the company showed off its chops once again, <a href="http://googleresearch.blogspot.com/2013/06/improving-photo-search-step-across.html">explaining in a blog post how it’s able to let users better search their photos</a> because it was able to train some novel models on systems built for just that purpose. Here’s how Google describes the chain of events, after it had found the methods it wanted to test (from the winning team at the ImageNet competition):</p>
<blockquote id="quote-we-built-and-trained"><p>“We built and trained models similar to those from the winning team using <a href="http://research.google.com/archive/large_deep_networks_nips2012.html">software infrastructure</a> for training large-scale neural networks developed at Google in a group started by <a href="http://research.google.com/people/jeff/">Jeff Dean</a> and <a href="http://ai.stanford.edu/~ang/">Andrew Ng</a>. When we evaluated these models, we were impressed; on our test set we saw double the average precision when compared to other approaches we had tried. …</p>
<p>“Why the success now? … What is different is that both computers and algorithms have improved significantly. First, bigger and faster computers have made it feasible to train larger neural networks with much larger data. Ten years ago, running neural networks of this complexity would have been a momentous task even on a single image — now we are able to run them on billions of images. Second, new training techniques have made it possible to train the large deep neural networks necessary for successful image recognition.”</p></blockquote>
<p>Of course Google had a system in place for training large-scale neural networks. And of course Jeff Dean helped design it.</p>
<div id="attachment_657319" class="wp-caption aligncenter" style="width: 718px"><a href="http://gigaom2.files.wordpress.com/2013/06/flowers.jpg"><img alt="Google's system can recognize flowers even when they're not in the focal point." src="http://gigaom2.files.wordpress.com/2013/06/flowers.jpg?w=708&#038;h=486" width="708" height="486" class="size-large wp-image-657319"></a><p class="wp-caption-text">Google’s system can recognize flowers even when they’re not in the focal point.</p></div>
<p>For me, Dean is among the highlights of our upcoming <a href="http://event.gigaom.com/structure/?utm_source=data&amp;utm_medium=editorial&amp;utm_campaign=intext&amp;utm_term=657257+why-google-is-the-big-data-company-that-matters-most&amp;utm_content=dharrisstructure">Structure conference</a> (June 19 and 20 in San Francisco). I’m going to sit down with him in a fireside chat and talk about all the cool systems Google has built thus far and what’s coming down the pike next. Maybe about what life is like being <a href="http://www.slate.com/articles/technology/doers/2013/01/jeff_dean_facts_how_a_google_programmer_became_the_chuck_norris_of_the_internet.html">the Chuck Norris of the internet</a>.</p>
<p>From an engineering standpoint, Dean has been one of the most important people in the short history of the web. He helped create MapReduce — the parallel processing engine underneath Google’s original search engine — and was the lead author on the MapReduce paper that <a href="http://gigaom.com/2013/03/04/the-history-of-hadoop-from-4-nodes-to-the-future-of-data/">directly inspired the creation of Hadoop</a>. Dean has also played significant roles in creating other important Google systems, such as its BigTable distributed data store (which is the basis of NoSQL databases such as Cassandra, HBase and the <a href="http://gigaom.com/2013/06/07/under-the-covers-of-the-nsas-big-data-effort/">National Security Agency’s Accumulo</a>) and <a href="http://gigaom.com/2012/09/17/googles-spanner-a-database-that-knows-what-time-it-is/">a globally distributed transactional database called Spanner</a>.</p>
<p>If you’re into big data or webscale systems, knowing what Dean is working on can be like looking into a crystal ball. When I asked Hadoop creator Doug Cutting what the future holds for Hadoop, he told me to look at Google.</p>
<p>“They send us messages through these technical papers,” Cutting said, “so we can see what’s coming.”</p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/06/12/why-google-is-the-big-data-company-that-matters-most/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/06/jeff-dean.jpeg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/06/jeff-dean.jpeg?w=150" medium="image">
			<media:title type="html">jeff-dean</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/06/flowers.jpg?w=708" medium="image">
			<media:title type="html">Google&#039;s system can recognize flowers even when they&#039;re not in the focal point.</media:title>
		</media:content>
	</item>
		<item>
		<title>How Snapchat made a leap of faith by building atop Google cloud services</title>
		<link>http://gigaom.com/2013/05/07/snapchats-act-of-faith-in-building-on-google-compute-engine/</link>
		<comments>http://gigaom.com/2013/05/07/snapchats-act-of-faith-in-building-on-google-compute-engine/#comments</comments>
		<pubDate>Tue, 07 May 2013 17:28:57 +0000</pubDate>
		<dc:creator>Stacey Higginbotham</dc:creator>
				<category><![CDATA[Adrian Cockcroft]]></category>
		<category><![CDATA[Backblaze]]></category>
		<category><![CDATA[Bobby Murphy]]></category>
		<category><![CDATA[Cory von Wallenstain]]></category>
		<category><![CDATA[Dyn]]></category>
		<category><![CDATA[Gleb Budman]]></category>
		<category><![CDATA[Netflix]]></category>
		<category><![CDATA[Snapchat]]></category>
		<category><![CDATA[Structure 2013]]></category>
		<category><![CDATA[Web Infrastructure]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=642853</guid>
		<description><![CDATA[As more companies build their businesses on cloud infrastructure, it's important to not only understand the technical decisions behind their architecture, but also the economic ones. That's one of the topics we'll explore at Structure.]]></description>
				<content:encoded><![CDATA[<p><em>This post was corrected at 12:16pm to correctly identify the Google services used by Snapchat. It is using Google App Engine.</em></p>
<p>Building out the infrastructure for Snapchat was an act of faith, according to co-founder and CTO Bobby Murphy. The company, whose app was apparently so easy to build that a <a href="http://www.slate.com/articles/technology/technology/2012/12/facebook_s_poke_its_snapchat_clone_is_a_bad_sign.html">Facebook engineer took two weeks to mock up a similar service</a>, operates on Google’s App Engine. That’s a notable choice in a field of startups that have chosen the more popular cloud services provided by Amazon Web Services.</p>
<p>But Murphy likes App Engine, he said in a recent phone conversation, and he believes Google is scaling out and willing to invest in this platform. He prefers some of the features for Snapchat’s purposes and believes when it comes to scale, Google could offer more than AWS for his application. The details behind his consideration will be the focus of Murphy’s chat onstage at the <a href="http://event.gigaom.com/structure/?utm_source=cloud&amp;utm_medium=editorial&amp;utm_campaign=intext&amp;utm_term=642853+snapchats-act-of-faith-in-building-on-google-compute-engine&amp;utm_content=shigginbotham">Structure conference occurring June 19 and 20 in San Francisco</a>.</p>
<p>So if you caught <a href="http://www.colbertnation.com/the-colbert-report-videos/425950/april-30-2013/evan-spiegel---bobby-murphy">Murphy’s appearance on The Colbert Report</a> and want to learn more about the infrastructure and the economics of scaling out an app with 150 million photos uploaded daily, then <a href="http://event.gigaom.com/structure/registration/?utm_source=cloud&amp;utm_medium=editorial&amp;utm_campaign=intext&amp;utm_term=642853+snapchats-act-of-faith-in-building-on-google-compute-engine&amp;utm_content=shigginbotham">register for Structure</a>.</p>
<p>Murphy’s is one of several developer-focused talks we’ll have this year as we try to draw more attention to the fact that building out applications on massive cloud infrastructures requires a change in thinking. It’s not just about learning how to build an application in the cloud; it also mandates a strategic approach to architecting your applications in a way that <a href="http://gigaom.com/2012/10/01/to-scale-web-services-devops-devotees-should-consider-economics/">takes into consideration the economics</a> of hosting them on someone else’s infrastructure.</p>
<p>We’ll have conversations with Cory von Wallenstein, the CTO of Dyn, focusing on how to build a process for evaluating and changing your architecture without disrupting your existing users. There will be another with Gleb Budman, the co-founder and CEO of Backblaze, and Adrian Cockcroft, cloud architect at Netflix, about building hugely scalable infrastructures in the face of serious logistical obstacles.</p>
<div id="attachment_603525" class="wp-caption alignright" style="width: 718px"><a href="http://gigaom2.files.wordpress.com/2013/01/1z5o8703.jpg"><img src="http://gigaom2.files.wordpress.com/2013/01/1z5o8703.jpg?w=708&#038;h=472" alt="Structure 2012: Aditya Agarwal - VP Engineering, Dropbox, Adrian Cockcroft - Director, Architecture, Netflix, Alexei Rodriguez - VP of Operations, Evernote Corporation, Jonathan Heiliger - General Partner, North Bridge Venture Partners" width="708" height="472" class="size-large wp-image-603525"></a><p class="wp-caption-text">Structure 2012: Aditya Agarwal – VP Engineering, Dropbox, Adrian Cockcroft – Director, Architecture, Netflix, Alexei Rodriguez – VP of Operations, Evernote Corporation, Jonathan Heiliger – General Partner, North Bridge Venture Partners</p></div>
<p>Six-and-a-half years ago when we started thinking about our first Structure event, it was a hard sell. People didn’t understand what cloud computing was, nor why a small technology blog would want to build a conference around web infrastructure. Our advertising team got questions like, “You want to hold a show on servers? Why?”</p>
<p>But we knew that just as the printing press changed the distribution of knowledge, the emergence of cloud computing, web-based services and even mobility would change how we disseminate information all over again. And in the process it would create new economic opportunities and change the way the world works.</p>
<p>However, that first Structure conference was about building that vision, not about the servers. If we were around back in the 1400s, we’d hold a gathering at a local tavern not about paper, but about the coming revolutions promised by that technology, and maybe even look forward to the creation of the novel and widespread literacy.</p>
<p>So make sure you are in the audience at this event so you can predict how the future of the web is changing; not just how infrastructure has evolved, but how we’ll build businesses on top of it. <a href="http://event.gigaom.com/structure/registration/?utm_source=cloud&amp;utm_medium=editorial&amp;utm_campaign=intext&amp;utm_term=642853+snapchats-act-of-faith-in-building-on-google-compute-engine&amp;utm_content=shigginbotham">Register here</a> and we’ll see you in June.</p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/05/07/snapchats-act-of-faith-in-building-on-google-compute-engine/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2012/12/snapchat-500.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2012/12/snapchat-500.jpg?w=150" medium="image">
			<media:title type="html">snapchat-500</media:title>
		</media:content>

		<media:content url="http://1.gravatar.com/avatar/aee37121e18bf76bb9fee4494bab237a?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">shigginbotham</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/01/1z5o8703.jpg?w=708" medium="image">
			<media:title type="html">Structure 2012: Aditya Agarwal - VP Engineering, Dropbox, Adrian Cockcroft - Director, Architecture, Netflix, Alexei Rodriguez - VP of Operations, Evernote Corporation, Jonathan Heiliger - General Partner, North Bridge Venture Partners</media:title>
		</media:content>
	</item>
		<item>
		<title>Deep thinking on complex systems: A devops reading list</title>
		<link>http://gigaom.com/2013/04/21/great-devops-anti-fragility-and-complexity-resources/</link>
		<comments>http://gigaom.com/2013/04/21/great-devops-anti-fragility-and-complexity-resources/#comments</comments>
		<pubDate>Sun, 21 Apr 2013 20:30:22 +0000</pubDate>
		<dc:creator>James Urquhart</dc:creator>
				<category><![CDATA[Cloud Computing]]></category>
		<category><![CDATA[devops]]></category>
		<category><![CDATA[distributed computing]]></category>
		<category><![CDATA[systems design]]></category>
		<category><![CDATA[Web Infrastructure]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=632398</guid>
		<description><![CDATA[GigaOM contributor James Urquhart shares some of the best books, blogs and other information on the concepts of devops and complex IT systems.]]></description>
				<content:encoded><![CDATA[<p>As I wrap up my series digging into the relationship between complex systems, devops, anti-fragility and IT systems, I wanted to give you a set of resources that you can use to explore this subject in much more depth. As I hope you&#8217;ve picked up from the series (which I&#8217;ve linked in its entirety below), these concepts are critical to the new agility that many enterprises are realizing from service-based IT models.</p>
<h2 id="getting-started">Getting started</h2>
<p>Before you do anything else, read <em><a href="http://itrevolution.com/books/phoenix-project-devops-book/">The Phoenix Project: A Novel About IT, DevOps, and Helping Your Business Win</a></em>, by Gene Kim, Kevin Behr and George Spafford, if you haven&#8217;t already. If you don&#8217;t absolutely identify with the pain felt by the characters at the beginning of the book, or with the wisdom of the approach introduced by the end, then this concept probably won&#8217;t click with you. However, if you&#8217;ve spent any time involved in enterprise IT at all, I&#8217;m betting this book will hit home, both intellectually and emotionally.</p>
<p>After that, the previous posts in this series provide some good background, as well:</p>
<ul style="font-size:13px;line-height:19px;">
<li><a href="http://gigaom.com/2013/01/13/devops-complexity-and-anti-fragility-in-it-an-introduction/">Devops, complexity and anti-fragility in IT: An introduction</a></li>
<li><a href="http://gigaom.com/2013/01/19/devops-complexity-and-anti-fragility-in-it-risk-and-anti-fragility/">Devops, complexity and anti-fragility in IT: Risk and anti-fragility</a></li>
<li><a href="http://gigaom.com/2013/01/27/devops-complexity-and-anti-fragility-in-it-stability-and-resilience/">Devops, complexity and anti-fragility in IT: Stability and resilience</a></li>
<li><a href="http://gigaom.com/2013/02/16/devops-complexity-and-anti-fragility-in-it-context-and-composition/">Is your PaaS composable or contextual? (Hint: the answer matters)</a></li>
</ul>
<h2 id="complexity-and-anti-fragility">Complexity and anti-fragility</h2>
<p>Although I don&#8217;t love everything about Nassim Taleb&#8217;s <em><a href="http://www.amazon.com/Antifragile-Things-That-Gain-Disorder/dp/1400067820">Antifragile: Things That Gain from Disorder</a></em>, it is undeniably one of the most important books I&#8217;ve read in a while. It articulates a key concept that is often missed by those of us who seek resiliency in systems: that there is a class of systems whose behavior actually gains from randomness. In other words, they tend to move toward a &#8220;better&#8221; state over the course of both positive and negative variation in their environments. The post on risk and anti-fragility that I link to above covers this concept in more depth, but the book explores the concept in many different contexts.</p>
<p>The best book on complex systems that I&#8217;ve read to date remains <em><a href="http://www.amazon.com/COMPLEXITY-EMERGING-SCIENCE-ORDER-CHAOS/dp/0671872346">Complexity: The Emerging Science at the Edge of Order and Chaos</a></em>, by M. Mitchell Waldrop. A telling of the story behind the founding of the Santa Fe Institute, still considered the hub of complex systems science, Waldrop&#8217;s book covers many of the concepts and methods of exploring complex systems (and their critical subset, complex adaptive systems). It is a little out of date now, however.</p>
<p>If you prefer to learn by doing, Melanie Mitchell&#8217;s &#8220;<a href="http://www.complexityexplorer.org">Introduction to Complexity</a>&#8221; course through the Santa Fe Institute is an excellent 101-level course on the subject, though it tilts heavily toward academic study. The only focus on practical applications comes via interviews with famous complex systems scientists.</p>
<h2 id="devops-and-continuous-integrat">Devops and continuous integration</h2>
<p>For devops, in particular, there are a lot of great sources available online, as well:</p>
<ul>
<li>A decent overview (with a decent list of both cultural and technical elements of devops) is &#8220;<a href="http://www.slideshare.net/dieterdm/devops-introduction-cegeka">Devops: An Introduction</a>,&#8221; a slide presentation from Patrick Dubois.</li>
<li>Anything written by <a href="http://itrevolution.com/devops-blog/">John Willis and Gene Kim at ITRevolution</a>, <a href="http://codeascraft.etsy.com">John Allspaw and his crew at Etsy</a>, <a href="http://techblog.netflix.com">the team at Netflix</a>, and a <a href="http://www.tracelytics.com/blog/the-devops-reading-list-10-books-blogs-you-should-be-reading/">host of others</a> is worth reading as well.</li>
<li>My favorite source for devops learning, however, is the <a href="http://devopsweekly.com">DevOps Weekly newsletter</a>, a very well-curated list of reading material each weekend. Definitely a must if you want to understand devops in depth and in real time.</li>
</ul>
<p>I hope everyone has gained something from these posts. I certainly believe this shift in focus &#8212; from risk avoidance to anti-fragility, from a focus on stability to a focus on resilience, and from a focus on large-grained contextual systems to small-grained composable alternatives &#8212; is opening, and will continue to open, a whole new world of agility, experimentation and execution for enterprise IT. It&#8217;s a critical subject for every IT practitioner to understand.</p>
<p>This is, of course, only a partial list of the many amazing books, web sites, blogs and events that I&#8217;ve used to explore this topic. I encourage you to add your favorites to the comments below, or share them with me on Twitter, where I am @jamesurquhart.</p>
<p><em>James Urquhart is vice president of products at enStratius and a regular GigaOM contributor.</em></p>
<p><em>Feature image courtesy of <a href="http://www.shutterstock.com/gallery-1012355p1.html">Shutterstock user Linda Parton</a>.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/04/21/great-devops-anti-fragility-and-complexity-resources/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/04/shutterstock_125328368.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/04/shutterstock_125328368.jpg?w=150" medium="image">
			<media:title type="html">gears</media:title>
		</media:content>

		<media:content url="http://1.gravatar.com/avatar/466b03d84ca851e58ee992d979936f30?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">jurquhart</media:title>
		</media:content>
	</item>
		<item>
		<title>Google&#8217;s infrastructure spending spree continues; $1.2B in Q1</title>
		<link>http://gigaom.com/2013/04/18/googles-infrastructure-spending-spree-continues-1-2b-in-q1/</link>
		<comments>http://gigaom.com/2013/04/18/googles-infrastructure-spending-spree-continues-1-2b-in-q1/#comments</comments>
		<pubDate>Thu, 18 Apr 2013 21:37:34 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[Amazon]]></category>
		<category><![CDATA[Apple]]></category>
		<category><![CDATA[Data Centers]]></category>
		<category><![CDATA[Facebook]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[infrastructure]]></category>
		<category><![CDATA[Microsoft]]></category>
		<category><![CDATA[Web Infrastructure]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=632356</guid>
		<description><![CDATA[Google spent $1.2 billion on property and equipment in the first quarter of 2013, nearly doubling last year's first quarter.]]></description>
				<content:encoded><![CDATA[<p>Google has stepped up its infrastructure spending once again, to the tune of $1.2 billion in the first quarter, according to its <a href="http://investor.google.com/earnings/2013/Q1_google_earnings.html">earnings report released on Thursday</a>. That&#8217;s a nearly 20 percent quarter-over-quarter increase, and <a href="http://gigaom.com/2013/01/22/google-spent-a-billion-on-infrastructure-last-quarter/">the previous quarter&#8217;s $1.02 billion</a> represented Google&#8217;s second-biggest quarterly investment ever.</p>
<p>There&#8217;s not much to say about this uptick in spending <a href="http://gigaom.com/2010/10/15/for-google-capex-costs-are-worth-the-money/">that hasn&#8217;t been said before</a>. Essentially, Google has to keep on spending to keep its services running as well as possible and as efficiently as possible. Competing against Amazon, Facebook, Apple and Microsoft in everything from search to mobile to cloud computing costs a boatload of cash. Rolling out Google Fiber &#8212; <a href="http://gigaom.com/2013/04/17/provo-utah-is-the-next-stop-for-google-fiber/">soon to be in three cities</a> &#8212; certainly isn&#8217;t cheap, either.</p>
<p><a href="http://gigaom.com/2012/07/27/chart-apple-facebook-spending-a-lot-on-infrastructure/">Apples-to-apples comparisons can be tough</a>, because everyone&#8217;s businesses are different and decisions to build or buy new gear can affect expenditures, <a href="http://gigaom.com/2013/04/04/apples-massive-jobs-designed-future-headquarters-project-is-2b-over-budget/">as can massive new headquarters</a>. But here goes: In its fiscal third quarter earnings announced on Thursday, Microsoft claims it spent $930 million. Facebook, Apple and Amazon have not yet released their latest earnings, although both <a href="http://www.apple.com/pr/library/2013/01/23Apple-Reports-Record-Results.html">Apple</a> and <a href="http://phx.corporate-ir.net/phoenix.zhtml?c=97664&amp;p=irol-newsArticle&amp;ID=1779040&amp;highlight=">Amazon</a> spent more than $2 billion on &#8220;property and equipment&#8221; in the previous quarter. Facebook <a href="http://investor.fb.com/releasedetail.cfm?ReleaseID=736911">spent $198 million and another $89 million</a> leasing property and equipment.</p>
<p>This quarter&#8217;s $1.2 billion also represents a nearly 2x increase over last year&#8217;s first quarter infrastructure spending for Google.</p>
<p><a href="//infogr.am/Google-infrastructure-spending-derrickharris_1366318593">Google infrastructure spending</a> (interactive chart)</p>
		
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/04/18/googles-infrastructure-spending-spree-continues-1-2b-in-q1/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/04/google-data-centet-e1366320388620.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/04/google-data-centet-e1366320388620.jpg?w=150" medium="image">
			<media:title type="html">google data center</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>
	</item>
		<item>
		<title>The world is ready for the consumer-grade enterprise</title>
		<link>http://gigaom.com/2013/03/19/the-world-is-ready-for-the-consumer-grade-enterprise/</link>
		<comments>http://gigaom.com/2013/03/19/the-world-is-ready-for-the-consumer-grade-enterprise/#comments</comments>
		<pubDate>Tue, 19 Mar 2013 14:00:06 +0000</pubDate>
		<dc:creator>Paul Maritz, Pivotal Initiative</dc:creator>
				<category><![CDATA[big data]]></category>
		<category><![CDATA[Cloud Computing]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[EMC]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[Paul Maritz]]></category>
		<category><![CDATA[Pivotal Initiative]]></category>
		<category><![CDATA[VMWare]]></category>
		<category><![CDATA[Web Infrastructure]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=621141</guid>
		<description><![CDATA[Former VMware CEO and current Pivotal Initiative leader Paul Maritz shares his thoughts on how the future of enterprise IT must mirror the practices of consumer web companies.]]></description>
				<content:encoded><![CDATA[<p>In the last three decades, we’ve seen a shift in enterprise information technology, from mainframes that automated our financial information, to the client-server and web-based world that aimed to replace most paper-based processes with “systems” like CRM, ERP, e-commerce and email. And now, in the cloud era, we find ourselves on the brink of another transformative shift. This one is driven by the explosion of data and the need for traditional enterprises to find new business value through new business models and building better customer experiences.</p>
<p>A key question becomes how this shift will become a reality and where we will look for a blueprint to begin. I think the answer, or at least the opportunity to see further, comes from “standing on the shoulders of giants.” And in this case specifically, I’m talking about the consumer internet giants like Google, Facebook and Amazon.</p>
<p>These companies have created significant new business value and blazed new trails in developing ways to manage and extract meaning from massive amounts of data. As a result, they’re able to deliver meaningful products, features, and experiences rapidly to their customers — essentially, giving customers what they want, when they want it and where they want it. Wouldn’t it be nice for traditional enterprises to have the same capabilities?</p>
<h2 id="the-traditional-enterprise-mus">The traditional enterprise must learn from internet technology</h2>
<p>Powered by new data fabrics with custom-built infrastructure, these consumer internet companies interact and serve their customers in the context of who their customers are, where they are and what they are doing in the moment. They are building, deploying and scaling at an unprecedented pace. They are storing, managing and delivering value from large data sets, and they knit all of this together on one unified platform that supports their businesses.</p>
<div id="attachment_614630" class="wp-caption alignleft" style="width: 310px"><a href="http://gigaom2.files.wordpress.com/2013/02/1z5o2616.jpg"><img alt="Structure 2011: Paul Maritz – CEO, VMware" src="http://gigaom2.files.wordpress.com/2013/02/1z5o2616.jpg?w=300&#038;h=200" width="300" height="200" class="size-medium wp-image-614630"></a><p class="wp-caption-text">Paul Maritz at Structure 2011<br>(c) Pinar Ozger</p></div>
<p>Now add to this mix the emergence of the “internet of things,” the fact that telemetry will become pervasive in coming years. Everything from a fridge to a jet engine will be dialing home in the future, constantly reporting its state. This will drive a new avalanche of data that will arrive in huge quantities and will need to be ingested and reacted to in real time.</p>
<div>
<h2 id="successful-enterprises-must-be">Successful enterprises must become “consumer-grade” in order to win</h2>
<p>Enterprise companies will need ways to store and analyze massive amounts of data cost-effectively, ingest huge numbers of events in real time, reason over the data and events, and react in real time. Teams will need to be able to rapidly develop new solutions that exploit these underlying capabilities. The need for these capabilities can be seen across a wide set of industries — from industrial control to telecommunications to retail, and even to modern agriculture.</p>
<p>Addressing these opportunities will require new underpinnings; a new platform, if you like. At the core of this platform, which needs to be cloud-independent to prevent lock-in, will be new approaches to handling big and fast (real-time) data. And history teaches us that when the underlying data fabrics change, a lot else in the IT industry changes, as well.</p>
<p>“Carrier-grade” or “industrial-grade” — and yes, of course, “enterprise-grade” — once represented best-in-class products and technology while “consumer-grade” was associated with lightweight technology not fit for a professional, high-performance environment. Well, things are changing; the former lightweight is the new heavyweight. <em>Consumer-grade</em> will become the new benchmark.</p>
<p><em>Paul Maritz is the former CEO of VMware, current chief strategy officer of EMC and also holds a leadership position with the Pivotal Initiative. He will be part of a fireside chat with GigaOM’s Om Malik on Wednesday, March 20, at <a href="http://event.gigaom.com/structuredata/?utm_source=data&amp;utm_medium=editorial&amp;utm_campaign=intext&amp;utm_term=621141+the-world-is-ready-for-the-consumer-grade-enterprise&amp;utm_content=gigaguest">Structure: Data</a> in New York.</em></p>
<p><em>Feature image courtesy of <a href="http://www.shutterstock.com/gallery-348181p1.html">Shutterstock user Oleksiy Mark</a>.</em></p>
</div>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/03/19/the-world-is-ready-for-the-consumer-grade-enterprise/feed/</wfw:commentRss>
		<slash:comments>12</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/03/shutterstock_108857858-1.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/03/shutterstock_108857858-1.jpg?w=150" medium="image">
			<media:title type="html">cloud servers</media:title>
		</media:content>

		<media:content url="http://1.gravatar.com/avatar/4411542bbd7a2a9a2fc2a1b38809e45c?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">gigaguest</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/02/1z5o2616.jpg?w=300" medium="image">
			<media:title type="html">Structure 2011: Paul Maritz – CEO, VMware</media:title>
		</media:content>
	</item>
		<item>
		<title>Microsoft&#8217;s next chapter: Putting Bing tech inside our homes and data centers</title>
		<link>http://gigaom.com/2013/03/12/microsofts-next-chapter-putting-bing-tech-inside-our-homes-and-data-centers/</link>
		<comments>http://gigaom.com/2013/03/12/microsofts-next-chapter-putting-bing-tech-inside-our-homes-and-data-centers/#comments</comments>
		<pubDate>Tue, 12 Mar 2013 18:33:22 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[big data]]></category>
		<category><![CDATA[bing]]></category>
		<category><![CDATA[Databases]]></category>
		<category><![CDATA[machine-learning]]></category>
		<category><![CDATA[Microsoft]]></category>
		<category><![CDATA[search engines]]></category>
		<category><![CDATA[Web Infrastructure]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=618318</guid>
		<description><![CDATA[Microsoft is no joke when it comes to building web infrastructure and developing techniques such as machine learning. The company thinks its heavy investment in these areas will pay off big-time in the years to come.]]></description>
				<content:encoded><![CDATA[<p>Two terms kept popping up as I watched a slew of Microsoft executives show off the company’s future at its annual TechForum media gathering last week. One was “machine learning.” The other was “Bing.”</p>
<p>I would have been surprised had I not sat down with Microsoft Technical Fellow Dave Campbell the night before the event to talk big data. After all, I was in Redmond — home of Word, Excel and a, shall we say, misunderstood new operating system — not Silicon Valley, where “machine learning” now rolls off the tongue as easily and often as “startup” or <a href="http://gigaom.com/2012/07/13/why-silicon-valley-is-crazy-about-adventure/">“triathlon.”</a></p>
<p>However, a single rhetorical question from Campbell resonated pretty loudly and got me in the right frame of mind for what I was about to hear: Who else, he asked, has a top-tier web service business (complete with the hundreds of petabytes of data those services collect) as well as a top-tier enterprise software business?</p>
<p>He could have added to that list a consumer software business, 30 percent of the world’s long-distance calls, a mobile device business, one of the world’s most popular gaming platforms, <a href="http://www.microsoft.com/en-us/news/Press/2012/Jul12/07-09TouchscreenPR.aspx">a large-screen touch-display business</a>, and a motion-sensing device that ties into — and can control — all of them. They all came into play at TechForum, as various company presidents, engineers and now-adviser-to-the-CEO Craig Mundie demonstrated a future where everything is connected and trying to learn what we like and what we’re doing.</p>
<h2 id="bing-is-the-key-to-it-all-even">Bing is the key to it all (even if it can’t touch Google)</h2>
<p>Microsoft’s Bing search engine is at the core of everything the company is trying to do in the field of machine learning and cutting-edge big data. That fact makes it an important part of Microsoft’s future even if it never gets close to Google search in terms of revenue or users. “Its long-term value is just as much as a deep infrastructural element,” Mundie said during a Q&amp;A session kicking off the event.</p>
<p>What he means is that Bing is valuable because the technology developed to power it ultimately stands to make Microsoft a lot more money in other areas. Qi Lu, Microsoft’s Online Services Division president (and an integral part of <a href="http://gigaom.com/2013/03/04/the-history-of-hadoop-from-4-nodes-to-the-future-of-data/">the maturation of Hadoop inside Yahoo</a> earlier this century), describes Bing’s primary architecture as less of a traditional keyword index and more of an “information fabric.” We’re building a digital society, he explained, so there are digital entities — people, places and things — and Bing must be able to capture <a href="http://www.technologyreview.com/news/512306/microsofts-bing-now-can-find-local-businesses-that-arent-too-crowded/">the rich spatial, temporal and other relationships</a> among them.</p>
<div id="attachment_619574" class="wp-caption aligncenter" style="width: 718px"><a href="http://gigaom2.files.wordpress.com/2013/03/20130304_145323.jpg"><img alt="A research project for analyzing viral web content." src="http://gigaom2.files.wordpress.com/2013/03/20130304_145323.jpg?w=708&#038;h=531" width="708" height="531" class="size-large wp-image-619574"></a><p class="wp-caption-text">A research project for analyzing viral web content.</p></div>
<p>Taking that vision company-wide, Microsoft can take in data from Bing, Skype, Xbox Live, Office 365 and other sources and actually be able to store, process and analyze it in meaningful ways. Internally, this might be for business-intelligence or product-development purposes. Externally, Microsoft might use data to create experiences that span devices and services.</p>
<p>Bing also feeds the pipeline for future enterprise IT products, particularly when it comes to data management. Campbell tells the story of meeting a colleague years after he left the SQL database team and went to work on Bing’s infrastructure. At that point, their worlds were vastly different, but the advent of big data, and the hype around it, has brought them together once again.</p>
<div class="wp-caption alignright" style="width: 310px"><a href="http://gigaom2.files.wordpress.com/2013/01/1z5o8006.jpg"><img alt="Structure 2012: Satya Nadella - President, Server and Tools Business, Microsoft" src="http://gigaom2.files.wordpress.com/2013/01/1z5o8006.jpg?w=300&#038;h=199" width="300" height="199" class=""></a><p class="wp-caption-text">Satya Nadella at Structure 2012<br>(c) Pinar Ozger</p></div>
<p>During his presentation, Satya Nadella, Microsoft’s Server and Tools Business president, said the company now builds internal IT with a design-for-first-party-but-think-of-third-party mentality. As a result, the core of the Windows Azure cloud-computing platform is based on technologies developed to run Bing, as is the Windows Azure storage service. When Microsoft builds a new operating system, he added, it thinks about the project at webscale in terms of what it would take to run Bing using that platform.</p>
<p>And Campbell told me via email after the event that Microsoft is considering how to productize the various graph, NoSQL and other types of databases it uses to power the features within Bing. Ironically, though, its <a href="http://blogs.msdn.com/b/seliot/archive/2010/11/05/cosmos-petabytes-perfectly-processed-perfunctorily.aspx">Cosmos</a> and <a href="http://gigaom.com/2011/03/12/with-dryad-microsoft-is-trying-to-democratize-big-data/">Dryad</a> technologies that serve as the core of Bing are off the table: consumers demanded Hadoop, so <a href="http://gigaom.com/2012/02/28/microsofts-hadoop-play-is-shaping-up-and-it-includes-excel/">that’s what Microsoft is currently pushing</a> for mass storage and large-scale batch processing.</p>
<p>Google, of course, is doing something very similar, albeit with less of a focus on enterprise software as a final destination for its technologies (with the exception of its small suite of cloud services such as <a href="http://gigaom.com/2012/06/28/taking-on-amazon-google-launches-compute-on-demand-rival-to-ec2/">Compute Engine</a>, <a href="http://gigaom.com/2012/06/26/google-app-engine-what-developers-want-at-google-io/">App Engine</a> and <a href="http://gigaom.com/2012/05/01/google-opens-up-its-biq-query-data-analytics-service-to-all/">BigQuery</a>). Rather, <a href="http://gigaom.com/2012/06/25/how-google-is-teaching-computers-to-see/">the types of advances in data storage, processing and analysis</a> that Google has made thanks to products such as search and YouTube are finding their way into Project Glass and self-driving cars. Time will tell whose efforts prove wiser in the end.</p>
<h2 id="a-little-history-and-prognosti">A little history and prognostication on machine learning</h2>
<p>Mundie said machine learning, especially, has been a core part of Microsoft Research’s focus for years. And although there were some initial struggles, including a dearth of good data and machines powerful enough to process it all, the company and the industry as a whole have come a long way. Among the big areas of improvement he cited were real-time speech recognition — Microsoft <a href="http://research.microsoft.com/en-us/news/features/haitiancreole-020410.aspx">has done some impressive work in this area</a>, actually — and natural user interaction.</p>
<p>“We’ve talked for a long time in the industry about <em>IT</em> meaning <em>information technology</em>,” Mundie said, “… you might redefine<em> IT</em> to be <em>intelligent technology</em>.”</p>
<p>Eric Rudder, Mundie’s protégé and chief technical strategy officer, elaborated. If you think about all the pictures and other info Microsoft’s devices and services capture, he said, you’ll see a lot of opportunity to learn and build better products. Stepping out of the consumer world, he questioned how one might begin working with a 40-billion-row Excel spreadsheet. Query it, talk to it or somehow use gestures to communicate with it?</p>
<div id="attachment_619834" class="wp-caption alignleft" style="width: 310px"><a href="http://gigaom2.files.wordpress.com/2013/03/c71c2493.jpg"><img alt="Eric Rudder (foreground) and Craig Mundie (background). Source: Microsoft" src="http://gigaom2.files.wordpress.com/2013/03/c71c2493.jpg?w=300&#038;h=200" width="300" height="200" class="size-medium wp-image-619834"></a><p class="wp-caption-text">Eric Rudder (foreground) and Craig Mundie (background). Source: Microsoft</p></div>
<p>Mundie thinks Microsoft can answer these and other questions — this despite a relative lack of attention compared with Google’s research efforts and a consumer community he says is “jaded” by the omnipresence of high technology. TV makers are copying Kinect, speech will be the most-prevalent user interaction and cameras as inputs are coming soon, he said. And Microsoft’s machine-learning research will let it capitalize or even lead the way on these movements, he added.</p>
<p>As I’ll highlight in a follow-up post, Microsoft showed off a lot of these capabilities to the handful of journalists invited to TechForum. Kinect, Office, Xbox Live — they’re all watching, listening, learning and working together.</p>
<p>It’s part of a greater transition away from “specialized gadgets” that process information and into a world full of generally intelligent devices and services that just let people get stuff done. “The vast majority of humankind,” Mundie said, “doesn’t really care about the computer, per se.”</p>
<h2 id="have-research-division-will-pe">Have research division, will persevere</h2>
<p>In the end, Microsoft Chief Research Officer Rick Rashid expects Microsoft’s heavy investment into general research of the kind his team does will help it get the last laugh over some of its competitors. He wonders whether companies like Apple — which already saved itself once — will be ready to ride the next wave of innovation or the one after that without dedicated general research departments that aren’t necessarily tied to product development. His view is that you can only buy yourself into the next generation so many times.</p>
<div id="attachment_619572" class="wp-caption aligncenter" style="width: 718px"><a href="http://gigaom2.files.wordpress.com/2013/03/20130304_144456.jpg"><img alt="A project (same as the feature image) called Adaptive Machine Learning for Real-Time Streaming." src="http://gigaom2.files.wordpress.com/2013/03/20130304_144456.jpg?w=708&#038;h=531" width="708" height="531" class="size-large wp-image-619572"></a><p class="wp-caption-text">A project (same as the feature image) called Adaptive Machine Learning for Real-Time Streaming.</p></div>
<p>It was Microsoft Research, for example, that developed a method for compressing 32-bit code in the early 1990s — something that would prove fortuitous when it came time to ship Windows 95 and its associated applications despite the fact that most PCs lacked the proper hardware for the 32-bit OS. As for establishing the dominance of Office over rivals that had to wait until the hardware caught up, Rashid told a group of reporters during the event, “that was game over.”</p>
<p>“Our industry is littered with companies that aren’t here anymore,” he added.</p>
<p>Touché. Microsoft is the butt of a lot of jokes, but as the tech world shifts toward intelligent devices and alternative modes of human-computer interaction, the company’s research into areas such as big data and machine learning suggests it will still be very much around for some time to come.</p>
<p><em>To learn a lot more about machine learning and the latest trends in big data technologies, be sure to attend our <a href="http://event.gigaom.com/structuredata/?utm_source=data&amp;utm_medium=editorial&amp;utm_campaign=intext&amp;utm_term=618318+microsofts-next-chapter-putting-bing-tech-inside-our-homes-and-data-centers&amp;utm_content=dharrisstructure">Structure: Data conference</a> March 20-21 in New York. Speakers will include some of the brightest minds in data from organizations such as EMC, Facebook, Cloudera, Quid and even the CIA.</em></p>
<p><a href="http://structuredata2013-editgraphic.eventbrite.com/"><img alt="Structure:Data: Put data to work. 60+ big data experts speaking. March 20-21, 2013, New York City. Register now." src="http://gigaom2.files.wordpress.com/2013/02/structure-data_in-article-banner_590x1101.png?w=708"   class="aligncenter size-full wp-image-610578"></a></p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/03/12/microsofts-next-chapter-putting-bing-tech-inside-our-homes-and-data-centers/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/03/03-06adaptive_web.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/03/03-06adaptive_web.jpg?w=150" medium="image">
			<media:title type="html">03-06Adaptive_Web</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/03/20130304_145323.jpg?w=708" medium="image">
			<media:title type="html">A research project for analyzing viral web content.</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/01/1z5o8006.jpg?w=300" medium="image">
			<media:title type="html">Structure 2012: Satya Nadella - President, Server and Tools Business, Microsoft</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/03/c71c2493.jpg?w=300" medium="image">
			<media:title type="html">Eric Rudder (foreground) and Craig Mundie (background). Source: Microsoft</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/03/20130304_144456.jpg?w=708" medium="image">
			<media:title type="html">A project (same as the feature image) called Adaptive Machine Learning for Real-Time Streaming.</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/02/structure-data_in-article-banner_590x1101.png" medium="image">
			<media:title type="html">Structure:Data: Put data to work. 60+ big data experts speaking. March 20-21, 2013, New York City. Register now.</media:title>
		</media:content>
	</item>
		<item>
		<title>Facebook kisses DRAM goodbye, builds memcached for flash</title>
		<link>http://gigaom.com/2013/03/05/facebook-kisses-dram-goodbye-builds-memcached-for-flash/</link>
		<comments>http://gigaom.com/2013/03/05/facebook-kisses-dram-goodbye-builds-memcached-for-flash/#comments</comments>
		<pubDate>Wed, 06 Mar 2013 00:13:10 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[database]]></category>
		<category><![CDATA[energy efficiency]]></category>
		<category><![CDATA[Facebook]]></category>
		<category><![CDATA[Flash Memory]]></category>
		<category><![CDATA[Flash storage]]></category>
		<category><![CDATA[memcached]]></category>
		<category><![CDATA[web applications]]></category>
		<category><![CDATA[Web Infrastructure]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=617081</guid>
		<description><![CDATA[Facebook has developed a new data cache called McDipper that's essentially memcached rewritten to run on flash memory instead of DRAM, thus saving money while still delivering higher performance than disk.<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=gigaom.com&#038;blog=14960843&#038;post=617081&#038;subd=gigaom2&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p>Q: What do you get when you mix Facebook’s extensive memcached usage with its strategy of “cold storage” for infrequently accessed data?</p>
<p>A: McDipper, a Facebook-built implementation of the popular memcached key-value store designed to run on flash memory rather than pricier DRAM.</p>
<p><a href="http://memcached.org/">Memcached</a>, for the unfamiliar, is an open-source key-value store that caches frequently accessed data in memory so applications can access and serve it faster than if it were stored on hard disks. It’s a very popular component of many web application stacks, including at Facebook, where the company runs thousands of memcached servers to power its various applications.</p>
<p>But DRAM is expensive, especially when you get to Facebook’s scale, and not all applications deserve that kind of performance. So, <a href="https://www.facebook.com/notes/facebook-engineering/mcdipper-a-key-value-cache-for-flash-storage/10151347090423920">according to a Facebook Engineering post on Tuesday</a>, the company designed McDipper to handle “working sets that had very large footprints but moderate to low request rates. … Compared with memory, flash provides up to 20 times the capacity per server and still supports tens of thousands of operations per second.”</p>
<p>Facebook has deployed McDipper for a handful of these workloads, the blog states, and has “reduced the total number of deployed servers in some pools by as much as 90% while still delivering more than 90% of get responses with sub-millisecond latencies.” It has been part of Facebook’s photo infrastructure for about a year and serves 150 gigabits of data per second — or “about one library of congress (10 TB) every 10 minutes” — over Facebook’s content-delivery network.</p>
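<p>The quoted comparison is easy to sanity-check. The sketch below assumes decimal units (1 gigabit = 10^9 bits, 1 terabyte = 10^12 bytes) and a rate sustained over the full 10 minutes; the function name is hypothetical.</p>

```python
# Back-of-the-envelope check of the throughput figure quoted above.
# Assumption: decimal (SI) units and a constant transfer rate.
GIGABIT = 10**9    # bits
TERABYTE = 10**12  # bytes


def terabytes_transferred(gigabits_per_second: float, seconds: float) -> float:
    """Convert a sustained rate in Gbit/s into terabytes moved over `seconds`."""
    total_bits = gigabits_per_second * GIGABIT * seconds
    total_bytes = total_bits / 8  # 8 bits per byte
    return total_bytes / TERABYTE


tb_in_ten_minutes = terabytes_transferred(150, 10 * 60)
print(f"{tb_in_ten_minutes:.2f} TB in 10 minutes")
```

<p>At 150 gigabits per second, that works out to 11.25 TB in 10 minutes, in line with the post’s “about” 10 TB figure.</p>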
<div id="attachment_617132" class="wp-caption aligncenter" style="width: 718px"><img alt="mcdipper" src="http://gigaom2.files.wordpress.com/2013/03/563268_10151454322497200_149974633_n.png?w=708&#038;h=249" width="708" height="249" class="wp-image-617132"><p class="wp-caption-text">How McDipper stores data</p></div>
<p>This is the same logic that drove Facebook to <a href="http://gigaom.com/2012/10/03/facebooks-next-compute-challenge-is-cold-storage/">undertake its cold storage engineering effort</a> for even more infrequently accessed data, which aims to find a middle ground between the inefficiency and latency of hard disks and the high cost of flash storage. To meet that goal, the company is getting creative by <a href="http://gigaom.com/2013/01/16/why-facebook-might-put-blu-ray-to-use-on-big-data/">considering everything from lower-performance flash to Blu-ray</a> — pretty much anything but tape — VP of Engineering Jay Parikh told me in January.</p>
<p>Building a tool like McDipper is just the tip of the iceberg, though, when it comes to managing the cost and efficiency of infrastructure at large web companies such as Facebook. On Tuesday, eBay <a href="http://gigaom.com/2013/03/05/ebay-shows-the-world-how-to-measure-mpg-for-data-centers/">released its Digital Service Efficiency report</a> that lays out a methodology for assessing the effect that infrastructure (more than 52,000 servers in eBay’s case; Facebook has even more) has on larger corporate goals such as clean energy and the bottom line.</p>
<p>And later this month at our <a href="http://event.gigaom.com/structuredata/schedule/?utm_source=cloud&amp;utm_medium=editorial&amp;utm_campaign=intext&amp;utm_term=617081+facebook-kisses-dram-goodbye-builds-memcached-for-flash&amp;utm_content=dharrisstructure">Structure: Data conference</a>, data center executives from Facebook, Microsoft and Goldman Sachs will take the stage to discuss how smart analytics help them plan to meet capacity needs while keeping costs in check.</p>
<p><em>Feature image is Facebook’s new all-flash Dragonstone server design.</em></p>
<p><a href="http://structuredata2013-editgraphic.eventbrite.com/"><img alt="Structure:Data: Put data to work. 60+ big data experts speaking. March 20-21, 2013, New York City. Register now." src="http://gigaom2.files.wordpress.com/2013/02/structure-data_in-article-banner_590x1101.png?w=708"   class="aligncenter size-full wp-image-610578"></a></p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/03/05/facebook-kisses-dram-goodbye-builds-memcached-for-flash/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/03/dragonstone-e1362528412272.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/03/dragonstone-e1362528412272.jpg?w=150" medium="image">
			<media:title type="html">Dragonstone</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/03/563268_10151454322497200_149974633_n.png?w=708" medium="image">
			<media:title type="html">mcdipper</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/02/structure-data_in-article-banner_590x1101.png" medium="image">
			<media:title type="html">Structure:Data: Put data to work. 60+ big data experts speaking. March 20-21, 2013, New York City. Register now.</media:title>
		</media:content>
	</item>
		<item>
		<title>eBay shows the world how to measure MPG for data centers</title>
		<link>http://gigaom.com/2013/03/05/ebay-shows-the-world-how-to-measure-mpg-for-data-centers/</link>
		<comments>http://gigaom.com/2013/03/05/ebay-shows-the-world-how-to-measure-mpg-for-data-centers/#comments</comments>
		<pubDate>Tue, 05 Mar 2013 17:00:34 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[Bloom Energy]]></category>
		<category><![CDATA[clean energy]]></category>
		<category><![CDATA[Data Centers]]></category>
		<category><![CDATA[ebay]]></category>
		<category><![CDATA[efficiency]]></category>
		<category><![CDATA[energy efficiency]]></category>
		<category><![CDATA[PUE]]></category>
		<category><![CDATA[servers]]></category>
		<category><![CDATA[solar power]]></category>
		<category><![CDATA[Web Infrastructure]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=616896</guid>
		<description><![CDATA[eBay has released a trove of information about the efficiency of its data centers, and plans to do so quarterly as part of a mission to continuously track computing resources and tie them to bigger business goals.<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=gigaom.com&#038;blog=14960843&#038;post=616896&#038;subd=gigaom2&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p>eBay is busy building some of the world&#8217;s most-efficient data centers, and its efforts aren&#8217;t just show. The company has figured out a way to tie its computing infrastructure to specific business concerns and plans to continuously tweak its operations to meet top-level mandates. On Tuesday, eBay released a whitepaper describing how it accomplished this and laying out a framework for companies that want to do the same.</p>
<p>Dean Nelson, eBay&#8217;s vice president of Global Foundation Services, says the effort, called the <a href="http://dse.ebay.com/">Digital Service Efficiency</a> report, &#8220;is the miles per gallon measure for technical infrastructure for eBay.&#8221; Essentially, the company has boiled its business down to a single currency &#8212; transactions (specifically URL requests) associated with users&#8217; buying and selling on the site &#8212; and created a slew of metrics that measure how efficiently it delivers those transactions in terms of revenue, performance, cost and carbon footprint.</p>
<p>The project has been about 18 months in the making, Nelson told me during a recent phone call, and eBay was finally able to set a baseline measurement of its performance in 2012. Now that it knows what&#8217;s in place and how its infrastructure performs over the course of a year, the goal in 2013 is to cut its computing-related carbon usage and costs by 10 percent and increase performance in terms of transactions per kilowatt-hour by 10 percent.</p>
<p>In order to meet these goals, he said, every member of the technical team &#8212; from facilities managers to software engineers &#8212; has to be striving toward them and also be cognizant of how turning their &#8220;knobs&#8221; will affect the other metrics eBay is measuring. &#8220;Think of it like a Rubik&#8217;s cube,&#8221; Nelson explained. &#8220;You can solve one side but screw up the rest of them.&#8221;</p>
<p>eBay plans to release quarterly updates on its progress along with its earnings reports, but employees will have access to down-to-the-second visibility into what&#8217;s going on. &#8220;It makes it personal for them,&#8221; Nelson said. &#8220;They can see what their efforts mean.&#8221;</p>
<p><img  alt="Digital Service Efficiency" src="http://gigaom2.files.wordpress.com/2013/03/final_dse-dashboard.jpeg?w=708&#038;h=419" width="708" height="419" class="aligncenter size-large wp-image-616903" /></p>
<h2 id="52075-servers-doing-a-lot-of-w">52,075 servers doing a lot of work</h2>
<p>Nelson offered some pretty compelling examples of how the Digital Service Efficiency project works in practice. If the goal is to decrease cost per transaction, data center engineers might try to minimize power usage at the facility level while server engineers might look to lower-power gear or better utilization on existing gear. They essentially reduce the denominator in that equation &#8220;and the net result is we should make more money from those transactions,&#8221; he said.</p>
<p>In one real-world instance, a software engineer tweaked some code that affected how much memory an application requires, and the company was able to eliminate 400 servers. That cut power usage by 1 megawatt and saved $2 million in capital expense that would otherwise have gone toward refreshing those servers.</p>
<p>eBay also has created a &#8220;list of fame&#8221; and a &#8220;list of shame&#8221; that highlight the 1,000 best- and worst-utilized servers within the company. &#8220;We have a hit list,&#8221; Nelson said, and it&#8217;s going to examine the bottom 20 percent to figure out why they&#8217;re as wasteful as they are.</p>
<p>However, he added, it&#8217;s important to remember on the server front that improving cost, performance and carbon usage doesn&#8217;t always mean buying lower-power gear. If eBay can improve the power density of its racks using technology such as liquid cooling &#8212; something <a href="http://gigaom.com/2012/04/06/making-the-web-more-efficient-a-thousand-servers-at-a-time/">its Project Mercury data center in Phoenix is pre-equipped for</a> &#8212; it can handle more transactions on less gear. It already has some racks running at a sustained rate of 35 kilowatts and thinks it can push that up to 50 kilowatts, Nelson said.</p>
<h2 id="clean-transactions-with-solar-">Clean transactions with solar panels and Bloom boxes</h2>
<p>On the carbon front, eBay has nothing but an open field in front of it thanks to some big clean-energy projects set to go live in 2013 in its new Salt Lake City, Utah, data center called Project Topaz. For starters, <a href="http://gigaom.com/2012/10/30/what-ebays-bet-on-fuel-cells-means-for-the-modern-data-center/">it&#8217;s using Bloom Energy boxes as the primary power source</a>, which means a slightly higher cost per transaction, but also a 13 percent reduction in carbon emissions and increased reliability (downtime costs eBay a lot of money).</p>
<p>Also, the company has finally cleared some regulatory hurdles to tie <a href="http://gigaom.com/2012/04/11/ebay-covers-utah-data-center-roof-with-solar-panels/">an on-site solar array</a> back to the grid. Because of <a href="http://gigaom.com/2012/09/26/with-data-centers-web-giants-have-great-eco-responsibility/">changes to a Utah law that eBay lobbied for</a>, it&#8217;s about to start sourcing off-site clean energy for its data centers, as well.</p>
<p>&#8220;That is a corporate priority,&#8221; Nelson said. &#8220;We want to create the cleanest commerce engine on the freakin&#8217; planet.&#8221;</p>
<h2 id="trying-to-change-an-industry">Trying to change an industry</h2>
<p>Of course, the Digital Service Efficiency methodology isn&#8217;t the only attempt by a major data center operator to show the world how efficient it is. Google <a href="http://gigaom.com/2012/03/26/whose-data-centers-are-more-efficient-facebooks-or-googles/">publishes annual Power Usage Effectiveness (PUE) ratings for its data centers</a>, and Facebook occasionally does as well. On Monday, Salesforce.com <a href="http://www.salesforce.com/assets/pdf/misc/Sustainability_Commitment.pdf">released a statement underscoring its commitment</a> to sourcing renewable energy.</p>
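<p>For readers unfamiliar with the metric, PUE is conventionally the ratio of the total energy a facility draws to the energy consumed by its IT equipment alone, so a value closer to 1.0 means less overhead for cooling and power distribution. A minimal sketch of that calculation follows; the function name and the sample figures are illustrative assumptions, not numbers reported by Google, Facebook or eBay.</p>

```python
def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Return the PUE ratio: total facility energy / IT equipment energy."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    return total_facility_kwh / it_equipment_kwh


# Hypothetical example: a facility draws 1,200 kWh in total while its
# servers, storage and network gear account for 1,000 kWh of that.
example_pue = pue(total_facility_kwh=1200.0, it_equipment_kwh=1000.0)
print(f"PUE = {example_pue:.2f}")
```

<p>In this made-up case, every 1 kWh delivered to IT gear costs an extra 0.2 kWh of facility overhead.</p>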
<p><img  alt="dse chart" src="http://gigaom2.files.wordpress.com/2013/03/dse-chart.jpg?w=708&#038;h=470" width="708" height="470" class="aligncenter size-large wp-image-616915" /></p>
<p>However, Nelson pointed out, what eBay is doing &#8212; and encouraging others to do &#8212; is more transparent in that it gives a lot more depth about operations, including the company&#8217;s server count. Even if companies don&#8217;t publish their results, tying operational efficiency to other business objectives should have a positive effect on the bottom line and the environment, regardless. Every company will have its own base currency, Nelson explained, and they&#8217;ll have to find their own metrics to measure and figure out what are the knobs that each part of the company can turn to meet goals.</p>
<p>&#8220;We all have the same challenges, the same things to solve for, but we have numerous ways to solve it,&#8221; Nelson said. &#8220;&#8230; [Their implementations] may change completely, but the point is the conversation is starting.&#8221;</p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/03/05/ebay-shows-the-world-how-to-measure-mpg-for-data-centers/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/03/final_dse-dashboard1-e1362498647255.jpeg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/03/final_dse-dashboard1-e1362498647255.jpeg?w=150" medium="image">
			<media:title type="html">Digital Service Efficiency</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/03/final_dse-dashboard.jpeg?w=708" medium="image">
			<media:title type="html">Digital Service Efficiency</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/03/dse-chart.jpg?w=708" medium="image">
			<media:title type="html">dse chart</media:title>
		</media:content>
	</item>
		<item>
		<title>How and why LinkedIn is becoming an engineering powerhouse</title>
		<link>http://gigaom.com/2013/03/03/how-and-why-linkedin-is-becoming-an-engineering-powerhouse/</link>
		<comments>http://gigaom.com/2013/03/03/how-and-why-linkedin-is-becoming-an-engineering-powerhouse/#comments</comments>
		<pubDate>Sun, 03 Mar 2013 20:00:13 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[analytics]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[data science]]></category>
		<category><![CDATA[database]]></category>
		<category><![CDATA[espresso]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Kafka]]></category>
		<category><![CDATA[LinkedIn]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[real-time messaging]]></category>
		<category><![CDATA[Web Infrastructure]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=616112</guid>
		<description><![CDATA[Five years ago, LinkedIn was a shell of the technology company it is today. Here's an inside look at where it came from, what it's become and where it's going.<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=gigaom.com&#038;blog=14960843&#038;post=616112&#038;subd=gigaom2&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p>Most LinkedIn users know “People You May Know” as one of that site’s flagship features — an omnipresent reminder of other LinkedIn users with whom you probably want to connect. Keeping it up to date and accurate requires some heady data science and impressive engineering to keep data constantly flowing between the various LinkedIn applications. When Jay Kreps started there five years ago, this wasn’t exactly the case.</p>
<p>“I was here essentially before we had any infrastructure,” Kreps, now principal staff engineer, told me during a recent visit to LinkedIn’s Mountain View, Calif., campus. He actually came to LinkedIn to do data science, thinking the company would have some of the best data around, but it turned out the company had an infrastructure problem that needed his attention instead.</p>
<p>How big? The version of People You May Know in place then was running on a single Oracle database instance — a few scripts and heuristics provided intelligence — and it took six weeks to update (longer if the update job crashed and had to restart). And that’s only if it worked. At one point, Kreps said, the system wasn’t working for six months.</p>
<p>When the scale of data began to overload the server, the answer wasn’t to add more nodes but to cut out some of the matching heuristics that required too much compute power.</p>
<p>So, instead of writing algorithms to make People You May Know more accurate, he worked on getting LinkedIn’s Hadoop infrastructure in place and built a distributed database called <a href="http://data.linkedin.com/opensource/voldemort">Voldemort</a>.</p>
<p><img alt="tracking_high_level" src="http://gigaom2.files.wordpress.com/2013/03/tracking_high_level.png?w=300&#038;h=230" width="300" height="230" class="alignright size-medium wp-image-616287">Since then, he’s built <a href="http://data.linkedin.com/opensource/azkaban">Azkaban</a>, an open source scheduler for batch processes such as Hadoop jobs, and <a href="http://data.linkedin.com/opensource/kafka">Kafka</a>, another open source tool that Kreps called “the big data equivalent of a message broker.” At a high level, Kafka is responsible for managing the company’s real-time data and getting those hundreds of feeds to the apps that subscribe to them with minimal latency.</p>
<h2 id="espresso-anyone">Espresso, anyone?</h2>
<p>But Kreps’s work is just a fraction of the new data infrastructure that LinkedIn has built since he came on board. It’s all part of a mission to create a data environment at LinkedIn that’s as innovative as that of any other web company around, and that means the company’s application developers and data scientists can keep building whatever products they dream up.</p>
<p>Bhaskar Ghosh, LinkedIn’s senior director of data infrastructure engineering — who’ll be part of our guru panel at <a href="http://event.gigaom.com/structuredata/?utm_source=data&amp;utm_medium=editorial&amp;utm_campaign=intext&amp;utm_term=616112+how-and-why-linkedin-is-becoming-an-engineering-powerhouse&amp;utm_content=dharrisstructure">Structure: Data on March 20-21</a> — can’t help but find his way to the whiteboard when he gets to discussing what his team has built. It’s a three-phase data architecture comprised of online, offline and nearline systems, each designed for specific workloads. The online systems handle users’ real-time interactions; offline systems, primarily Hadoop and a Teradata warehouse, handle batch processing and analytic workloads; and nearline systems handle features such as People You May Know, search and the LinkedIn social graph, which update constantly but require slightly less than online latency.</p>
<div id="attachment_616138" class="wp-caption aligncenter" style="width: 718px"><a href="http://gigaom2.files.wordpress.com/2013/03/20130226_145754.jpg?w=708"><img alt="Ghosh's diagram of LinkedIn's data architecture" src="http://gigaom2.files.wordpress.com/2013/03/20130226_145754.jpg?w=708&#038;h=531" width="708" height="531" class="size-large wp-image-616138"></a><p class="wp-caption-text">Ghosh’s diagram of LinkedIn’s data architecture</p></div>
<p>One of the most important things the company has built is a new database system called <a href="http://data.linkedin.com/projects/espresso">Espresso</a>. Unlike Voldemort, which is an eventually consistent key-value store modeled after Amazon’s Dynamo database and used to serve certain data at high speeds, Espresso is a transactionally consistent document store that’s going to replace legacy Oracle databases across the company’s web operations. It was originally designed to provide a usability boost for LinkedIn’s InMail messaging service, and the company plans to open source Espresso later this year.</p>
<p>According to Director of Engineering Bob Schulman, Espresso came to be “because we had a problem that had to do with scaling and agility” in the mailbox feature. It needs to store lots of data and keep consistent with users’ activity. It also needs a functional search engine so users — even those with lots of messages — can find what they need in a hurry.</p>
<p>With the previous data layer in place, he explained, developers had to solve scalability and reliability issues in the application itself.</p>
<p>However, Principal Software Architect Shirshanka Das noted, “trying to scale [your] way out of a problem” with code isn’t necessarily a long-term strategy. “Those things tend to burn out teams and people very quickly,” he said, “and you’re never sure when you’re going to meet your next cliff.”</p>
<div id="attachment_616132" class="wp-caption aligncenter" style="width: 718px"><img alt="L to R: Kreps, Shirshanka Das, Bhaskar Ghosh, Bob Schulman" src="http://gigaom2.files.wordpress.com/2013/03/20130226_153554-e1362319622641.jpg?w=708&#038;h=471" width="708" height="471" class="size-large wp-image-616132"><p class="wp-caption-text">L to R: Kreps, Das, Ghosh and Schulman</p></div>
<p>Schulman and Das have also worked together on technologies such as <a href="http://data.linkedin.com/opensource/helix">Helix</a>, an open-source cluster management framework for distributed systems, and Databus. The latter, which has been around since 2007 and which <a href="http://engineering.linkedin.com/data-replication/open-sourcing-databus-linkedins-low-latency-change-data-capture-system">the company just open sourced</a>, is a tool that pushes changes in what Das calls “source of truth” data environments like Espresso to downstream environments such as Hadoop so that everyone can ensure they’re working with the freshest data.</p>
<p>In an agile environment, Schulman said, it’s important to be able to change something without breaking something else. The alternative is to bring stuff down to make changes, he added, and “it’s never a good time to stop the world.”</p>
<p><img alt="databus-usecases" src="http://gigaom2.files.wordpress.com/2013/03/databus-usecases.jpg?w=708&#038;h=243" width="708" height="243" class="aligncenter size-large wp-image-616291"></p>
<h2 id="next-up-hadoop">Next up, Hadoop</h2>
<p>Thus far, LinkedIn’s biggest push has been in improving its nearline and online systems (“Basically, we’ve hit the ball out of the park here,” Ghosh said), so its next big push is offline — Hadoop, in particular. The company already uses Hadoop for the usual gamut of workloads — ETL, model-building, exploratory analytics and pre-computing data for nearline applications — and Ghosh wants to take it even further.</p>
<p>He laid out a multipart vision, most of which centers around tight integration between the company’s Hadoop clusters and relational database systems. Among the goals: better ETL frameworks, ad-hoc queries, alternative storage formats and an integrated metadata framework — which Ghosh calls the holy grail — that will make it easier for various analytic systems to use each other’s data. He said LinkedIn has something half-built that should be finished this year.</p>
<p>“[<a href="http://gigaom.com/2013/02/21/sql-is-whats-next-for-hadoop-heres-whos-doing-it/">SQL on Hadoop</a>] is going to take two years to work,” he explained. “What do we do in the meanwhile? We cannot throw this out.”</p>
<p>More broadly, LinkedIn’s data engineering efforts now focus on building services that can work together easily, Das said. The Espresso API, for example, allows developers to connect a columnar storage engine and do some limited online analytics right from within the transactional database.</p>
<div id="attachment_616137" class="wp-caption aligncenter" style="width: 718px"><a href="http://gigaom2.files.wordpress.com/2013/03/20130226_153409.jpg?w=708"><img alt="With Hadoop plans laid out" src="http://gigaom2.files.wordpress.com/2013/03/20130226_153409.jpg?w=708&#038;h=531" width="708" height="531" class="size-large wp-image-616137"></a><p class="wp-caption-text">With Hadoop plans laid out.</p></div>
<h2 id="good-infrastructure-makes-for-">Good infrastructure makes for happy data scientists</h2>
<p>Yael Garten, a senior data scientist at LinkedIn, said better infrastructure makes her job a lot easier. Like Kreps, she was drawn to LinkedIn (from her previous career doing bioinformatics research at Stanford) because the company has so much interesting data to work with, only she was fortunate enough to miss the early days of spotty infrastructure that couldn’t handle 10 million users, much less today’s more than 200 million users. To date, she said, she hasn’t come across a problem she couldn’t solve because the infrastructure couldn’t handle the scale.</p>
<p>The data science team embeds itself with the product team and they work together to either prove out product managers’ hunches or build products around data scientists’ findings. In 2013, Garten said, developers should expect infrastructure that lets them prototype applications and test ideas in near real time. And even business managers need to see analytics as close to real time as possible so they can monitor how new applications are performing.</p>
<p>And infrastructure isn’t just about making things faster, she noted: “Some things wouldn’t be possible.” She wouldn’t go into detail about what this magic piece of infrastructure is, but I’ll assume it’s the company’s top-secret distributed graph system. Ghosh was happy to go into detail about a lot of things, but not that one.</p>
<h2 id="a-virtuous-hamster-wheel">A virtuous hamster wheel</h2>
<p>Neither Ghosh nor Kreps sees LinkedIn — or any leading web company, for that matter — quitting the innovation game any time soon. Partially, this is a business decision. Ghosh, for example, cites the positive impact on company culture and talent recruitment, while Kreps points out the difficult total-cost-of-ownership math when comparing paying for software licenses or hiring open source committers versus just building something internally.</p>
<p>Kreps acknowledged that the constant cycle of building new systems is “kind of a hamster wheel,” but there’s always an opportunity to do new stuff and build products with their own unique needs. Initially, for example, he envisioned two target use cases for Hadoop but now the company has about 300 individual workloads; it went from two real-time data feeds to 650.</p>
<p>“But companies are doing this for a reason,” he said. “There is some problem this solves.”</p>
<p>Ghosh, well, he shot down the idea of relying too heavily on commercial technologies or existing open source projects almost as soon as he suggested it was a possibility. “We think very carefully about where we should do rocket science,” he told me, before quickly adding, “[but] you don’t want to become a systems integration shop.”</p>
<p>In fact, he said, there will be a lot more development and a lot more open source activity from LinkedIn this year: “[I'm already] thinking about the next two or three big hammers.”</p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/03/03/how-and-why-linkedin-is-becoming-an-engineering-powerhouse/feed/</wfw:commentRss>
		<slash:comments>12</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2013/03/20130226_153554-e1362319622641.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2013/03/20130226_153554-e1362319622641.jpg?w=150" medium="image">
			<media:title type="html">LinkedIn Crew 1</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/03/tracking_high_level.png?w=300" medium="image">
			<media:title type="html">tracking_high_level</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/03/20130226_145754.jpg?w=708" medium="image">
			<media:title type="html">Ghosh&#039;s diagram of LinkedIn&#039;s data architecture</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/03/20130226_153554-e1362319622641.jpg?w=708" medium="image">
			<media:title type="html">L to R: Kreps, Shirshanka Das, Bhaskar Ghosh, Bob Schulman</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/03/databus-usecases.jpg?w=708" medium="image">
			<media:title type="html">databus-usecases</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/03/20130226_153409.jpg?w=708" medium="image">
			<media:title type="html">With Hadoop plans laid out</media:title>
		</media:content>
	</item>
	</channel>
</rss>