<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:go='http://ns.gigaom.com/'
xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
		>
<channel>
	<title>Comments on: Can Today&#039;s Hardware Handle the Cloud?</title>
	<atom:link href="http://gigaom.com/2008/06/27/storage-outages-can-todays-hardware-handle-the-cloud/feed/" rel="self" type="application/rss+xml" />
	<link>http://gigaom.com/2008/06/27/storage-outages-can-todays-hardware-handle-the-cloud/</link>
	<description></description>
	<lastBuildDate>Fri, 10 Feb 2012 23:57:00 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
	<item>
		<title>By: SpringSource Buys Startup to Scale Messaging in the Cloud</title>
		<link>http://gigaom.com/2008/06/27/storage-outages-can-todays-hardware-handle-the-cloud/#comment-205810</link>
		<dc:creator><![CDATA[SpringSource Buys Startup to Scale Messaging in the Cloud]]></dc:creator>
		<pubDate>Wed, 11 Aug 2010 00:07:15 +0000</pubDate>
		<guid isPermaLink="false">http://gigaom.com/?p=13910#comment-205810</guid>
		<description><![CDATA[[...] backed by major banks, Cisco and a handful of smaller companies. As hardware is virtualized, translating some of the network equipment like load balancers into software allows services running on the virtualized hardware to scale better. Hopefully we&#8217;ll learn [...]]]></description>
		<content:encoded><![CDATA[<p>[...] backed by major banks, Cisco and a handful of smaller companies. As hardware is virtualized, translating some of the network equipment like load balancers into software allows services running on the virtualized hardware to scale better. Hopefully we&#8217;ll learn [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Will Microsoft Tempt Enterprises Up To the Cloud? - GigaOM</title>
		<link>http://gigaom.com/2008/06/27/storage-outages-can-todays-hardware-handle-the-cloud/#comment-205809</link>
		<dc:creator><![CDATA[Will Microsoft Tempt Enterprises Up To the Cloud? - GigaOM]]></dc:creator>
		<pubDate>Tue, 28 Oct 2008 20:28:59 +0000</pubDate>
		<guid isPermaLink="false">http://gigaom.com/?p=13910#comment-205809</guid>
		<description><![CDATA[[...] very sensitive data, and all of them said their clients would balk at cloud storage until they get a closer look at the security and reliability of the architecture. In the U.S. there are, at the very least, regulatory hurdles around storing sensitive data [...]]]></description>
		<content:encoded><![CDATA[<p>[...] very sensitive data, and all of them said their clients would balk at cloud storage until they get a closer look at the security and reliability of the architecture. In the U.S. there are, at the very least, regulatory hurdles around storing sensitive data [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Scale Fail : Beyond Search</title>
		<link>http://gigaom.com/2008/06/27/storage-outages-can-todays-hardware-handle-the-cloud/#comment-205808</link>
		<dc:creator><![CDATA[Scale Fail : Beyond Search]]></dc:creator>
		<pubDate>Mon, 21 Jul 2008 13:11:12 +0000</pubDate>
		<guid isPermaLink="false">http://gigaom.com/?p=13910#comment-205808</guid>
		<description><![CDATA[[...] shifting online, web services are still fragile, in part because we are still using technologies built for a much less strenuous [...]]]></description>
		<content:encoded><![CDATA[<p>[...] shifting online, web services are still fragile, in part because we are still using technologies built for a much less strenuous [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: S3 Outage Highlights Fragility of Web Services - GigaOM</title>
		<link>http://gigaom.com/2008/06/27/storage-outages-can-todays-hardware-handle-the-cloud/#comment-205807</link>
		<dc:creator><![CDATA[S3 Outage Highlights Fragility of Web Services - GigaOM]]></dc:creator>
		<pubDate>Mon, 21 Jul 2008 03:10:46 +0000</pubDate>
		<guid isPermaLink="false">http://gigaom.com/?p=13910#comment-205807</guid>
		<description><![CDATA[[...] shifting online, web services are still fragile, in part because we are still using technologies built for a much less strenuous [...]]]></description>
		<content:encoded><![CDATA[<p>[...] shifting online, web services are still fragile, in part because we are still using technologies built for a much less strenuous [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Craig Balding</title>
		<link>http://gigaom.com/2008/06/27/storage-outages-can-todays-hardware-handle-the-cloud/#comment-205806</link>
		<dc:creator><![CDATA[Craig Balding]]></dc:creator>
		<pubDate>Thu, 03 Jul 2008 08:30:33 +0000</pubDate>
		<guid isPermaLink="false">http://gigaom.com/?p=13910#comment-205806</guid>
		<description><![CDATA[Alistair

My take on this is similar to that of Fazal.  Amazon supports API-level integrity checks in the form of MD5 to detect potential corruption at the time of transfer.  It was the people using MD5 who picked up this issue - for others, data was getting silently corrupted (!).

As S3 is primarily targeted at developers, this incident demonstrates the need for greater awareness around Cloud Storage integrity API options and limitations.

I&#039;ve posted on the issue here:
http://cloudsecurity.org/2008/06/25/a-question-of-integrity-to-md5-or-not-to-md5/

Thanks,
Craig]]></description>
		<content:encoded><![CDATA[<p>Alistair</p>
<p>My take on this is similar to that of Fazal.  Amazon supports API-level integrity checks in the form of MD5 to detect potential corruption at the time of transfer.  It was the people using MD5 who picked up this issue &#8211; for others, data was getting silently corrupted (!).</p>
<p>As S3 is primarily targeted at developers, this incident demonstrates the need for greater awareness around Cloud Storage integrity API options and limitations.</p>
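<p>A minimal sketch of that kind of transfer-time check, in Python. It assumes the sender supplies a base64-encoded MD5 digest in the style of the HTTP <code>Content-MD5</code> header; <code>content_md5</code> and <code>verify_upload</code> are hypothetical helper names, and the corruption is simulated rather than taken from the incident.</p>

```python
import base64
import hashlib


def content_md5(body: bytes) -> str:
    """Return the base64-encoded MD5 digest of body (Content-MD5 header style)."""
    return base64.b64encode(hashlib.md5(body).digest()).decode("ascii")


def verify_upload(body_received: bytes, claimed_md5: str) -> bool:
    """Receiver-side check: recompute the digest and compare it with the sender's claim."""
    return content_md5(body_received) == claimed_md5


payload = b"hello, cloud storage"
header_value = content_md5(payload)      # computed by the sender before the transfer
corrupted = b"hellp, cloud storage"      # one byte changed in transit (simulated)

intact_ok = verify_upload(payload, header_value)       # digest matches
corrupt_ok = verify_upload(corrupted, header_value)    # digest does not match
```

<p>A sender that skips the digest gets no such signal, which is the silent-corruption case described above.</p>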
<p>I&#8217;ve posted on the issue here:<br />
<a href="http://cloudsecurity.org/2008/06/25/a-question-of-integrity-to-md5-or-not-to-md5/" rel="nofollow">http://cloudsecurity.org/2008/06/25/a-question-of-integrity-to-md5-or-not-to-md5/</a></p>
<p>Thanks,<br />
Craig</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: David Ulevitch</title>
		<link>http://gigaom.com/2008/06/27/storage-outages-can-todays-hardware-handle-the-cloud/#comment-205805</link>
		<dc:creator><![CDATA[David Ulevitch]]></dc:creator>
		<pubDate>Fri, 27 Jun 2008 21:52:01 +0000</pubDate>
		<guid isPermaLink="false">http://gigaom.com/?p=13910#comment-205805</guid>
		<description><![CDATA[1) The network is designed to scale.  A broken loadbalancer doesn&#039;t indicate anything.
2) The customers in the clouds are buying Arastra switches, the exact same hardware base Google is using (Google just uses its own software).  All these chips are made by Fulcrum Microsystems.  They are doing *very* well.
3) This is nothing new.  It is not a threat to Cisco or Juniper.  Cisco will eventually just buy Arastra.  This happens every couple years.]]></description>
		<content:encoded><![CDATA[<p>1) The network is designed to scale.  A broken loadbalancer doesn&#8217;t indicate anything.<br />
2) The customers in the clouds are buying Arastra switches, the exact same hardware base Google is using (Google just uses its own software).  All these chips are made by Fulcrum Microsystems.  They are doing *very* well.<br />
3) This is nothing new.  It is not a threat to Cisco or Juniper.  Cisco will eventually just buy Arastra.  This happens every couple years.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Fazal Majid</title>
		<link>http://gigaom.com/2008/06/27/storage-outages-can-todays-hardware-handle-the-cloud/#comment-205804</link>
		<dc:creator><![CDATA[Fazal Majid]]></dc:creator>
		<pubDate>Fri, 27 Jun 2008 18:59:26 +0000</pubDate>
		<guid isPermaLink="false">http://gigaom.com/?p=13910#comment-205804</guid>
		<description><![CDATA[I think Amazon handled this one quite well, in fact.

The error was introduced in the load balancer in the part where it copies data from one incoming SSL connection to an outgoing (presumably non-SSL) connection. This could be due to any number of reasons, including defective memory or firmware code bugs. Since the corruption was introduced between connections, the TCP checksum mechanisms on either connection wouldn&#039;t have caught it, which is why there is still a need for end-to-end checksums. Amazon&#039;s API does actually have such a mechanism; it was just not required, or was enforced inconsistently. I am sure they will fix that in the next release.

The conclusion I draw from this incident is the opposite of yours - hardware is much more reliable than software, but you can&#039;t trust it entirely either. Simply replacing the load-balancer with another from a different brand will not eliminate the vulnerability, and thus it is not a solution. Cost, manageability, or defect rates would be good reasons for Amazon to switch load-balancer suppliers.

HP ProCurve has an interesting pitch - all their Ethernet switches now have a programmable CPU core per port, and they offer SDKs to partners to implement custom advanced functionality in those. A ProCurve switch costs an order of magnitude less than an F5 or Netscaler, and the per-port ASICs should be perfectly up to the task of load-balancing and simple firewalling (if not SSL acceleration, which is much more computationally intensive). A company like Amazon or Google could save a lot of money by entering the ProCurve partner program and writing their own custom cloud-oriented logic that does exactly what they need and not one bit more, to reduce costs, complexity and the likelihood of bugs as the previous poster noted.]]></description>
		<content:encoded><![CDATA[<p>I think Amazon handled this one quite well, in fact.</p>
<p>The error was introduced in the load balancer in the part where it copies data from one incoming SSL connection to an outgoing (presumably non-SSL) connection. This could be due to any number of reasons, including defective memory or firmware code bugs. Since the corruption was introduced between connections, the TCP checksum mechanisms on either connection wouldn&#8217;t have caught it, which is why there is still a need for end-to-end checksums. Amazon&#8217;s API does actually have such a mechanism; it was just not required, or was enforced inconsistently. I am sure they will fix that in the next release.</p>
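<p>To illustrate why per-connection checksums miss this, here is a small Python simulation, offered as a sketch under stated assumptions and not as a model of Amazon's actual setup. CRC32 stands in for the per-connection TCP checksum, <code>faulty_balancer</code> is a hypothetical middlebox that flips one bit between the two connections, and MD5 plays the role of the end-to-end check.</p>

```python
import hashlib
import zlib
from typing import Tuple


def hop_send(data: bytes) -> Tuple[bytes, int]:
    """Sender side of one connection: attach a checksum covering this hop only."""
    return data, zlib.crc32(data)


def hop_receive(data: bytes, checksum: int) -> bytes:
    """Receiver side of one connection: reject data damaged on the wire."""
    if zlib.crc32(data) != checksum:
        raise ValueError("per-hop checksum mismatch")
    return data


def faulty_balancer(data: bytes) -> bytes:
    """Hypothetical middlebox that flips one bit while copying between connections."""
    damaged = bytearray(data)
    damaged[0] ^= 0x01
    return bytes(damaged)


def transfer(payload: bytes) -> bytes:
    """Client to balancer to storage, with a fresh per-hop checksum on each leg."""
    at_balancer = hop_receive(*hop_send(payload))   # hop 1 verifies cleanly
    forwarded = faulty_balancer(at_balancer)        # corruption happens between hops
    return hop_receive(*hop_send(forwarded))        # hop 2 also verifies cleanly


payload = b"object bytes uploaded by the client"
end_to_end_digest = hashlib.md5(payload).hexdigest()   # computed before sending

stored = transfer(payload)                              # no per-hop error is raised
digest_matches = hashlib.md5(stored).hexdigest() == end_to_end_digest
print(digest_matches)   # prints False
```

<p>Both hops pass their own checks because the second checksum is computed over data that is already damaged; only the digest computed by the client before sending exposes the change.</p>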
<p>The conclusion I draw from this incident is the opposite of yours &#8211; hardware is much more reliable than software, but you can&#8217;t trust it entirely either. Simply replacing the load-balancer with another from a different brand will not eliminate the vulnerability, and thus it is not a solution. Cost, manageability, or defect rates would be good reasons for Amazon to switch load-balancer suppliers.</p>
<p>HP ProCurve has an interesting pitch &#8211; all their Ethernet switches now have a programmable CPU core per port, and they offer SDKs to partners to implement custom advanced functionality in those. A ProCurve switch costs an order of magnitude less than an F5 or Netscaler, and the per-port ASICs should be perfectly up to the task of load-balancing and simple firewalling (if not SSL acceleration, which is much more computationally intensive). A company like Amazon or Google could save a lot of money by entering the ProCurve partner program and writing their own custom cloud-oriented logic that does exactly what they need and not one bit more, to reduce costs, complexity and the likelihood of bugs as the previous poster noted.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Andrew</title>
		<link>http://gigaom.com/2008/06/27/storage-outages-can-todays-hardware-handle-the-cloud/#comment-205803</link>
		<dc:creator><![CDATA[Andrew]]></dc:creator>
		<pubDate>Fri, 27 Jun 2008 16:43:12 +0000</pubDate>
		<guid isPermaLink="false">http://gigaom.com/?p=13910#comment-205803</guid>
		<description><![CDATA[A big part of the problem is a lack of experienced individuals when it comes to operating large-scale clusters. To use the Amazon S3 example above, best practice for extremely large hardware clusters is to use end-to-end software checksums to catch the occasional hardware failure that will allow corruption to get past hardware CRC and checksum systems.  It looks like they learned that lesson the hard way, but they should not feel too bad because Google did too.  If you put enough silicon in a room, you can no longer count on its internal error correction mechanism and software checks need to be instituted, something originally discovered by the supercomputing community that still has not penetrated the broader developer space.

I will generally agree, though, that a lot of network gear is poorly designed for large-scale distributed systems, either having inexpensive silicon that is too under-powered architecturally for clusters and max-load usage (for the cost conscious markets) or being &quot;carrier class&quot; networking gear with performant silicon but also a ton of other features glued on that are useless for distributed cluster applications and which drive the price way up.  It is just a matter of time before one of the networking gear companies starts producing switch engines specifically designed for large-scale cluster applications if they haven&#039;t already.]]></description>
		<content:encoded><![CDATA[<p>A big part of the problem is a lack of experienced individuals when it comes to operating large-scale clusters. To use the Amazon S3 example above, best practice for extremely large hardware clusters is to use end-to-end software checksums to catch the occasional hardware failure that will allow corruption to get past hardware CRC and checksum systems.  It looks like they learned that lesson the hard way, but they should not feel too bad because Google did too.  If you put enough silicon in a room, you can no longer count on its internal error correction mechanism and software checks need to be instituted, something originally discovered by the supercomputing community that still has not penetrated the broader developer space.</p>
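<p>The "enough silicon in a room" point can be made concrete with a back-of-the-envelope calculation, sketched here in Python. The per-transfer rate below is a purely hypothetical placeholder, not a measured figure for any hardware, and transfers are assumed to fail independently; <code>p_any_undetected</code> is an illustrative helper name.</p>

```python
import math


def p_any_undetected(per_transfer_rate: float, transfers: int) -> float:
    """Probability that at least one of `transfers` independent transfers
    is silently corrupted, i.e. 1 - (1 - p)**n, computed stably."""
    return -math.expm1(transfers * math.log1p(-per_transfer_rate))


HYPOTHETICAL_RATE = 1e-9   # assumed for illustration only, not a real error rate

for n in (10**3, 10**6, 10**9, 10**12):
    print(f"{n:.0e} transfers: {p_any_undetected(HYPOTHETICAL_RATE, n):.6f}")
```

<p>Under these assumptions the chance of at least one silent error is negligible for a single machine's workload and approaches certainty as the transfer count grows, which is the scale effect described above.</p>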
<p>I will generally agree, though, that a lot of network gear is poorly designed for large-scale distributed systems, either having inexpensive silicon that is too under-powered architecturally for clusters and max-load usage (for the cost conscious markets) or being &#8220;carrier class&#8221; networking gear with performant silicon but also a ton of other features glued on that are useless for distributed cluster applications and which drive the price way up.  It is just a matter of time before one of the networking gear companies starts producing switch engines specifically designed for large-scale cluster applications if they haven&#8217;t already.</p>
]]></content:encoded>
	</item>
</channel>
</rss>

