
Summary:

Amazon Web Services has put the outage behind it and is now getting back to normal, according to the latest status update on its Service Health Dashboard. Performance data from Cedexis shows what the outage meant in terms of network latency.


Updated: Amazon Web Services has put the outage behind it and is now getting back to normal, according to the latest status update on its Service Health Dashboard. The outage that affected hundreds of applications running in the provider’s US-EAST region is almost resolved, more than 24 hours after a “networking event” took down a number of popular services, including EC2, Elastic Block Store and Relational Database Service.

Here’s what AWS had to say this morning:

8:49 AM PDT We continue to see progress in recovering volumes, and have heard many additional customers confirm that they’re recovering. Our current estimate is that the majority of volumes will be recovered over the next 5 to 6 hours. As we mentioned in our last post, a smaller number of volumes will require a more time consuming process to recover, and we anticipate that those will take longer to recover. We will continue to keep everyone updated as we have additional information.

Here’s what the outage looked like according to Cedexis, a company that measures response times across hundreds of networks worldwide. The first graph gives an idea of normal Amazon EC2 response times over the last 30 days, but notice the US-EAST region starting to spike on April 20. The second graph shows network performance over the first 24 hours of the outage, while the third and fourth graphs give the last 24 hours and the last 60 minutes, respectively.

30-day average (worldwide)

First 24 hours of outage (worldwide)

Last 24 hours (worldwide)

Last 60 minutes (worldwide)

Update: According to Cedexis, at least, it looks like the US-EAST region is back and performing normally, as of about 11 a.m. PST:


  1. Uh, you must be looking at a different dashboard than I am… the one I’m looking at as of 12:30PM CDT 4/22 still shows issues, and some of the sites I use which I know use the service are still out…

    1. Derrick Harris Friday, April 22, 2011

      I’m going based on the updates at http://status.aws.amazon.com/. As of 8:49 PST, it said “Our current estimate is that the majority of volumes will be recovered over the next 5 to 6 hours,” and that’s just within the one now-affected zone.

    2. Yup, 2:20 central time and our properties are down hard along with many others I know. All the blogs and news writing about it as if it’s over need to look again. Here’s a site some guy on Twitter put together listing websites that were reporting being down; it looks to me like most are still down as I’m posting this: http://ec2disabled.com/

  2. Is there any way we can put a dollar amount to the outage?

  3. Sorry guys – you’re totally wrong.

    The outage is not behind them; the Amazon status site was just updated with “we’re trying to figure out a bottleneck”

    So far 40+ hours of outage and counting.

  4. 24+ hours downtime? Unacceptable and unbelievable it happened on Amazon.

  5. For all customers affected by EC2 downtime, I would like to recommend ElasticHosts as an alternative cloud service (www.elastichosts.com) – we offer a 5 day free trial for our cloud servers in US or UK, which is likely enough at least to bridge the gap.

  6. Ditto what other commenters are stating. Amazon is still experiencing problems as of Sunday morning, 10:45am EDT, with many customer sites still down since 7:22am EDT on 4/21, more than 3 full days of operation. It’s amazing that the press/media following the story are acting as if everything has been resolved.

  7. Derrick Harris Sunday, April 24, 2011

    With the status updates being about the only source of info from the company, I went with what I had as of Friday when this post was published. It certainly sounded like the majority of customer issues were resolved and that it was just a matter of hours until the rest were, but anecdotal evidence suggests that wasn’t the case.

    Still, though, without knowing exact numbers regarding how bad things were initially versus how bad they are now, it’s difficult to know how much progress there actually has been.

  8. Cloud SLA’s are as good as the least reliable layer in the stack

  9. It’s highly unexpected to see an outage from one of the biggest online stores in the world. They should have backups to recover a troubled system right away… PS3 network has been down for the last 3 days as well… What’s going on with the big players?

