


Updated: Amazon Web Services is having trouble this evening and in the process is taking down some major sites and services. Among the sites affected are Quora and HipChat. In addition, the Amazon outage has had an impact on Heroku, a division of Salesforce.

Amazon is a key infrastructure provider to many of the biggest and best-known startups, such as Pinterest and Dropbox. The outages were related to Amazon’s EC2 and RDS services, and the problems appeared to be localized to Amazon’s Northern Virginia data center. Other services in that data center, such as ElastiCache and Elastic Beanstalk, were also impacted. The problem appears to be rooted in a power outage.

Regarding EC2, Amazon notes on its status website:

We continue to investigate this issue. We can confirm that there is both impact to volumes and instances in a single AZ in US-EAST-1 Region. We are also experiencing increased error rates and latencies on the EC2 APIs in the US-EAST-1 Region.

9:55 PM PDT We have identified the issue and are currently working to bring affected instances and volumes in the impacted Availability Zone back online. We continue to see increased API error rates and latencies in the US-East-1 Region.

On the issue of RDS problems, AWS notes:

9:33 PM PDT Some RDS DB Instances in a single AZ are currently unavailable. We are also experiencing increased error rates and latencies on the RDS APIs in the US-EAST-1 Region. We are investigating the issue.
10:05 PM PDT We have identified the issue and are currently working to bring the Availability Zone back online. At this time no Multi-AZ instances are unavailable.

Later, AWS posted these updates:

12:11 AM PDT As a result of the power outage tonight in the US-EAST-1 region, some EBS volumes may have inconsistent data.
01:38 AM PDT Almost all affected EBS volumes have been brought back online. Customers should check the status of their volumes in the console. We are still seeing increased latencies and errors in registering instances with ELBs.

AWS has suffered outages in the past. A widespread problem impacted major websites in April 2011. In July 2008, Amazon’s S3 service went offline and caused major problems for many of its customers. I have been in touch with folks from Amazon and Heroku to get a better idea of what is going on. In the interim, enjoy some of the tweets about the outage.

Image courtesy of Shutterstock user michaket.

  1. We were impacted too: the outage hit us in us-east-1b. The funny thing is that our paging service, PagerDuty, also went down temporarily.

    The outage lasted 15 minutes for us, and now we are able to bring up boxes.

  2. I Am OnDemand Friday, June 15, 2012

    Which major sites suffered from this outage? I’m looking to see improvement compared to the April 2011 outage.

    1. Heroku was down, and that impacted several apps. Quora, Asana, and exfm were down, and Svbtle was offline as well. It was a long list of people who were set back.

    2. There’s no comparison to the EBS outage. This was an annoying glitch; something went blooey in a single AZ in US-EAST, but it was very, very minor compared to what happened with EBS. That was a systemic infrastructure failure, but even then, applications that were properly designed for the AWS environment did not go down. The lesson is still to architect for failure; sites that still haven’t grokked AWS’s strengths and weaknesses, and don’t architect for failure, don’t have much to gripe about. Not to mention, this is a vocal, engaged, but tiny part of the internet we’re talking about. It’s not like anything really bad happened.

  3. Thumb app is down :(

  4. Hadley Harris Friday, June 15, 2012

    Yeah, we’re still down due to this @Thumb!

  5. Control Group Friday, June 15, 2012

    The real issue is poorly designed and managed infrastructure. http://blog.controlgroup.com/2012/06/15/the-real-issues-with-the-aws-outage/

    1. Eamonn Colman Tuesday, June 19, 2012

      I agree. If you’re not designing for failure in your data center or cloud build-outs, shame on you.

  6. People, http://Lunacloud.com is the solution… I recently moved there and I love it.

  7. This is nuts. Three days and the geniuses can’t fix it?

  8. Daniele Calabrese Saturday, June 16, 2012

    Soundtracker Radio was down as well. Everything was back up and normal soon after that.

  9. Amazon provided more detail about the outage early Saturday morning. Turns out startups that were deployed across multiple availability zones were not affected. Anyone have any idea what this kind of redundancy costs?

  10. Amazon has really started to annoy me these days. Apart from their immoral attempt to take over all things relating to buying and selling on the internet, they now cause havoc by taking half the internet down with them. They have also recently entered the B2B market and begun to compete with sites such as Thomasnet and Daily Sales Exchange, but as this news piece shows (http://news.yahoo.com/amazon-getting-too-big-britches-160025093.html), not all is well in the Amazon camp. Well, I’m happy about that then :-)

