
Amazon’s S3 cloud-based storage service went down earlier this morning, according to numerous tips we’ve received. The outage has impacted many companies, including folks like Twitter. According to our tipsters, the service went down around 4:30 a.m., and is showing a 500 Internal Server Error message.

Amazon Web Services forums are full of people chatting about the outage. One poster on the forum summed up the situation nicely, saying, “The s3 service is great but this just proves you can’t rely on it, this is a major issue especially since it’s been down for so long. Way to go Amazon.”

This outage, one of the first large-scale problems to hit Amazon, shows that a lot of work needs to be done before we can completely rely on the cloud. As I have often said, we are running the 21st-century web on infrastructure that was dreamed up in the 1990s, long before the web’s current scale. Still, that doesn’t take away my long-standing enthusiasm for Amazon’s web services strategy.

We will keep you posted. Meanwhile, let us know how you have been impacted and what you are doing to build redundancy into your web service.
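
One basic form of that redundancy is a read path that tries S3 first and falls back to a copy you host yourself. Below is a minimal Python sketch of the pattern. `fetch_with_fallback`, `s3_fetch` and `local_fetch` are hypothetical names, and both fetchers are stand-ins (the S3 one simply simulates the 500 error) rather than calls to any real S3 client.

```python
from typing import Callable, List, Sequence, Tuple

# A source pairs a label with a function that returns an object's bytes by key.
Source = Tuple[str, Callable[[str], bytes]]


class AllSourcesFailed(Exception):
    """Raised when every configured source fails for a key."""


def fetch_with_fallback(key: str, sources: Sequence[Source]) -> Tuple[str, bytes]:
    """Try each source in order; return (label, data) from the first that succeeds."""
    errors: List[str] = []
    for label, fetch in sources:
        try:
            return label, fetch(key)
        except Exception as exc:  # any failure (e.g. an HTTP 500) moves on to the next source
            errors.append(f"{label}: {exc}")
    raise AllSourcesFailed("; ".join(errors))


# Hypothetical stand-ins; a real setup would call an S3 client and an in-house server.
def s3_fetch(key: str) -> bytes:
    raise IOError("500 Internal Server Error")  # simulates the outage


def local_fetch(key: str) -> bytes:
    return b"image bytes for " + key.encode()


label, data = fetch_with_fallback("logo.png", [("s3", s3_fetch), ("local", local_fetch)])
print(label)  # local
```

Sources are tried in order, so the fallback only costs extra work when the primary fails. The sketch covers reads only; keeping the in-house copy current, for example by writing every object to both stores, is a separate problem it does not address.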

Nick Carr has his take on the situation. “Given that entire businesses run on S3 and related services, Amazon has a particularly heavy responsibility not only to fix the problem quickly but to explain it fully,” he writes. I agree with him, and hopefully Amazon will do the needful. Amazon says it has fixed the problem, but there seem to be continuing problems with the service, as the forum indicates.

  1. It's back up now. We get most of our traffic from India and, unfortunately for us, this happened during near-peak hours, 6 in the evening. We use AWS for images, but the system defaults to our internal server when it fails. We had been thinking of doing away with the fail-over given how well AWS worked, but of course, that won't happen anytime soon now.

  2. Someone check whether Rackspace went down today. It appears that “downtime trouble” follows Twitter wherever they go!

  3. @Adnan,

    That is funny. I am betting that Twitter people will not admit their own shortcomings and how badly their system is architected. It is always the hosting company that is to blame.

  4. [...] A few days ago it was the RIM network that suddenly went down, cutting people off from their emails and other BlackBerry goodness (which some saw as a good thing rather than a catastrophe) — and this morning it was Amazon’s S3 network that suddenly went offline. The network provides cheap remote storage for dozens of Web startups, including Twitter, as well as some larger companies. What users of those services wound up with for several hours was a host of 404 and other errors. [...]

  5. [...] Simple Storage Service S3 suffered a “massive” outage this morning, impacting a number of businesses that rely on the cloud-based storage service (must be a Fractus cloud-based service). Twitter, [...]

  6. We’ve gotten so good at reducing adoption friction that we’ll see a lot of this kind of thing. It just isn’t possible to plan for it.

    More on my blog:

    http://smoothspan.wordpress.com/2008/02/15/google-reports-iphone-usage-50x-other-handsets-amazon-s3-goes-down-low-friction-has-a-cost/

    Best,

    BW

  7. “…Amazon will do the needful.”

    Om, you did not just use that word…needful.

  8. I use JungleDisk to back up my iPhoto library to Amazon S3 nightly. No data was lost (on my end), but I did notice that JungleDisk had to back up the entire iPhoto library and not just the new files.

  9. [...] Coverage on CenterNetworks, Silicon Alley Insider, BlueBlog, TechCrunch, GigaOM, Rough Type and Between the Lines. [...]

  10. I’m not happy this outage happened, but we may be better off for it as an industry. There’s so much hype about the possibilities of the cloud right now that we’re overlooking some of the service-level requirements that it may or may not meet. Amazon could inadvertently become a test case that will be studied by other enterprises who are considering moving their infrastructure over.

  11. One of our clients' sites was down for a while due to this outage. It seems to be back up. They did say that, other than this, the service has been great. We are working on an upcoming project and are pretty sure we are going to use AWS. We're definitely going to do more diligence on this and see what the explanation is. I look forward to seeing the reason.

    Matt

  12. We are only one major outage away from certain marquee clients swearing off sole reliance on SAAS. This happened to a client of mine, a mid-sized automotive auction that, with my help, had knit together a network of dealers, contractors, and agents into a system with a zero-install, zero-hosting footprint.

    UNTIL:

    There were four accounts that were mashed up (the usual suspects), and one of them went dark. We did some pinging (here is a good business idea for a bright Web 2.0 person: third-party app monitoring and governance) and isolated the guilty party.

    In spite of being punked, fingered, whatever, the slackers who ran the service were very rude and unforthcoming. That’s another problem: who are you going to deal with when these hosted services go down? I’m not so sure it would have been any better if it had been SalesForce that crapped out.

    Long and short of it: we have a business community that is used to local control, and we consultants want to deliver apps as a service, so we will need to ally ourselves with the providers of these services to come up with a game plan…but try and get one of the stars to cough up a retainer!

    Most of the startup SaaS guys laugh when I propose a contract to consult on packaging and policies for reliability for the SMB end users.

    But this is exactly what they should want: guys like me who beat the bushes for them.

  13. Amazon’s SLA for S3 is 99.9% uptime during a billing month. That’s roughly 0.72 hours (about 43 minutes) of allowable downtime in a 30-day month.

    See the “Justin Etheredge Offers Preview of LINQ to [Amazon] SimpleDB” topic of http://oakleafblog.blogspot.com/2008/02/linq-and-entity-framework-posts-for_11.html.

    –rj

  14. Cloud-based storage is getting a lot of heat today, and since it's web-centric, any amount of downtime is unacceptable. Still, the situation today should not put cloud storage in a bad light; other companies, such as Nirvanix, have storage nodes around the world with no single point of failure, which helps avoid situations like today's. If you're relying on a single point for critical data, you've got a major problem.

  15. I also advised the auto auction to invest in VSAT data service, which charges only for equipment rental and any fail-over data transmission, but they balked at the cost.

    I told them that no matter how reliable the hosted services were (and generally, hosted services are more reliable than a mid-sized business's own plant), one local loop for data was no way to run a business. They ran their auction, live, cashier functions and all, on SAAS.

    Eventually, their link did go down, and it had nothing to do with the SAAS providers. Now, they have bonded SDSL from two carriers that can split when one goes down.

    So many ways to fail.

  16. [...] Well that exploded. Everyone seems to be covering this story now. [...]

  17. [...] very next day Amazon’s S3 cloud based storage service had some significant outages. Bear in mind that S3 is a service that SaaS vendors use to outsource their storage needs rather [...]

  18. Sure, it’s a bummer when a cloud-based storage system fails, in the same way that it’s awful when the power goes out. But claims that this sort of outage will harm the ascendancy of cloud computing are akin to claims that power cuts make a return to gaslights and steam-powered manufacturing more likely.

  19. Alexander Sicular Friday, February 15, 2008

    Wow. Just wow. Everyone out there jumping up and down just needs to relax. Go outside, call your mother, step away from the computer, go to the gym, read a book (and not on Kindle). I’m prompted to write this in light of the recent BlackBerry outage. Again, a few-hour outage gets coverage all over the web and on TV as well. I couldn’t believe the BlackBerry outage was covered in depth on CNBC.

    Frankly we’re all lucky this stuff even works at all. Go hug your kids or the person to your left.

  20. In our early beta version, we are using some of Amazon’s web services (namely S3 and SimpleDB) – but have been considering using our own storage and database instead. With today’s outage I’m not sure if AWS is a great strategy for us.

    We don’t have huge amounts of data to store like some companies (SmugMug comes to mind), so using AWS was mostly for the peace of mind that we would be able to scale quickly after our beta goes public and all of Digg’s users abandon them for us. We have a meeting tomorrow to take a closer look at our strategy for handling lots of new traffic in a short period of time, and I have to say that it doesn’t seem likely that Amazon will be included in the party.

  21. [...] starting work on a set (shh you didn’t hear that), I check my feeds and find out that Amazon S3 croaked today, it’s a good thing snagg isn’t launched yet – most of the videos and [...]

  22. [...] Amazon S3 Storage Service Goes Down, Still Not Up [...]

  23. [...] ready for a large-scale migration over to it? I don’t think so, judging by a story on GigaOM, “Amazon S3 Storage Service Goes Down, Still Not Up.” “…This outage, one of the first large-scale problems to hit Amazon, shows that a lot [...]

  24. Despite the few hours of downtime, it’s still one of the best and most reliable web services available to date…

  25. Hmmm, I have back episodes of my podcast stored on S3, so this is a disservice to potential new subscribers. I hope Amazon fixes this soon!

    http://soundsgoodpodcast.com

  26. If you ever made the effort to read the slides from SmugMug’s chief, Don MacAskill (he removed the PDF from his site, so you can only get it from web.archive.org here http://web.archive.org/web/20070406174427/http://blogs.smugmug.com/don/files/ETech-SmugMug-Amazon-2007.pdf or via this shorter link http://tinyurl.com/33t27f ), you’d see that they start with a nice photo of an Amazon data center after a major fire. They then go on, here and there, to state that the author’s company does NOT count on Amazon’s 100% reliability and does NOT advise others to do so.

  27. [...] on Friday, Amazon S3, the most successful of the “cloud computing” services, had its first serious crash. Several companies, particularly startups, were affected, since they depend entirely on S3 [...]

  28. @A.T.

    Actually, if you bothered to read my slides, let alone my blog posts and other coverage, you’d know that:

    • That fire wasn’t from Amazon at all and isn’t related.
    • I trust Amazon with 100% of our data. More than 90% of our data lives at Amazon and nowhere else.

    What I did say is that no service, hardware, or software we’ve ever used is 100% and that Amazon is no different. Depend on it, fine. I do. But expect miracles? That’s just stupid.

    Sorry about the slides being missing, that was an accident. They’ve been restored.

  30. It affected me a little bit – one of my subcontractors relies on AWS for file hosting, and so it was a temporary problem for me.

    That said, everything goes down. It is incumbent on you not to rely on one service, period. You wouldn’t rely on one spindle of a hard drive; you’d back up. Having multiple options is not only prudent but required, especially when using third parties, as everything will fail at some point and nothing, nothing is going to have 100% uptime, even internal systems you own completely yourself. That’s a very false sense of security.

    I like AWS and would still recommend it. Now, if this becomes a habit, that might change.

  31. [...] I was running down the cause of Friday’s S3 service interruption, I came across this excellent financial analysis of the cost savings of a web services deployment [...]

  32. [...] Amazon S3 Storage Service Goes Down, Still Not Up [...]

  33. [...] Web Services (AWS), despite a recent outage, is the current poster child for this model as it provides a variety of services, among them the [...]

  36. [...] just improve availability and control for Amazon’s customers. They also make EC2 and S3 more appealing for enterprises. Interested in web infrastructure? Check out our upcoming [...]

  37. [...] cloud’s outages. In the last few months, there have been significant cloud-service problems at Amazon, HP, and [...]

  38. [...] if you want to read about last time it went down you can here on GigaOm. [...]

  39. [...] the second time this year that the service has been unavailable, and many other services are going down along with it [...]

  40. [...] this morning for an extended period of time — the second big outage at the service this year. In February, Amazon suffered a major outage that knocked many of its customers [...]

  43. [...] Disaster recovery no longer just means preparing for your own business failures: with cloud computing, it means preparing for the failures of your cloud vendor too. No cloud vendor is too big to experience problems: check out the Amazon S3 outage in July 2008 and the Amazon S3 outage in February 2008. [...]

  44. [...] both February and July of this past year, many of Amazon’s S3 users were knocked offline, not just for minutes, [...]

  45. [...] Last year in July, the S3 service went offline causing widespread problems. It also suffered an outage in February 2008. But by and large, Amazon has had a good record with its services as far as uptime is [...]

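
The downtime allowance quoted in comment 13 depends on the length of the month: a 99.9% monthly uptime target leaves 0.1% of the month's hours as permissible downtime. Below is a minimal Python sketch of that arithmetic. It assumes uptime is measured as plain wall-clock time over a calendar month, and Amazon's actual SLA may define its uptime percentage differently. `allowed_downtime_hours` is a hypothetical helper, not part of any AWS tool.

```python
def allowed_downtime_hours(uptime_pct: float, days_in_month: int) -> float:
    """Hours of downtime a monthly uptime target permits (assumes wall-clock uptime)."""
    hours_in_month = days_in_month * 24
    return hours_in_month * (1 - uptime_pct / 100)


# Compare the allowance across month lengths for a 99.9% target.
for days in (28, 30, 31):
    hours = allowed_downtime_hours(99.9, days)
    print(f"{days}-day month: {hours:.3f} h ({hours * 60:.1f} min)")
```

This prints 0.672 h (40.3 min), 0.720 h (43.2 min) and 0.744 h (44.6 min) for 28-, 30- and 31-day months respectively. An outage lasting several hours is therefore well beyond the allowance in any month.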
