Amazon S3 Storage Service Goes Down, Still Not Up
Amazon’s S3 cloud-based storage service went down earlier this morning, according to numerous tips we’ve received. The outage has affected many companies, including Twitter. According to our tipsters, the service went down around 4:30 a.m. and is returning a 500 Internal Server Error message.
Amazon Web Services forums are full of people chatting about the outage. One poster on the forum summed up the situation nicely, saying, “The s3 service is great but this just proves you can’t rely on it, this is a major issue especially since it’s been down for so long. Way to go Amazon.”
This outage, one of the first large-scale problems to hit Amazon, shows that a lot of work needs to be done before we can completely rely on the cloud. As I have often said, we are running the 21st century web on infrastructure that was dreamed up in the 1990s, long before the web’s current scale. Still, that doesn’t take away my long-standing enthusiasm for Amazon’s web services strategy.
We will keep you posted. Meanwhile, let us know how you have been impacted and what you are doing to build the redundancy of your web service.
Nick Carr has his take on the situation. “Given that entire businesses run on S3 and related services, Amazon has a particularly heavy responsibility not only to fix the problem quickly but to explain it fully,” he writes. I agree with him, and hopefully Amazon will do the needful. Amazon says it has fixed the problem, but there seem to be continuing problems with the service, as the forum indicates.

It’s back up now. We get most of our traffic from India and unfortunately for us, this happened during near-peak hours – 6 in the evening. We use AWS for images, but the system defaults to our internal server when it fails. We had been thinking of doing away with the fail-over given how well AWS worked, but of course, that won’t happen anytime soon now.
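The fail-over described in the comment above (images served from AWS, with an internal server as the default when AWS fails) might look roughly like the sketch below. This is only an illustration, not the commenter's actual setup: `fetch_image`, `fetch_primary`, and `fetch_fallback` are hypothetical names, and the two fetchers are passed in as plain functions so the pattern can be shown without a real S3 client.

```python
from typing import Callable, Tuple


def fetch_image(
    key: str,
    fetch_primary: Callable[[str], bytes],
    fetch_fallback: Callable[[str], bytes],
) -> Tuple[bytes, str]:
    """Return the image bytes and the name of the source that served them.

    fetch_primary stands in for the hosted store (S3 in the comment above);
    fetch_fallback stands in for the internal server. Both are hypothetical
    callables supplied by the caller.
    """
    try:
        return fetch_primary(key), "primary"
    except Exception:
        # Deliberately broad for the sketch: any failure of the hosted store
        # (for example a 500 Internal Server Error) triggers the fallback.
        return fetch_fallback(key), "fallback"
```

A real deployment would probably also want a timeout on the primary fetch and some logging of how often the fallback path is taken, since a silent fallback can hide a long outage.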
Someone check if Rackspace went down today or not. It appears that “downtime trouble” follows Twitter wherever they go!
@Adnan,
That is funny. I am betting that Twitter people will not admit their own shortcomings and how badly their system is architected. It is always the hosting company which is to blame.
digg here – http://digg.com/hardware/Amazon_S3_World_s_most_reliable_web_service_is_DOWN
We’ve gotten so good at reducing adoption friction, that we’ll see a lot of this kind of thing. It just isn’t possible to plan for it.
More on my blog:
http://smoothspan.wordpress.com/2008/02/15/google-reports-iphone-usage-50x-other-handsets-amazon-s3-goes-down-low-friction-has-a-cost/
Best,
BW
“…Amazon will do the needful.”
Om, you did not just use that word…needful.
I use JungleDisk to back up my iPhoto library to Amazon S3 nightly. No data was lost (on my end), but I did notice that JungleDisk had to back up the entire iPhoto library and not just the new files.
I’m not happy this outage happened, but we may be better off for it as an industry. There’s so much hype about the possibilities of the cloud right now that we’re overlooking some of the service-level requirements that it may or may not meet. Amazon could inadvertently become a test case that will be studied by other enterprises who are considering moving their infrastructure over.
One of our clients’ sites was down for a while, due to this outage. Seems to be back up. They did say that other than this, the service has been great. We are working on an upcoming project and are pretty sure we are going to use AWS…Definitely going to do more diligence on this and see what the explanation is for it. I look forward to seeing the reason.
Matt
We are only one major outage away from certain marquee clients swearing off sole reliance on SaaS. This happened to a client, a mid-sized automotive auction, that had, with my help, knit together a network of dealers, contractors, and agents into a system with a zero-install, zero-hosting footprint.
UNTIL:
There were four accounts that were mashed up…the usual suspects, and one of them went dark. We did some pinging (here is a good business idea for a bright Web 2.0 person: third-party app monitoring and governance) and isolated the guilty party.
In spite of being punked, fingered, whatever, the slacker who ran the service was very rude and unforthcoming. That’s another problem: who are you going to deal with when these hosted services go down? I’m not so sure it would have been any better if it had been Salesforce that crapped out.
Long and short of it: we have a business community that is used to local control, we consultants want to deliver apps as a service – we will need to ally ourselves with the providers of these services to come up with a game plan…but try and get one of the stars to cough up a retainer!
Most of the startup SaaS guys laugh when I propose a contract to consult on packaging and policies for reliability for the SMB end users.
But this is exactly what they should want, guys like me who beat the bushes for them.
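The "pinging" used in the comment above to isolate which of the mashed-up services went dark can be sketched in a few lines. This is a rough illustration under stated assumptions, not a description of what the commenter ran: `http_probe` and `failing_services` are hypothetical names, the service names and URLs in the usage note are placeholders, and "down" is assumed to mean unreachable or answering with a 5xx status.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, List


def http_probe(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with a status below 500."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status < 500
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx and 5xx responses; only 5xx
        # counts as "down" in this sketch.
        return err.code < 500
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, and similar.
        return False


def failing_services(
    services: Dict[str, str],
    probe: Callable[[str], bool] = http_probe,
) -> List[str]:
    """Return the names of the services whose probe failed."""
    return [name for name, url in services.items() if not probe(url)]
```

Calling `failing_services({"storage": "https://storage.example/health", "crm": "https://crm.example/health"})` on a schedule would give a list of the dependencies that are currently dark; the probe is injectable so the logic can be exercised without touching the network.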