As I write this, Amazon’s S3 storage servers have been unreachable for 90 minutes. As was the case back in February, the last time this happened, the outage shows up as chunks of Web 2.0 dropping off: the most visible sign of trouble for many people was the sudden vanishing of pictures from Twitter. Poke around some, though, and you can find plenty of other services that are gasping for air right now. That’s not counting those of us who use S3 personally, for things like backups via Jungle Disk or the equivalent.
Amazon learned from the last outage that transparency is a must. If you visit the Service Health Dashboard, you can see that they know about the outage and are “pursuing corrective action” – though they have not yet announced an ETA for a fix. At least we know they’re working on it, though that’s cold comfort for startups that built their businesses around S3. Fortunately Amazon’s EC2 cloud is unaffected, so we’re not seeing swathes of servers vanish from the net.
Amazon does offer an SLA for the S3 service, guaranteeing 99.9% uptime or part of your money back. With 0.1% of a month being around 45 minutes, a 90-minute outage means they owe people money. The requirements for claiming a refund, though, are onerous enough that no one except large users will bother (hey, Amazon, how about an automatic refund when you know your servers are down?).
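For anyone who wants to check the arithmetic, here is a minimal sketch of the downtime budget behind that claim. The function names are mine, and the 30-day default is an assumption; a 30-day month allows roughly 43 minutes at 99.9%, a 31-day month closer to 45. It only compares raw minutes and says nothing about how Amazon actually measures uptime for SLA purposes.

```python
def allowed_downtime_minutes(uptime_pct: float, days_in_month: int = 30) -> float:
    """Minutes of downtime a month can absorb at the given uptime percentage."""
    minutes_in_month = days_in_month * 24 * 60
    return minutes_in_month * (100.0 - uptime_pct) / 100.0


def exceeds_budget(outage_minutes: float, uptime_pct: float = 99.9,
                   days_in_month: int = 30) -> bool:
    """True if the outage alone uses up more than the monthly downtime budget."""
    return outage_minutes > allowed_downtime_minutes(uptime_pct, days_in_month)


if __name__ == "__main__":
    # Budget at 99.9% for a 31-day month, then the 90-minute outage against it.
    print(round(allowed_downtime_minutes(99.9, days_in_month=31), 1))
    print(exceeds_budget(90, days_in_month=31))
```

Either way you count the month, 90 minutes is about double the budget.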
With two relatively serious outages in the space of 6 months, some will be asking: why depend on S3 at all? The answer is simple: the rates are hard to beat, especially for a service that doesn’t require any sysadmin budget. The fact remains that no other giant has started offering commodity storage at similarly attractive prices, though it’s obvious that Google or even Microsoft could get into the game if they wanted to.
As long as Amazon has a virtual monopoly on cheap, distributed, fast, API-accessible storage, web startups will continue to depend on it. If you need to offer your customers better than 99.9% uptime for some reason, though, it’s clear that you need to have a backup plan for those times when the Amazon infrastructure is having issues. Fortunately, most of us can live without uptime over that level.
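What might such a backup plan look like? One common shape is a read path that tries the primary store first and falls through to a secondary copy if that fails. The sketch below is only an illustration of that idea: `fetch_with_fallback` and the backend callables are hypothetical names of mine, not part of any Amazon API, and it assumes you are already keeping a second copy of your data somewhere else.

```python
from typing import Callable, Sequence

# A backend is any callable that takes a key and returns the object's bytes,
# raising an exception when the store is unreachable. (Hypothetical interface.)
Backend = Callable[[str], bytes]


def fetch_with_fallback(key: str, backends: Sequence[Backend]) -> bytes:
    """Try each backend in order and return the first successful result."""
    errors = []
    for fetch in backends:
        try:
            return fetch(key)
        except Exception as exc:  # broad on purpose: any failure means "try the next one"
            errors.append(exc)
    raise RuntimeError(f"all backends failed for {key!r}: {errors}")


if __name__ == "__main__":
    def primary(key: str) -> bytes:
        # Stand-in for the primary store during an outage.
        raise ConnectionError("primary store unreachable")

    def secondary(key: str) -> bytes:
        # Stand-in for a copy kept elsewhere.
        return b"copy of " + key.encode()

    print(fetch_with_fallback("avatar.png", [primary, secondary]))
```

The hard part, of course, is not the fallback logic but keeping the second copy in sync, which is exactly the sysadmin work S3 was supposed to save you.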