
S3 Outage: the Aftermath

As covered both here and on our parent blog GigaOM, Amazon’s S3 storage service had a bad day yesterday. (So, by the way, did their Simple Queue Service, but an outage in that service is less noticeable to most web users.) How bad? Well, they missed their 99.9% SLA, and now owe 10% refunds for the month; indeed, they nearly dropped to the 99% level, where they’d owe 25% refunds. Unfortunately, to claim a refund you need to submit your server request logs to confirm the outage, which is difficult or impossible for most web workers. If you do navigate the credit process, let us know.

Meanwhile, the effects of this outage have plenty of folks discussing other storage services, either as a backup to Amazon S3 or as a replacement. The leading contender appears to be Nirvanix, which offers storage at $0.25 per GB/month with uploads and downloads at $0.18 per GB – not too outrageously different from Amazon’s $0.15 per GB/month with uploads at $0.10/GB and downloads starting at $0.17/GB. Watch for some smart developers to come up with middleware layers that do automatic mirroring and fallback from S3 to Nirvanix soon.
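The core of such a mirroring layer is simple to sketch. Here is a minimal, hypothetical Python sketch of the mirror-and-fallback idea; the `primary` and `secondary` arguments stand in for real S3 and Nirvanix clients (modeled below as plain mappings), so the class name and interface are assumptions, not any shipping middleware:

```python
class MirroredStore:
    """Write-through mirror with read fallback across two object stores.

    `primary` and `secondary` can be any mapping-like objects supporting
    store[key] = data and store[key]; real S3/Nirvanix clients would be
    wrapped to expose this interface (an assumption for this sketch).
    """

    def __init__(self, primary, secondary):
        self.primary = primary
        self.secondary = secondary

    def put(self, key, data):
        # Mirror every write to both back ends; succeed if either does.
        errors = []
        for store in (self.primary, self.secondary):
            try:
                store[key] = data
            except Exception as exc:
                errors.append(exc)
        if len(errors) == 2:
            raise IOError("both stores failed: %r" % errors)

    def get(self, key):
        # Read from the primary; fall back to the mirror during an outage.
        try:
            return self.primary[key]
        except Exception:
            return self.secondary[key]
```

During an S3 outage, reads would transparently come from the mirror; the extra cost is the second provider’s rates on the duplicated writes.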

9 Responses to “S3 Outage: the Aftermath”

  1. As a backup to S3, you’re right – it could make a lot of sense… and no question that both are a better deal than a full CDN.

    JD (the pricing “JD” – not the “Nirvanix has a poor track record” JD ;-)

  2. @JD – As far as pricing goes, I think a lot of services would swallow a 70% increase if they were using it only for backup purposes when Amazon was down. Less enticing if you think about switching everything over.

    And in the grand scheme of things, it’s far, far less than you would pay a CDN like Akamai.

  3. When it’s internal you have a physical throat to choke, i.e. direct responsibility and the authority to react to your priorities. This does, however, come at a fairly high cost. So the model we use is a hybrid of both: in the case of a catastrophic failure we can guarantee 100% data integrity (e.g. a second copy of all data), and we fall back to a degraded state of functionality when outages occur (e.g. allow file view but not file add, etc.).

  4. I’m not so sure the pricing is “not too outrageously different.” By my math, it’s an increase of 70% or so (I looked at monthly costs for 100 GB of storage with 20 GB up and 10 GB down).

    The price difference may not be worrisome for the Jungle Disk demographic (personal backups smaller than the above numbers), but it would add up quickly for sites/applications relying on S3 for lots of assets.

  5. Of course, going all in-house simply means that YOU are on the line for service levels. It’s always interesting to see the unspoken assertion people make when talking about failures like this – that they’d do better. That flies in the face of experience, of course… most companies do NOT have 100% uptime even when they run everything in-house.

    The problem with the S3 outage was, I think, less that it lasted several hours than that they were several consecutive hours during the day. Had it been a series of hiccups at 3am spread out over a month, there would have been less comment.

  6. We used them heavily before the outage. Primarily the SQS service, but also S3 to an extent.

    We decided during the outage that it didn’t make sense for us to outsource critical services the way we were. So during the outage, I coded up an entire replacement for SQS, so that we no longer have to. I think a lot of people are going to be doing this too.

    Either people will realize their dependencies on Amazon and will be adding a middle layer, or they will go all in-house, like us.
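Comment 3’s hybrid pattern – keep a second copy of everything, and degrade to read-only while the remote store is out – can be sketched in a few lines. This is a hypothetical illustration (the names and the availability flag are assumptions, not the commenter’s actual system):

```python
class HybridStore:
    """Local mirror plus a remote store, degrading gracefully on outage."""

    def __init__(self):
        self.local = {}               # the guaranteed second copy of all data
        self.remote_available = True  # would be flipped by health checks (assumed)

    def view_file(self, name):
        # File view always works: it is served from the local copy.
        return self.local[name]

    def add_file(self, name, data):
        # File add is refused in the degraded state, as comment 3 describes.
        if not self.remote_available:
            raise RuntimeError("degraded mode: file add disabled")
        self.local[name] = data       # a real system would also push to the remote store
```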
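For readers checking comment 4’s math, here is the back-of-the-envelope comparison using the list prices quoted in the post. It is a sketch that assumes Amazon’s lowest download tier throughout; the exact percentage shifts with the tier used:

```python
def monthly_cost(storage_gb, up_gb, down_gb, per_gb_rates):
    # per_gb_rates = (storage $/GB-month, upload $/GB, download $/GB)
    storage, up, down = per_gb_rates
    return storage_gb * storage + up_gb * up + down_gb * down

# Comment 4's scenario: 100 GB stored, 20 GB up, 10 GB down in a month.
s3_cost = monthly_cost(100, 20, 10, (0.15, 0.10, 0.17))        # $18.70
nirvanix_cost = monthly_cost(100, 20, 10, (0.25, 0.18, 0.18))  # $30.40
increase = (nirvanix_cost - s3_cost) / s3_cost                 # ~0.63
```

At the lowest download tier this lands a little above 60%, in the same rough range as the commenter’s figure.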
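And for the in-house SQS replacement comment 6 mentions, the core queue semantics (send, receive with a visibility timeout, delete on completion) fit in a short class. This is a hypothetical sketch of that kind of replacement, not the commenter’s code:

```python
import collections
import time

class SimpleQueue:
    """Minimal in-process work queue with SQS-style visibility timeouts."""

    def __init__(self, visibility_timeout=30):
        self._ready = collections.deque()
        self._inflight = {}           # message id -> (body, redelivery deadline)
        self._next_id = 0
        self.visibility_timeout = visibility_timeout

    def send(self, body):
        self._ready.append(body)

    def receive(self):
        # Re-queue any in-flight messages whose visibility timeout lapsed.
        now = time.time()
        for mid, (body, deadline) in list(self._inflight.items()):
            if deadline <= now:
                del self._inflight[mid]
                self._ready.append(body)
        if not self._ready:
            return None
        body = self._ready.popleft()
        self._next_id += 1
        self._inflight[self._next_id] = (body, now + self.visibility_timeout)
        return self._next_id, body

    def delete(self, mid):
        # Called by the consumer once the work is done.
        self._inflight.pop(mid, None)
```

A durable version would persist the deque and in-flight table, but the interface is the part that makes swapping away from SQS tractable.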