
When we covered last weekend’s Amazon S3 outage, we wondered about the level of transparency and suggested that the requirements for claiming a refund were too onerous. In a good display of customer service, the Amazon team has addressed both of these issues. First, they’ve posted details about what they’re calling an “availability event” in a form that can serve as a model postmortem for Web 2.0 outages: they give technical details and a timeline on the problem and the fix, enumerate the changes they’ve made to prevent it happening again, and end on a personal note (though take half a point off for omitting to use the word “sorry” anywhere).

Even better news is buried in the S3 forums: you won’t have to jump through those refund-claiming hoops after all. “For this particular event, we’ll be waiving our standard SLA process and applying the appropriate service credit to all affected customers for the July billing period. Customers will not need to send us an e-mail to request their credits, as these will be automatically applied. This transaction will be reflected in our customers’ August billing statements.”

  1. Matt Hussein Platte Saturday, July 26, 2008

    Why should they be sorry about anything? More to the point, why do you think this is a situation that requires an apology? Early on you were beating up on Amazon because of their cynical fill-out-the-forms attitude regarding the refund. Now that you’ve discovered your error WRT the refund, it seems that the apology might best be turned around.

  2. @Matt: If you check the times involved, you’ll see that our coverage of the refund process was posted before they made the policy exception for this outage.

    As for the apology, when a service with a 99.9% SLA misses badly, I think it’s only good form to tell your customers that you’re sorry about the downtime. YMMV, I guess.

  3. I am a user of S3 and I have many years’ experience with problem and crisis management in large data centers. Amazon have posted a model report which explains what happened, looks for root causes and identifies actions in place to tackle them. I agree that saying sorry would have been good, but I think this was posted by engineers focused on the problem, rather than customer-facing people.

  4. I don’t agree that a “sorry” was necessary. They outlined in very specific detail what happened and how they are going to try (operative word) to avoid it in the future. That is what I would expect of a business that I depend on at this level.

    As for the details being “technical” and not “customer-facing”, this is what we need. The issue was not a business issue, but a very technical one. This explanation shows that they figured it out and solved it. The post was not meant to be a business post.

    I have experienced similar types of outages from our hosting provider (where we have 8 servers that operate 24/7 for our 15,000 clients), so I know how important an outage is. Amazon’s explanation was spot-on, necessary and accepted. No need to say “sorry”, but more importantly, “here is how we are going to try to avoid this in the future”.
