
Microsoft (this time) Proves Again that Dumb Mistakes are More Likely to Take Down a Cloud

As you may recall, back in December around Christmas, deleting a few files took down Amazon Web Services for a time, which in turn took down Netflix. This holiday mishap proved that a dumb mistake made by a person is more likely to crash our cloud than any natural disaster. Now it’s Microsoft’s turn to prove the point.

Microsoft’s secure Azure Storage took a dirt-nap last weekend, and it looks like the issue was that Redmond forgot to renew a security certificate. Microsoft first reported the problem on Friday at 12:44pm Pacific Time on the Windows Azure Service Dashboard, and a subsequent update at 1:30pm identified a problem with SSL transactions.

As a result, Microsoft reported worldwide problems with Storage, with every sub-region reporting service degradation. The service is now back up and running.

This is just another lesson that human error is more likely to get us than catastrophic failures. Why? We plan for catastrophic failures, contingency plans and all, but rarely plan for dumb mistakes.
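Planning for this particular dumb mistake does not take much. Here is a minimal sketch of a certificate expiry check in Python, using only the standard library. The function names and the 30-day warning window are my own assumptions for illustration, not anything Microsoft or any other provider is known to run.

```python
import socket
import ssl
from datetime import datetime, timezone


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Return the notAfter string of the certificate served by host:port.

    Note: the default context verifies the certificate, so a certificate
    that has already expired makes the handshake raise an ssl.SSLError
    instead of returning a date.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Days between `now` (timezone-aware) and a notAfter string
    such as 'Jan 15 00:00:00 2030 GMT'."""
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    return (expires - now).total_seconds() / 86400


def needs_renewal(not_after: str, now: datetime, warn_days: int = 30) -> bool:
    """True when the certificate expires within warn_days (hypothetical threshold)."""
    return days_until_expiry(not_after, now) <= warn_days
```

Run something like `needs_renewal(fetch_not_after("storage.example.com"), datetime.now(timezone.utc))` from a daily scheduled job (the hostname is a placeholder) and have it page someone when it returns `True`. The point is simply to turn "somebody should remember to renew that" into an alert.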

As we continue to operate public clouds into 2014 and 2015, I suspect that the number of these types of incidents will decrease. Cloud providers will get better at internal processes, and these kinds of embarrassing mistakes should be close to non-existent…I hope.