I was thinking about reliability in the cloud when I saw this news item about the International Space Station experiencing a close call with some space debris. The threat of the debris hitting the station forced the astronauts to hang out in their escape capsule to wait out the potential hit. Scary stuff, but then I read:
The object — about the size of a bullet, and moving 20 times as fast — passed within 3 miles (4.5 kilometers) of the station early Thursday afternoon ET, the U.S. space agency reported.
Sometimes it’s the little things that completely derail us. On land, something so small and so far away wouldn’t faze us for a minute, but in the enormous distances and hostile environment of space, that 9-millimeter chunk of metal had the potential to bring the space station to a halt. Likewise, small glitches and failures that may seem manageable in a corporate setting have awesome power when they spread across the enormous number of users on the web. So how is cloud computing like space travel? Small problems can equal a monumental fail.
The Internet itself has proven fairly resilient to large-scale attacks or dumb mistakes, but services built in the cloud have a less-than-perfect track record. Small things like “server glitches” or internal communications issues have brought down large cloud services. But since the world is moving to cloud services as a platform for businesses, we need to focus less on the “OMG, Gmail is Down!” panic and more on furnishing escape pods, so we have our data and can remain productive when the little things drag our clouds offline.
image of ISS courtesy of NASA