T-Mobile’s network experienced a hiccup yesterday evening that left some 2 million users without service, and the usual rush of tweets and news stories followed the outage. On Monday night Rackspace, which provides managed hosting and cloud services, also experienced problems that took some customers offline for hours. The interesting things here are the differences in how the two firms handled their outages and how much publicity each event garnered.
At last count, a Google search returned more than 250 articles dealing with the T-Mobile outage, while there were only 11 articles on Rackspace’s outage, even though it affected popular sites like TechCrunch, Daily Booth and Posterous. The higher visibility of the T-Mobile outage is likely a result of how consumer-oriented the service is. So the first lesson is that if you run a consumer-facing service, you need to have your PR people on the blog, Twitter and newspaper circuit offering statements and remedies quickly.
The second lesson can be found in how the events were handled. T-Mobile issued fairly regular updates but hasn’t yet provided any insight into what happened. In an age of transparency, this doesn’t go over well. Meanwhile, Rackspace has a very detailed status page that offers actual information on what went wrong and what the company is doing to fix it. Additionally, Rackspace offered actual people for its customers to call.
Other enterprise-focused services should take a lesson from Rackspace, especially when it comes to delivering cloud services. Take Google for instance; during a July App Engine platform outage, I got several emails from people who were frustrated that Google didn’t give them much information on the problem. One of those emails (I’m keeping the person’s name out of this as he is dependent on Google to host some of his information) noted: “I think the time is appropriate for a real critique of the offering, and force them to answer publicly about what they’re doing to fix whatever architectural flaw is knocking the service down every 2 or 3 weeks.”
Google’s stoplight metric (which offers only “no issues, investigating or service disruption”), paired with intermittent, vague blog posts, isn’t likely to cut it for long. If Google wants to play in the enterprise with its cloud platforms and services, it’s going to have to have people on hand to offer real customer service (something I’m not sure Google ever does — although, when Om complained about Gmail, a media relations person did get us someone on the phone) and provide better updates.
Last week, I chatted with Steven Cakebread, the former CFO of Salesforce.com, about service level agreements for on-demand computing and platforms as a service for the enterprise. In our conversation, he said that, until there’s more competition, Google and Amazon won’t have a huge incentive to explain outages in detail or to put people on the job of explaining problems and holding customers’ hands during outages. So I suppose it’s a good thing IBM, Savvis, Rackspace, Terremark and others are all crowding into this space.