
If you’ve ever been curious about what would happen when a cloud service fails, then you don’t have to wonder any longer. Earlier today, customers of T-Mobile and Sidekick data services provider Danger, a subsidiary of Microsoft, lost access to all their data. Some believe that this data wipeout is because of a botched upgrade. Why it happened matters little to those who are unlikely to get their data back, according to a note posted on T-Mobile forums.

Regrettably, based on Microsoft/Danger’s latest recovery assessment of their systems, we must now inform you that personal information stored on your device — such as contacts, calendar entries, to-do lists or photos — that is no longer on your Sidekick almost certainly has been lost as a result of a server failure at Microsoft/Danger. That said, our teams continue to work around-the-clock in hopes of discovering some way to recover this information. However, the likelihood of a successful outcome is extremely low.

Danger’s service works in a very simple fashion. The devices are in constant communication with a server which does everything from checking email to fetching web pages and maintaining contact with all the folks we know on instant-messaging networks. It also keeps copies of other communications (such as text messages), address books and calendars. It stores photos on its servers as well. In short, what we have is a device that is a combination of a cell phone and an almost dumb terminal.

This wipeout reminds me of “The Bourne Identity,” in which Jason Bourne spends the entire length of the book trying to find out his real identity because his memory has been wiped out by an accident. T-Mobile and Danger have done something like that. By losing the servers, they have done the equivalent of wiping out the collective memory of their customers.

T-Mobile is advising customers not to reset their device by removing their batteries or letting their batteries drain out, because if that happens, then all the information that is local to the device is going to be wiped out as well.

This development highlights the many risks we face as we romp into our cloud-centric future. And it’s just one of the many setbacks we have faced in recent months. The Google Mail outage seems like a bad dream compared with this nightmare. After all, Google didn’t actually lose our emails. But in this case, many may have no option but to go back to square one and start over.

  1. Jacqueline Lawson Saturday, October 10, 2009

    This is awful news, especially for T-Mobile customers. From reading the article, my thought is that the Microsoft ‘Danger’ upgrade caused the problem. And I am wondering: is this more of a human error than a configuration error of cloud computing? Are those one and the same? Was someone not watching the board? I would agree that moving forward into cloud computing there will have to be checks and balances and a flawless upgrade process if we are to avoid such major catastrophes! Where was the backup/restore system?

  2. bryce mcdonnell Saturday, October 10, 2009

    Seems like they should have backed up that data somewhere. I see that Mozy is a sponsor of this page …

    1. Sponsored ads must change as the page is refreshed. I see no Mozy ad on my page, just (mostly) Verizon. Hmm …

  3. Backups, backups, backups.

    I don’t know how stupid a company whose business is information can be to not do this.

  4. Agreed. How could they not have multiple redundant backups? Isn’t that Rule #1?

    1. Gottathink Abouththisone Bob Morris Sunday, October 11, 2009

      It’s more than a small surprise that the data wasn’t backed up – much more than a small surprise. Since this is such an obvious thing to think about, perhaps Microsoft had some type of reason not to back up (recoverably). So what’s a good reason? Well, the data is highly dynamic and it is dynamic 24/7, so there’s not really a good fixed time to do a mass backup. Proper RAID storage doesn’t really cost THAT much anymore; you can have terabytes of data reliably “protected” for a few thousand bucks – hope THAT isn’t the reason. Backing up individual accounts might be doable on-the-fly but I’d expect a pretty amazing processing hit or backlog somewhere if that was done. Up there in the article it mentions an “upgrade” to the software…hmmm, what if the upgrade, on start, initializes disks? RAID doesn’t help much if you’ve written terabytes of zeroes to the RAID disks.

      Hope Microsoft explains this one…

      1. It seems like they could do a backup at a certain hour of the night for each user’s timezone. Of course somebody might be using the phone, but losing 24 hours or less of modified data is a whole lot different than losing everything.

      2. There is simply no excuse for not doing backups.

        If you have a system design that makes it hard to do backups (and almost any system can be backed up with the right technology), change it. Many enterprises have critical systems that are used 24/7 yet still manage to back them up – the technology of snapshots and hot database backups is very well proven.

      3. It doesn’t matter if data is highly dynamic. There are solutions for that. As users of home computers, we imagine that we have to copy everything and wait until the operation has finished, because that’s what we do at home. The enterprise world knows more advanced methods. You can even make a backup of terabytes of constantly changing data in less than a millisecond. When data changes afterwards, you make a modified copy of the block instead of changing the block itself. You need special filesystems for that (e.g. NetApp does this). This method combined with redundant copies is pretty safe.
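The snapshot idea described in the reply above can be sketched in a few lines of Python. This is a toy, in-memory illustration and not how NetApp or any other product is actually implemented: the `SnapshotVolume` class and its method names are hypothetical. A snapshot records only which blocks exist at that moment, and later writes store new block data instead of overwriting the old, so the snapshot stays intact even if the live data is zeroed out.

```python
class SnapshotVolume:
    """Toy block store with copy-on-write style snapshots (illustrative only)."""

    def __init__(self, blocks):
        # Live block map: block number -> immutable bytes object.
        self._live = dict(enumerate(blocks))
        self._snapshots = {}

    def snapshot(self, name):
        # Copy only the block map (references), never the block contents.
        # Real filesystems avoid even this per-block map copy; the toy
        # version keeps it for simplicity.
        self._snapshots[name] = dict(self._live)

    def write(self, block_no, data):
        # Point the live map at new data. Any snapshot still references
        # the old bytes, which are never modified in place.
        self._live[block_no] = bytes(data)

    def read(self, block_no, snapshot=None):
        # Read from the live volume, or from a named snapshot.
        source = self._live if snapshot is None else self._snapshots[snapshot]
        return source[block_no]

    def restore(self, name):
        # Roll the live volume back to the state captured by the snapshot.
        self._live = dict(self._snapshots[name])
```

For example, after `vol.snapshot("nightly")`, a bad upgrade that calls `vol.write(0, b"\x00" * 11)` changes only the live view; `vol.restore("nightly")` brings the original block back. A snapshot kept on the same disks still needs the redundant copies mentioned above to survive a hardware failure.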

  5. I’ll take a contrarian view. How big a user base does Danger have? And how much a nuisance is/was Danger to Windows Mobile? This is a brilliant way to intentionally destroy an acquisition and brand. I thought Adobe was good at this, but this is pure genius. Yeah, that’s the ticket, we had this -ahem- “accident”.

    1. Yacko

      It is a big enough base for Microsoft to pay $500 million+ for the company. There are a few hundred thousand customers at the very least.

      Now why would they buy-and-destroy that brand? What is your logic here? Not sure I quite understand that.

      1. What is the logic for Adobe to absorb and destroy Macromedia? There is a large part of the illustration market that worships Freehand. And this is just one of many Adobe examples. Removing the competition – priceless. Anyway, people should have known it was not safe doing business with a company named Danger, Inc, though I have to say that 20+ years ago, Dr. Evil Laboratories was a benefit to the Commodore 64 community.

  6. I guess they forgot velocity is not a persistence medium…

  7. Markus Goebel Sunday, October 11, 2009

    Gigaom turns very cloudy lately. ;)

    1. Just like San Francisco today. :-)

  8. This illustrates an even greater problem with keeping personal info on someone else’s location.

    I would appreciate being able to learn an easy method for transferring copies of my mail to my hard drive or other media. Too often I have seen an error message (of sorts) that the “mail cannot be retrieved at this time.” I can save them by printing to a pdf, but that requires a lot of time and effort. What I would appreciate is an option to “Save to a local place.”

    1. Set up Thunderbird or Outlook or something on your computer and you will then have all your emails locally too.

    2. Assuming you have some sort of webmail account, then Josh’s comment below is accurate. However, ensure the setting on your local email client (e.g., Outlook or Thunderbird) that indicates “Leave mail on Server” is checked. You don’t want to substitute one single point of failure for another. This way your email is left on the webmail account and is also brought into your local system, which, of course, you back up every night.
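For readers who prefer a script to a mail client, the same "keep a local copy, leave the mail on the server" approach might look roughly like this in Python, using only the standard library. It is a sketch under assumptions: the function names are made up for illustration, the host and credentials are placeholders you would supply, and your provider must allow IMAP access. The mailbox is opened read-only, so nothing on the server is deleted or marked as read.

```python
import imaplib
import mailbox


def fetch_all_readonly(host, user, password, folder="INBOX"):
    """Download every message in `folder` without changing server state."""
    raw_messages = []
    with imaplib.IMAP4_SSL(host) as conn:
        conn.login(user, password)
        # readonly=True: the equivalent of "Leave mail on Server".
        conn.select(folder, readonly=True)
        _, data = conn.search(None, "ALL")
        for num in data[0].split():
            _, parts = conn.fetch(num, "(RFC822)")
            raw_messages.append(parts[0][1])  # raw message bytes
    return raw_messages


def archive_to_mbox(raw_messages, mbox_path):
    """Append raw messages to a local mbox file; return its message count."""
    box = mailbox.mbox(mbox_path)
    try:
        for raw in raw_messages:
            box.add(raw)
        box.flush()  # write pending messages to disk
        return len(box)
    finally:
        box.close()
```

A typical run would be `archive_to_mbox(fetch_all_readonly("imap.example.com", "me", "secret"), "mail-backup.mbox")`, where the host name and credentials are hypothetical. Running it twice appends the same messages again, so a real backup job would need to track what it has already saved, and the resulting file should itself be included in your nightly backup.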

  9. “When Cloud Fails: T-Mobile, Microsoft Lose Sidekick Customer Data”

    CORRECTION: “When WINDOWS SERVER 2003 Fails: T-Mobile, Microsoft Lose Sidekick Customer Data”

    1. Todd

      I don’t think Danger servers are running off Windows Server 2003. I might be wrong, but Sidekick folks had a full Java backend. Let me ping some folks and find out what backend systems they were really using.

      1. Regardless of which Microsoft product is in use – the catastrophic loss of data was not the fault of cloud computing.

        This post’s “When Cloud Fails” headline is just irresponsible.

        I respectfully ask that this post be edited, swapping out the words “cloud computing” for the specific Microsoft product name as soon as it’s known.

  10. Why are application design and operational issues being called “cloud failures”? More like boneheaded “failure to use cloud computing in a way that protects your data” – which has a long history of occurring no matter how you “compute”. This is a human fail, not a tech fail.

  11. Regardless of what they’re running – heads should roll for not having a DR strategy.

  12. This is terrible.

    Up until today I’ve been a big supporter of Cloud computing eventually replacing not only the way that we store our personal information but also our media.

    There is no excuse for this.

    1. I think the lesson here is the same one that has been learned from big disasters in architecture and engineering disciplines: Learn from your mistakes by understanding why they happened and never let them happen again.

      I will say one thing though – traditionally those disciplines have had the maturity to share their learnings from their mistakes for the collective good. Sadly in IT we don’t have that. The mistake and the career of the person making it get buried, waiting for it to happen again to someone else.

      The cloud paradigm makes long term sense as it’s the correct evolution of technology. We just need to bring maturity and the willingness to the industry to share what went wrong.

      S.

  13. First, in defense of Om’s headline, there are two ways “cloud” is used in talking about network storage/computing: 1) the actual hardware/software solution facilitating and 2) the concept of data/computing done remotely. I’m not a big fan of the latter usage as it obfuscates (to me anyway… and to some of you) the former. But leave the tech journalists to their jargon ;-) So in the 2) context, the headline translates to “the user is screwed if all of their data is hosted out on the intarwubs and then lost”.

    Regarding data backups, the question had to have come up during the risk assessment. Perhaps the costs to have two copies of that data were very high. We are talking about highly trained experienced professionals here. (I lol’d in RL) Also, we don’t know that it’s related to Microsoft technology.

    Could be this was yet another symptom of the (allegedly) imploding MS Pink project:

    http://www.electronista.com/articles/09/10/09/ms.pink.and.danger.team.at.risk/

    1. Ok, if we are just not going to be accountable, then this should be this post’s headline:

      MICROSOFT’S AZURE DOOMED BASED ON SIDEKICK DATA LOSS

  14. Bad history for cloud technology, and backups should have been a basic no-brainer. Shameful.

    Even as a consumer, I would have kept my own backups, if possible (on a microSD or pc sync).

  15. Well cmon! Let’s start with the name. DANGER?! Yeah that instills confidence.

    Have you heard GM has a new line of cars called The Roller!

    #whatsinanameindeed!

  16. Angry Customer Tuesday, October 13, 2009

    Forum.sidekickfail.com has recently been created as an open and neutral place where Sidekick customers can exchange ideas and vent without the fear of their valuable thoughts, ideas, and opinions being deleted and disrespected, as T-Mobile has been doing on their forums.

    1. The actual site address is http://sidekickfail.com for the home page and http://forum.sidekickfail.com for the forum.
