

Skype’s Heartbeat Blog has an explanation for the 30-hour outage that plagued the eBay-owned (EBAY) voice company last week. A quick overview:

1. Microsoft issued Windows updates on Thursday, Aug. 16th.
2. Millions installed those patches, rebooted, and tried to log into the Skype network — pretty much all at the same time.
3. Combined with a lack of P2P resources, the flood of log-in requests put the Skype network under extreme stress.
4. This, in turn, exposed an unseen software bug “within the network resource allocation algorithm which prevented the self-healing function from working quickly.”
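
To see why the timing in steps 2 and 3 matters, here is a minimal Python sketch. It is not Skype's actual algorithm: it assumes a log-in service that can handle a fixed number of requests per time step and simply tracks the backlog. The function name `simulate_backlog` and all the numbers are hypothetical, chosen only for illustration.

```python
def simulate_backlog(arrivals, capacity_per_tick):
    """Return the number of waiting log-in requests after each time step.

    arrivals: list of new log-in requests arriving in each step (hypothetical data).
    capacity_per_tick: requests the service can complete per step (assumed constant).
    """
    backlog = 0
    history = []
    for arriving in arrivals:
        # Requests that cannot be served in this step carry over to the next one.
        backlog = max(0, backlog + arriving - capacity_per_tick)
        history.append(backlog)
    return history


# Illustrative numbers only: the same 1,000 log-ins, spread out vs. all at once.
spread_out = [100] * 10
all_at_once = [1000] + [0] * 9

print(simulate_backlog(spread_out, capacity_per_tick=120))
print(simulate_backlog(all_at_once, capacity_per_tick=120))
```

With these made-up numbers, the evenly spread arrivals never queue, while the same total arriving in one burst leaves a backlog that takes several steps to drain. A real network under that kind of stress would also see retries and timeouts, which this sketch leaves out.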


OK, it sounds credible, but do you buy it? Skype Journal has some questions, namely whether the bug’s fix has been propagated. What, they ask, is preventing this from happening again? After all, Microsoft (MSFT) routinely issues patches. Brough Turner, chief technology officer of NMS Communications, alludes to this in his most recent post.

Experts have pointed out that Skype generates a lot of traffic between log-in servers and supernodes. Maybe the supernodes went down during the “patches” as well. Someone who seems to be familiar with the Skype network architecture left a comment earlier that explains the relationship between the 50-odd authentication servers and the supernodes, and points to a weak link. (Full explanation is here.)

  1. I was reading the first lines with “Collected Explanations, Courtesy of Skype” and laughing because I was thinking, “Yeah, right.” So “are you really buying into this?” is a good question. This instance here is not. Thanks for all the news here and the sound analysis.

  2. A couple of comments:

    1. Microsoft actually issues the patches on Patch Tuesday (late in the evening), the 2nd Tuesday of the month. So they went out late on Aug. 14. (Wednesday morning I found two of my WinXP PCs had been rebooted after the update.)

    2. The Windows Update procedure, for those who have provided the appropriate permissions, automatically updates users’ PCs around the world; if necessary, as was the case here, the procedure ends with an automatic PC reboot. My experience is that the process takes about two days to reach every PC registered for automatic Windows updates (a procedure which one should follow for security reasons).

    At some point late Wednesday or early Thursday, the August update was installed on too many PCs concurrently for the Skype infrastructure to handle them all logging back in.

    While they have provided this information, a little more on what they have done to address the “lack of peer-to-peer network resources” mentioned in their statement would help PR-wise.

    A word of ominous caution: the next Microsoft Patch Tuesday occurs on 9/11.

  3. “within the network resource allocation algorithm which prevented the self-healing function from working quickly.”

    wow, these guys sound like Bear Stearns explaining how their quant fund lost an f*** load of investor money.

  4. Windows is right to upgrade its software, and it helps to expose one of the scandalous secrets of Skype/eBay: they use the user’s computer as one of the network nodes.

  5. [...] Om Malik About Skype outage explanation@GigaOm [...]

  6. Sorry, but I don’t get it. Granted, I don’t know the numbers (so please correct me!).

    How many of these millions of users are paying for Skype vs. free service?

    Complaining because their free service was unavailable for 30 hrs… Sheesh.

    Give it a rest…

  7. So it’s Microsoft’s fault that everyone rebooted because of patches and tried to log back in? I guess, if you live in Cracksmokin’ville.

    Capacity planning choked on this. This is a Skype issue, not a Microsoft issue. I have no love for Microsoft, but let’s solve the issue – buggy Skype code.

    Everyone seems to be blaming Microsoft or George Bush on this one. I wish I knew why…

  8. I guess they weren’t doing enough unit testing

    share your startup stories
    http://startupflames.com

  9. Pure CYA.

    This type of “blame someone else” runs through eBay corporate. I’m sure they’re “directing” (soft hand of course) the PR for Skype and “suggesting” how to deflect responsibility.

    Pfft. Me bitter? Nah…

  10. Skype accepted that final responsibility lay on their shoulders – I don’t think they are blame-shifting.

  11. [...] that Skype’s problems have been resolved and also seemingly explained, as the well-known blogger Om Malik writes. Latvian blogs, like those elsewhere, point out that Skype has several free and [...]

  12. [...] by Om Malik Tuesday, August 21, 2007 at 7:12 AM PT | No comments After no one bought their first explanation, Skype is trying one more time, this time elaborating on the Microsoft connection. Some [...]

  13. [...] Users rebooting after Windows Updates killed Skype. [...]

  14. [...] the market. This is not the first time Skype systems came under pressure because of faulty bugs. In August 2007, Skype had software problems as well, which in turn caused a flood of log-in requests and crashed the [...]

