

It has been a day from hell for Skype fans and Skype, the company. The outages have impacted many. Skype’s misfortune turned out to be a boon for SIPphone.

The company saw a 400% increase in traffic this morning, with a fourfold increase in sales, calls and downloads of its Gizmo Project software. “It is interesting to see that voice callers are transitory,” Michael Robertson, founder of SIPphone, wrote in an email.

Meanwhile, on the Skype outage front, we spoke to a Skype spokesperson, who said that the crew is working hard to get the service back by August 17th. Skype Journal is keeping tabs and says that about 2.5 million Skypers are back online.

The Skype spokeswoman also clarified that the problem was not with either the Microsoft updates or the Skype P2P architecture.

The Skype system has not crashed or been victim of a cyber attack. We love our customers too much to let that happen. This problem occurred because of a deficiency in an algorithm within Skype networking software. This controls the interaction between the user’s own Skype client and the rest of the Skype network.

Meanwhile, my sources say that one of the reasons it is taking so long for the service to come back is that Skype might be trying to restore its services from the most recent version of its database. We will keep you posted. You can also check the Skype blog for the latest updates.


  1. Bye bye OnState.

    http://www.on-state.com/

    How can you run a call center with 18 hour outages? Ouch.

    Well you get what you pay for.

  2. “The Skype system has not crashed or been victim of a cyber attack. We love our customers too much to let that happen.”

    umm…not sure what loving the customer has to do with whether or not someone launches an attack against the system.

    coincidentally: http://seclists.org/fulldisclosure/2007/Aug/0323.html

  3. Number of Skype Authentication servers:

    Count = 50 // Clustered

    Number of potential Skype clients:

    Count = 220,000,000 // Mostly decentralized

    Number of SuperNode clients to maintain network connectivity:

    Count = N / 300 at any one time.

    •   If there are 3.0 million users online then the ratio is 3,000,000 / 300 = 10,000  == Supernodes available
    •   Supernodes are bootstraps into the network for normal first run clients ("and handle routing of children calls").
    •   Supernodes maintain the network overlay via a DHT ("Distributed Hash Table") "type" method. // This is normally very slow and done over UDP
    •   If a client cannot find a Supernode, regardless of authentication via central server, then it is NOT allowed on the Skype network.
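
The ratio arithmetic in the list above can be sketched in a few lines of Python. This is only an illustration: the function name `estimate_supernodes` and its parameters are hypothetical, and the 1:300 ratio is simply the figure quoted in this comment, not a confirmed Skype constant.

```python
def estimate_supernodes(online_users: int, children_per_supernode: int = 300) -> int:
    """Estimate the Supernode count for a given number of online users.

    Assumption: one Supernode per 300 online clients, as claimed in the
    comment above. Nothing here comes from Skype's actual software.
    """
    if online_users < 0:
        raise ValueError("online_users must be non-negative")
    if children_per_supernode <= 0:
        raise ValueError("children_per_supernode must be positive")
    # Integer division mirrors the comment's "N / 300" figure.
    return online_users // children_per_supernode


if __name__ == "__main__":
    # The comment's own example: 3.0 million users online.
    print(estimate_supernodes(3_000_000))
```

With the comment's example of 3,000,000 online users, the function returns the same 10,000 figure.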
    

    Lack of Supernodes means lack of network connectivity regardless of successful login via “central server”.

    You CAN be a Supernode but not have full network connectivity because you have only a portion of the “Distributed Index Data aka DHT”.

    MOST people that become Supernodes will bail out if they cannot keep a clear route (aka calls bail out, the client restarts and aborts Supernode status, thus booting its 300 – 500 Children and putting them into a “Connecting” mode).

    Children that are trying to “Connect” are unable to do anything unless they have a “Supernode” as a parent. // No calls, No IM….
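
The parent/child dependency described above can be modelled as a toy example. This is a minimal sketch under stated assumptions: the `Overlay` class and its method names are hypothetical and only mirror the behaviour the comment describes (a child without a Supernode parent can do nothing). It says nothing about how Skype is actually implemented.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Overlay:
    """Toy model: every child needs a Supernode parent to be usable."""

    supernodes: set[str] = field(default_factory=set)
    parent_of: dict[str, str] = field(default_factory=dict)  # child -> Supernode

    def attach(self, child: str, supernode: str) -> None:
        """Attach a child to an active Supernode."""
        if supernode not in self.supernodes:
            raise KeyError(f"{supernode} is not an active Supernode")
        self.parent_of[child] = supernode

    def drop_supernode(self, supernode: str) -> list[str]:
        """Remove a Supernode; return the children left in 'Connecting' mode."""
        self.supernodes.discard(supernode)
        orphans = [c for c, p in self.parent_of.items() if p == supernode]
        for child in orphans:
            del self.parent_of[child]
        return orphans

    def can_communicate(self, child: str) -> bool:
        # No parent Supernode means no calls and no IM.
        return child in self.parent_of
```

Dropping one Supernode orphans all of its children at once, which is the cascade the comment attributes to Supernodes bailing out.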

    The overview of this is as follows:

    Skype introduced a flaw into the network that dealt with “routing” and “fucked” the “decentralized data store aka DHT”. This in turn sent clients on a RANDOM search for Supernodes, which at this point had already been booted off the network.

    In the End:
    It is a huge cycle: no matter how many bugs they “fix” in the “central servers”, it will take many days for N nodes to become Supernodes so they can route X data from peer A to peer B. This is NOT minor. A fix to the centralized server code base still has to relay data to N Supernodes, and there is a lack thereof, resulting in a very segregated network. Right now there are approximately 10,000 sub Skype networks instead of 1 single “in sync” network. When this “data store” (see DHT) is in sync globally, the Skype network will again be STABLE.
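
The idea of many "sub networks" instead of one in-sync network can be illustrated by counting connected components in a graph of Supernodes. The sketch below uses a standard union-find structure. The function name `count_partitions` and the node labels are hypothetical, and treating the overlay as an undirected graph is a simplifying assumption rather than a description of Skype's real topology.

```python
def count_partitions(nodes, links):
    """Count disconnected groups ("sub networks") among the given nodes.

    nodes: iterable of hashable node ids
    links: iterable of (a, b) pairs, each id must appear in nodes
    """
    nodes = list(nodes)
    parent = {n: n for n in nodes}

    def find(n):
        # Follow parent pointers to the group's root, halving the path as we go.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two groups

    return len({find(n) for n in nodes})
```

A fully healed overlay corresponds to a result of 1; any larger number means peers in different groups cannot yet see each other's presence or route calls between groups.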

    I know this is very broad, but unless all of said nodes can magically recreate the “single overlay (DHT)”, nothing will be in sync. You will see delayed messages, and delayed or incorrect profiles and presence.

    My take, in the end, is: give it 48 more hours and it may be semi-stable. But hey, this is what you get when you use end users as your own redundancy…

    Yours…

  4. blogdoch.net — jetzt wird zurückgeblogt Friday, August 17, 2007

    No VoIP today?

    (Testing.) All clear: my (IAX2- and SIP-based) VoIP works like a charm, as expected, even from the N95 ;)

  5. Julian Cain, that was an awesome explanation!

  6. Skype’s outage: A warning sign?–TechBizMedia Friday, August 17, 2007

    [...] says that significant number of users could use it now. But I still couldn’t log into it yet. Om Malik says Skype’s loss is Gizmo Project’s [...]

  7. sippedoutyoda Friday, August 17, 2007

    That was Skype…

    Now is the time for Damaka (www.damaka.com)

    Pure SIP based P2P application that encrypts signaling and media end to end so, all your conversations are fully secured end to end.

    It does encrypted video conference, audio conference, video mail, whiteboarding, audio streaming, desktop sharing, SMS, voice mail, IM chat, video profile, dial in, dial out, free pc-to-pc (audio & video) calling anywhere in the world, cheap pc to phone, phone to phone, secure file transfer, application sharing, even mobile (pocket pc and smart phones) all this is done end to end securely without using any kind of servers in the middle.

    Check ‘em out at http://www.damaka.com/consumers/

  8. @ Julian – great explanation – so good I posted it on my blog :)..

  9. It would be funny if the Skype network cannot rebuild itself from scratch at its current scale. Perhaps the stability of a week ago was only achieved by its gradual growth trajectory over preceding years.

  10. Hmmm…I’d have to say I’ll be considering looking for a backup provider, just in case Skype fails again.

  11. Julian Cain also worked for Kazaa and Sharman Networks for a short while as their Mac developer. He now works for Pando, so he definitely knows how the system works.

    If I remember correctly, this was also an issue when Sharman locked unofficial clients out a few years back.

  12. askbusinesscoach Friday, August 17, 2007

    Just when we all are looking to dump land lines. You know those old reliable copper/fiber things that worked on 9/11 when nothing else did. I guess the question here is how do we really get to redundancy & security perfection?

  13. The next question is “What about Joost?”. Derived technology with derived flaws. Let’s see how long Joost goes without a worldwide outage. I am currently reviewing the Joost network architecture and will soon release a more in-depth article on the relationships between Kazaa (FastTrack), Skype and Joost.

  14. Clear Blue Dei ~~~~~~~ Web 2.0 and a little more » Blog Archive » Skype – It’s not the Outage, It’s How They Handled It. Friday, August 17, 2007

    [...] to GigaOm’s post on Skype Groans and SIPhone Gains: “The company saw a 400% increase in traffic this morning, with 4 times increase in sales, [...]

  15. All,

    While I am monitoring the Skype network “heal” itself, the results are far from perfect. The “Distributed Data Store” (see DHT) is way behind. I am still receiving messages that are 24 hours late and my contact list(s) are not in “sync”. This so-called “resilient network” is starting to take shape. If you are not “in the know”, please see my other writing about this outage. Here is what really happened:

    •   Skype employees introduced code into the "login/connectivity" server farm that was not compatible with current Skype clients (see Morpheus getting booted from FastTrack).
    •   This is a single point of failure even with masses of 9,000,000 concurrent users.
    •   This is a single point of failure for ~220,000,000 users.
    

    Question:
    • Why did resolution take so long and is still ongoing?

    Answer:
    • Skype uses a Peer to Peer topology. This consists of Supernodes which maintain a “DHT” type layer between other Supernodes. This data is routed down to their Children (300 – 500 at any one time).
    • The “DHT Type” layer is responsible for presence, avatars(icons) and above all “Call Routing”.
    • Child nodes do not know of this layer, so they depend on Supernodes.
    • If a Supernode goes offline then all Children are cut off until they find another Supernode.
    Avoidance:
    • If child nodes knew about the upper layer at any point then this would be more resilient to outage because they are not dependent on Supernodes.
    • Local Discovery: If Skype had a layer that made local availability known to all users, then in an office environment Skypers would be able to locate each other without the need for a Supernode (this excludes Bonjour, as it does not relay the local node cache).
    • Login failures: this is pure redundancy (“central servers”). Skype had this in place, but it failed because they introduced flawed code that did not ensure complete SSL-based authentication.
    • Skype Employees need to come clean and stop blaming this on “very old” bugs, test before you release, surely there is a test bed???
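
One way to picture the "Avoidance" idea of clients being less dependent on a single Supernode is a client that keeps several known Supernodes in a local cache and fails over to the next reachable one. This is a minimal sketch: `pick_supernode`, the cache contents and the `is_reachable` callback are all hypothetical, and a real client would need network probing, timeouts and cache refresh that are not shown here.

```python
def pick_supernode(cached_supernodes, is_reachable):
    """Return the first reachable Supernode from a local cache, or None.

    cached_supernodes: ordered list of addresses the client already knows
    is_reachable: callable taking an address and returning True or False
                  (a stand-in for a real connectivity probe)
    """
    for address in cached_supernodes:
        if is_reachable(address):
            return address
    # Nothing in the cache answered: the client stays in "Connecting" mode.
    return None


if __name__ == "__main__":
    alive = {"10.0.0.2"}  # made-up addresses, for illustration only
    print(pick_supernode(["10.0.0.1", "10.0.0.2", "10.0.0.3"], lambda a: a in alive))
```

The wider and fresher the cache, the less likely a single Supernode going offline leaves the client stranded, which is the resilience argument made in the list above.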
    End:
    • Skype is currently re-creating its decentralized network from scratch, from the end user to the Supernode users to the entire “distributed index”. This is done over UDP and takes some time, so do not trust the following until the network is stable again:
    • Presence
    • Profiles, this includes avatars.
    • Contact list: yes, you may see people that you blocked; give it time to heal.
    • Stability: connections may continue to drop because your Supernode went offline and your local cache is not “doing its job”.

    Summary:
    • Decentralized networks like Skype are subject to global outages, the same as Kazaa and possibly Joost.

    Take Care

    ~Julian Cain

    jolix (at) mac (dot) com

  16. Skype officially Monday, August 20, 2007

    [...] an interesting comment by Julian Cain: It is a huge cycle, no matter how many bugs they “fix” in the “central [...]

  17. GigaOM Skype Tells Us What Happened « Monday, August 20, 2007

    [...] this relationship between 50-odd authentication servers and supernodes and also a weak link. (Full explanation is here.) Share This | Sphere | Topic: Voice [...]

  18. Skype’s blackout: are the Microsoft updates to blame? | VoipBlog.it Monday, August 20, 2007

    [...] moment for so many users? Another possibility, as suggested by an expert in the comments on a Gigaom post, is that those simultaneously leaving the Skype network, in addition to the PCs of many [...]

  19. Scalable web architectures » Blog Archive » How Skype network handles scalability.. Monday, August 20, 2007

    [...] One of the best technical description of the problem (which might be a speculation as well) is here. Regardless of whether this is the real cause or not, I found it interesting because it describes [...]

  20. A P2P Failure for the History Books! « Tuesday, August 21, 2007

    [...] goes to Om Malik and Julian Cain for providing insight into the technical issues that crushed Skype on August 16. Skype uses Distributed Hash Table (DHT) [...]

  21. On the internet, everyone can hear you scream at Matt Croydon::Postneo Tuesday, August 21, 2007

    [...] have one and only one chance to get it right. You’re never more than one power outage, one service outage, one information breach, bad decision, misstep, misquote, or mess up away from loosing your [...]

  22. Skype Groans & SIPphone Gains

    It has been a day from hell for Skype fans and Skype, the company. The outages have impacted many. Skype’ s misfortune turned out to be a boon for SIPphone. The company saw a 400% increase in traffic this morning,…

