Skype Groans & SIPphone Gains

Om Malik, Thursday, August 16, 2007 at 10:18 PM PT Comments (24)

It has been a day from hell for Skype fans and Skype, the company. The outages have impacted many. Skype’s misfortune turned out to be a boon for SIPphone.

The company saw a 400% increase in traffic this morning, with a fourfold increase in sales, calls and downloads of its Gizmo Project software. “It is interesting to see that voice callers are transitory,” Michael Robertson, founder of SIPphone, wrote in an email.

Meanwhile, on the Skype outage front, we spoke to a Skype spokesperson, who said that the crew is working hard to get the service back by August 17th. Skype Journal is keeping tabs and says that about 2.5 million Skypers are back online.

The Skype spokeswoman also clarified that the problem was not with either the Microsoft updates or the Skype P2P architecture.

The Skype system has not crashed or been victim of a cyber attack. We love our customers too much to let that happen. This problem occurred because of a deficiency in an algorithm within Skype networking software. This controls the interaction between the user’s own Skype client and the rest of the Skype network.

Meanwhile, my sources say that one of the reasons it is taking so long for the service to come back is that Skype might be trying to restore its services from the most recent version of its database. We will keep you posted. You can also check the Skype blog for the latest updates.


24 comments so far

August 16th, 2007
11:18 PM PT
Glen said:

Bye bye OnState.

http://www.on-state.com/

How can you run a call center with 18 hour outages? Ouch.

Well you get what you pay for.

August 17th, 2007
3:58 AM PT
jason said:

“The Skype system has not crashed or been victim of a cyber attack. We love our customers too much to let that happen.”

umm…not sure what loving the customer has to do with whether or not someone launches an attack against the system.

coincidentally: http://seclists.org/fulldisclosure/2007/Aug/0323.html

August 17th, 2007
4:13 AM PT
Julian Cain said:

Number of Skype Authentication servers:

Count = 50 // Clustered

Number of potential Skype clients:

Count = 220,000,000 // Mostly decentralized

Number of SuperNode clients to maintain network connectivity:

Count = N / 300 at any one time.

•   If there are 3.0 million users online then the ratio is 3,000,000 / 300 = 10,000  == Supernodes available
•   Supernodes are bootstraps into the network for normal first-run clients (and handle routing of their children's calls).
•   Supernodes maintain the network overlay via a DHT ("Distributed Hash Table") "type" method. // This is normally very slow and done over UDP
•   If a client cannot find a Supernode, regardless of authentication via the central server, then it is NOT allowed on the Skype network.
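
A minimal sketch of the arithmetic above, assuming the N / 300 ratio quoted in this comment holds. The function names and the default of 300 children per Supernode are illustrative assumptions, not anything taken from Skype's software.

```python
def estimate_supernodes(online_users: int, children_per_supernode: int = 300) -> int:
    """Rough Supernode count using the N / 300 ratio quoted in the comment."""
    if children_per_supernode <= 0:
        raise ValueError("children_per_supernode must be positive")
    return online_users // children_per_supernode


def orphaned_children(supernodes_lost: int, children_per_supernode: int = 300) -> int:
    """Clients left in "Connecting" mode if this many Supernodes drop out."""
    return supernodes_lost * children_per_supernode


# 3.0 million users online, as in the example above.
print(estimate_supernodes(3_000_000))  # 10000
# Hypothetical scenario: 1,000 Supernodes bail out at once.
print(orphaned_children(1_000))        # 300000
```

Under these assumptions, even a modest loss of Supernodes strands a large number of ordinary clients, which is the point the comment goes on to make.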

Lack of Supernodes means lack of network connectivity regardless of successful login via “central server”.

You CAN be a Supernode but not have full network connectivity because you have only a portion of the “Distributed Index Data aka DHT”.

MOST people that become Supernodes will bail out if they cannot keep a clear route (aka calls bail out, the client restarts and aborts Supernode status, thus booting its 300 - 500 Children and putting them into a “Connecting” mode).

Children that are trying to “Connect” are unable to do anything unless they have a “Supernode” as a parent. // No calls, No IM….

The overview of this is as follows:

Skype introduced a flaw into the network that dealt with “routing” and broke the “decentralized data store aka DHT”. This in turn sent clients on a RANDOM search for Supernodes, which at this point were well and truly booted off of the network.

In the End:
It is a huge cycle: no matter how many bugs they “fix” in the “central servers”, it will take many days for N nodes to become Supernodes so they can route X data from peer A to peer B. This is NOT minor. A fix to the centralized server code base cannot relay data to N Supernodes when there is a lack thereof, resulting in a very segregated network. Right now there are approximately 10,000 sub Skype networks instead of 1 single “in sync” network. When this “data store” (see DHT) is in sync globally, then the Skype network will again be STABLE.
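
To make the "many sub-networks instead of one" idea concrete, here is a small sketch that counts how many disconnected islands an overlay has split into. It uses a plain union-find over a toy list of Supernode links; the node numbering and link lists are made up for illustration and say nothing about how Skype actually tracks its overlay.

```python
from typing import Iterable, List, Tuple


def count_partitions(node_count: int, links: Iterable[Tuple[int, int]]) -> int:
    """Count disconnected groups of nodes, given links between node indices."""
    parent: List[int] = list(range(node_count))

    def find(x: int) -> int:
        # Follow parent pointers to the group's root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    partitions = node_count
    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two groups
            partitions -= 1
    return partitions


# Hypothetical overlay of six Supernodes.
healthy_links = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
degraded_links = [(0, 1), (3, 4)]

print(count_partitions(6, healthy_links))   # 1
print(count_partitions(6, degraded_links))  # 4
```

In the healthy case every Supernode can reach every other one, so there is a single network. Once most links are gone, the same six nodes form four islands that cannot exchange presence or routing data until links are re-established.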

I know this is very broad, but unless all of said nodes can magically recreate the “single overlay (DHT)”, nothing will be in sync. You will see delayed messages, and delayed or incorrect profiles and presence.

My take, in the end, is give it 48 more hours and it may be semi-stable, but hey, this is what you get when you use end users as your own redundancy…

Yours…

August 17th, 2007
4:28 AM PT

No VoIP today?

(Testing.) All clear: my (IAX2- and SIP-based) VoIP works, as expected, like a charm, even from the N95 ;)

August 17th, 2007
7:20 AM PT
jccodez said:

Julian Cain, that was an awesome explanation!

August 17th, 2007
8:58 AM PT

[...] says that significant number of users could use it now. But I still couldn’t log into it yet. Om Malik says Skype’s loss is Gizmo Project’s [...]

August 17th, 2007
9:46 AM PT
sippedoutyoda said:

That was Skype…

Now is the time for Damaka (www.damaka.com)

Pure SIP based P2P application that encrypts signaling and media end to end so, all your conversations are fully secured end to end.

It does encrypted video conference, audio conference, video mail, whiteboarding, audio streaming, desktop sharing, SMS, voice mail, IM chat, video profile, dial in, dial out, free pc-to-pc (audio & video) calling anywhere in the world, cheap pc to phone, phone to phone, secure file transfer, application sharing, even mobile (pocket pc and smart phones) all this is done end to end securely without using any kind of servers in the middle.

Check ‘em out at http://www.damaka.com/consumers/

August 17th, 2007
9:48 AM PT
Andrew said:

@ Julian - great explanation - so good I posted it on my blog :)..

August 17th, 2007
10:20 AM PT
tim said:

It would be funny if the Skype network cannot rebuild itself from scratch at its current scale. Perhaps the stability of a week ago was only achieved by its gradual growth trajectory over preceding years.

August 17th, 2007
10:31 AM PT
David Mackey said:

Hmmm…I’d have to say I’ll be considering looking for a backup provider, just in case Skype fails again.

August 17th, 2007
10:50 AM PT
Matt_ said:

Julian Cain also worked for Kazaa and Sharman Networks for a short while as their Mac developer. He now works for Pando, so he definitely knows how the system works.

If I remember correctly, this was also an issue when Sharman locked unofficial clients out a few years back.

August 17th, 2007
6:16 PM PT
askbusinesscoach said:

Just when we all are looking to dump land lines. You know, those old reliable copper/fiber things that worked on 9/11 when nothing else did. I guess the question here is how do we really get to redundancy & security perfection?

August 17th, 2007
8:11 PM PT
Julian Cain said:

The next question is “What about Joost?”. Derived technology with derived flaws. Let's see how long Joost goes without a worldwide outage. I am currently reviewing the Joost network architecture and will soon release a more in-depth article on the relationships between Kazaa (FastTrack), Skype and Joost.

August 17th, 2007
9:02 PM PT

[...] to GigaOm’s post on Skype Groans and SIPhone Gains: “The company saw a 400% increase in traffic this morning, with 4 times increase in sales, [...]

August 18th, 2007
12:38 AM PT
Julian Cain said:

All,

While I am monitoring the Skype network “heal” itself, the results are not near perfect. The “Distributed Data Store” (see DHT) is way behind. I am still receiving messages that are 24 hours late and my contact list(s) are not in “sync”. This so-called “resilient network” is starting to take shape. If you are not “in the know” then please see my other writing about this outage. Here is what really happened:

•   Skype employees introduced code into the "login/connectivity" server farm that was not compatible with current Skype clients (see Morpheus getting booted from FastTrack).
•   This is a single point of failure even in the masses of 9,000,00 concurrent users.
•   This is a single point of failure for ~220,000,000 users.

Question:
• Why did resolution take so long and is still ongoing?

Answer:
• Skype uses a Peer to Peer topology. This consists of Supernodes which maintain a “DHT” type layer between other Supernodes. This data is routed down to their Children (300 - 500 at any one time).
• The “DHT Type” layer is responsible for presence, avatars(icons) and above all “Call Routing”.
• Child nodes do not know about this layer, so they depend on Supernodes.
• If a Supernode goes offline then all Children are cut off until they find another Supernode.
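
The dependency described in this answer can be sketched as a toy model: a child node can only act while it has a live Supernode parent, and when that parent disappears it must find another one from whatever it has cached. The class, method names, and Supernode labels below are hypothetical and are not taken from the Skype client.

```python
from typing import Iterable, List, Optional, Set


class ChildNode:
    """Toy child client that depends entirely on a Supernode parent."""

    def __init__(self, cached_supernodes: Iterable[str]) -> None:
        # Assumed: the client keeps a local cache of Supernodes it has seen.
        self.cached_supernodes: List[str] = list(cached_supernodes)
        self.parent: Optional[str] = None

    def connect(self, online_supernodes: Set[str]) -> bool:
        """Attach to the first cached Supernode that is still online."""
        self.parent = next(
            (s for s in self.cached_supernodes if s in online_supernodes), None
        )
        return self.parent is not None

    def on_supernodes_changed(self, online_supernodes: Set[str]) -> None:
        """Re-attach if the current parent has gone offline."""
        if self.parent not in online_supernodes:
            self.connect(online_supernodes)

    def can_call(self) -> bool:
        # No parent means no calls and no IM, as described above.
        return self.parent is not None


child = ChildNode(["sn-a", "sn-b"])
child.connect({"sn-a", "sn-b"})
print(child.parent, child.can_call())   # sn-a True

child.on_supernodes_changed({"sn-b"})   # sn-a drops out
print(child.parent, child.can_call())   # sn-b True

child.on_supernodes_changed(set())      # every cached Supernode is gone
print(child.parent, child.can_call())   # None False
```

The last case is the outage scenario: once every cached Supernode is offline, the child has nowhere to go, whether or not its login succeeded.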
Avoidance:
• If child nodes knew about the upper layer at any point then this would be more resilient to outage because they are not dependent on Supernodes.
• Local Discovery: If Skype had a layer that made local availability known to all users, then in an office environment Skypers would be able to locate each other without the need of a Supernode (this excludes Bonjour, as it does not relay the local node cache).
• Login failures: this is a matter of pure redundancy (“central servers”). Skype had this in place, but it failed because they introduced flawed code that did not assure complete SSL-based authentication.
• Skype employees need to come clean and stop blaming this on “very old” bugs. Test before you release; surely there is a test bed?
End:
• Skype is currently re-creating its decentralized network from scratch, from the end user to the Supernode users to the entire “distributed index”. This is done over UDP and takes some time, so do not trust the following until the network is again stable:
• Presence
• Profiles, this includes avatars.
• Contact list, yes you may see people that you blocked, give it time to heal.
• Stability, connections may continue to drop because your Supernode went offline and your local cache is not “doing its job”.

Summary:
• Decentralized networks like Skype are subject to global outage the same as Kazaa and possibly Joost.

Take Care

~Julian Cain

jolix (at) mac (dot) com

August 18th, 2007
12:58 AM PT
Julian Cain said:
August 20th, 2007
7:40 AM PT

[...] zajímavý komentář Julian Caina It is a huge cycle, no matter how many bugs they “fix” in the “central [...]

August 20th, 2007
8:45 AM PT

[...] this relationship between 50-odd authentication servers and supernodes and also a weak link. (Full explanation is here.) Share This | Sphere | Topic: Voice [...]

August 20th, 2007
12:31 PM PT

[...] momento per così tanti utenti? Un’altra possibilità, come suggerito da un esperto nei commenti ad un post su Gigaom, è che ad abbandonare simultanamente la rete Skype, oltre ai PC di molti [...]

August 20th, 2007
3:04 PM PT

[...] One of the best technical description of the problem (which might be a speculation as well) is here. Regardless of whether this is the real cause or not, I found it interesting because it describes [...]

August 21st, 2007
10:02 AM PT

[...] goes to Om Malik and Julian Cain for providing insight into the technical issues that crushed Skype on August 16. Skype uses Distributed Hash Table (DHT) [...]

August 21st, 2007
2:54 PM PT

[...] have one and only one chance to get it right. You’re never more than one power outage, one service outage, one information breach, bad decision, misstep, misquote, or mess up away from loosing your [...]

August 23rd, 2007
12:30 PM PT
eVxz.com said:

Skype Groans & SIPphone Gains

It has been a day from hell for Skype fans and Skype, the company. The outages have impacted many. Skype’ s misfortune turned out to be a boon for SIPphone. The company saw a 400% increase in traffic this morning,…

