
Does Skype Outage Expose P2P's Limitations?

Update: Peer-to-peer by its very nature is supposed to work without a problem, with packets finding their way to good peers and then moving on to their final destination. The Skype outage (you couldn’t log in to your Skype account) that started sometime last night makes you wonder how resilient P2P services really are.

It is still not clear what exactly happened or why; Skype has not released any details yet. Tom Keating says that the problem started after he and his colleagues downloaded some of the new patches Microsoft released to update its operating systems, Windows Vista and Windows XP.

Since Skype is a P2P network that relies on other peers for the network to function properly, it’s possible a Microsoft update is causing a conflict.

On the Mac, however, I had no trouble logging in this morning, but the client kept crashing. If a software upgrade from Microsoft (or, for that matter, any other OS vendor) can render Skype, one of the largest P2P services, useless, then the P2P economy is standing on shaky ground.

Update: On second thought, I want to be clear: if you are going to build a mass-market consumer service on P2P and use authentication servers or add layers on top of the basic architecture, then you are on shaky ground and need to build in some sort of redundancy. (Thanks, Ethan, for showing me the light!)

Folks at Joost, Babelgum and other P2P companies should be concerned about their business prospects going forward. Venture capitalists who have been funding P2P-based services should take this as an early warning on the fragility of the whole P2P ecosystem, where a small glitch can cause widespread problems.

On the flip side, if Skype’s authentication servers asphyxiated, then let this be a reminder that Skype is not quite a replacement for your phone company. This must have affected thousands (if not millions) of companies and web workers who lost money and productivity. (Update #2: The Skype blog says that the problem is a software issue on its end.)

“The folks who get hurt by this are the Skype based service providers who need the Skype connectivity layer to keep things working,” writes Andy Abramson.

44 Responses to “Does Skype Outage Expose P2P's Limitations?”

  1. I tend to agree with Ian that Skype may not actually be a true P2P system. That makes it vulnerable to holdups on the network. Will Skype come out with an explanation to the contrary?

  2. Very interesting discussion. My viewpoint comes from managing a server-centric network for 15 years, with millions of messages each day. Server-based networks reboot easily and grow badly, while P2P networks grow well and reboot poorly. You are correct: Skype has never tested network rebooting and cannot with 200 million nodes, but it could do a much better job than it apparently has of simulating it.

    A lot is said about scaling large networks but little about rebooting them. The complexity of rebooting a network seems to grow exponentially with the number of nodes. What will happen when the power goes out at one or two of Google’s large data centers? Though Google will have problems, I doubt they will be anything like the dimension of Skype’s problems.

    Also remember, the mean time to repair goes to infinity as the mean time between failures goes to infinity. If it doesn’t fail, you can’t fix it when it does! The fact that Skype has never encountered this problem before makes repair really, really hard.

  3. I posted some analysis on my blog. I think the problem is not related to any authentication service or software issue. It seems to me that Skype never thought about how to handle problems in a distributed system where one node depends on another to provide service. My guess is that this gives a taste of things to come, not only for Skype but for all of us. Let's see what Skype announces (if there are any announcements…).

  4. If only the SIP standard provided 50% of Skype’s audio quality…

    An open standards play always has a slower start than a closed proprietary standard. Remember Amstrad and Sinclair computers?

    With a closed standard big investments can be quickly justified and projects can be aggressive. It takes time for open standards to acquire critical mass and those fabled network effects.

    Anyway, those are my rambling thoughts as I wait for my Skype phone TO CONNECT ALREADY. I cannot believe it: 36 hours of outage now in London.

  5. To say that this exposes the weakness in P2P is a little extreme. All that seems to be failing is Skype’s authentication system, something that isn’t actually a requirement for P2P but is for a paid service using P2P data communication.

  6. Skype’s outage exposes Skype’s limitations.

    Everyone should move over to open SIP technology, which is much more reliable and is not controlled by any one company when it comes to calling from VoIP to the established voice networks and back. SIP is thousands of companies; it’s competition. With SIP you can encrypt your conversations with whatever encryption you want, and thus be sure there isn’t a backdoor for any government to listen in on your VoIP calls.

    Everyone should switch to SIP VoIP calls; the hardware is much cheaper, too.

  7. SIP calls are peer-to-peer as far as the RTP voice streams are concerned – they go from one soft-phone to another, directly – at least in cases where NAT traversal is a viable option and RTP proxying is not required.

    Recent information shows that the Skype problem occurred in some of the central servers dedicated to the many tasks where P2P options are still research areas: authentication, billing, etc. If the issue is truly one related to the P2P nature of Skype, then I would expect them to release a new version of their soft-phone – which they haven’t done so far.

    What the Skype meltdown highlights is a set of well-known issues:

    a) Poor software engineering practices that led Skype to release insufficiently tested software, which caused a major outage under load. We have seen this before and will see it again. P2P and centralized architectures are both vulnerable to software errors.

    b) A lack of emphasis on investing to make what has become mission-critical infrastructure, well… truly mission-critical! This is something telcos are very good at and something Skype obviously still has to learn.

    What P2P does marvelously well is work around network issues such as failed nodes or temporary bottlenecks. DHTs manage that via data replication and clever routing algorithms (KBRs). Similar redundancy schemes exist in the server world too, except that when your data center’s Internet pipes have been cut by a tractor, you are truly in a desperate situation. That’s why corporations have disaster recovery schemes involving distributed sites… reminiscent of P2P architectures!

    P2P is not a panacea and I can’t believe any intelligent VC would finance a P2P venture because the software is more resilient. My company, Peerant, got money because our P2P platform was more flexible and allowed new telephony applications to be developed.

    A nasty software bug not caught by QA is as likely to take down a 1,000,000-node P2P network as a 20-node centralized cluster of servers. Let’s make software better, understanding the complexity of what we are dealing with, and we’ll have fewer of these issues.

    Serge Kruppa
    (Disclaimer: my company has developed a P2P application platform working as an overlay on top of Skype, and thank God, other VoIP networks)
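The replication idea mentioned in the comment above can be illustrated with a minimal sketch. It assumes a consistent-hashing ring with SHA-1 identifiers (the style used by Chord-like DHTs) and stores each key on the k nodes that follow it on the ring; the peer names, the key, and k=3 are hypothetical and do not describe Skype's or Peerant's actual design.

```python
import hashlib
from bisect import bisect_right


def ring_id(name: str) -> int:
    """Map a node name or key onto the ring using SHA-1 (an illustrative choice)."""
    return int(hashlib.sha1(name.encode("utf-8")).hexdigest(), 16)


def replica_nodes(key: str, nodes: list[str], k: int) -> list[str]:
    """Return the k nodes that follow `key` clockwise on the hash ring.

    Storing a value on all of them keeps it reachable as long as
    at least one of those nodes is alive.
    """
    ring = sorted(nodes, key=ring_id)
    ids = [ring_id(n) for n in ring]
    # First node whose ID is strictly greater than the key's ID, wrapping around.
    start = bisect_right(ids, ring_id(key)) % len(ring)
    return [ring[(start + i) % len(ring)] for i in range(min(k, len(ring)))]


if __name__ == "__main__":
    peers = ["peer-a", "peer-b", "peer-c", "peer-d", "peer-e"]
    before = replica_nodes("user:alice", peers, k=3)
    # Simulate the primary replica failing: drop it and recompute the replica set.
    survivors = [p for p in peers if p != before[0]]
    after = replica_nodes("user:alice", survivors, k=3)
    print(before, after)
```

When the first replica disappears, the other two replicas move up to the front of the new set, so the data never has to be fetched from the failed node. This is the kind of failed-node resilience the comment attributes to DHTs; it does not help with a bug that every node shares.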

  8. This actually shows the limitations of P2P networks that use, for example, centralized servers for authentication and centralized supernodes.

    Skype’s main servers are based in Luxembourg, as are Joost’s.

    Joost has had several issues because its backend got swamped, not the P2P component.

    Other hybrid P2P services (P2P CDNs) have also had issues with the servers used to control the network. The same thing happens when a BitTorrent tracker gets swamped, which is why distributed hash tables were invented.

  9. It really makes me think of a distributed hybrid solution, where each smaller local network decides in real time whether to use VoIP or PSTN based on priority, cost, time of day, and network status (outage-aware), independent of any central server.

    • Mahesh Lalwani
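The local VoIP-or-PSTN decision described in comment 9 could look roughly like the sketch below. It is hypothetical: the Route fields, the labels "voip" and "pstn", and the policy (priority calls take the first reachable route in a reliability-ordered list, ordinary calls take the cheapest reachable one) are assumptions made for illustration. The point is only that the choice uses state each site already has, with no central server involved.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Route:
    name: str            # hypothetical label, e.g. "voip" or "pstn"
    cost_per_min: float  # local tariff; the caller can vary it by time of day
    reachable: bool      # local outage view, e.g. whether the last probe succeeded


def pick_route(routes: list[Route], priority: bool) -> Optional[Route]:
    """Choose a route using only locally known state (no central server).

    Assumed policy: priority calls take the first reachable route in the
    list (the caller orders the list by reliability); ordinary calls take
    the cheapest reachable route. Returns None if nothing is reachable.
    """
    usable = [r for r in routes if r.reachable]
    if not usable:
        return None
    if priority:
        return usable[0]
    return min(usable, key=lambda r: r.cost_per_min)
```

For example, with `[Route("pstn", 0.05, True), Route("voip", 0.0, False)]`, an ordinary call falls back to "pstn" because the VoIP route is marked as down, which is the outage-aware behavior the comment asks for.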
  10. Sandelwoods

    I know of the perfect replacement:


    • P2P SIP Softphone
    • Free Peer to Peer calls + conference
    • Free P2P Video and 4-part conferencing
    • File transfer, File sharing
    • Desktop and application sharing
    • Connect to MSN/Yahoo/AOL/Gtalk from damaka

    Check out this wonderful post by Luis Suarez:

    Download and check it out:


  11. I seriously doubt that this has anything to do with Windows. Skype implemented a software update that was completed at 7 am GMT on the 15th, which, like RIM’s, went bad. It’s a coincidence that an MSFT update affected Tom Keating.

    What this outage proves is that mobile voice is the natural fall back choice in times of disaster and stress. Didn’t most Skype users just use their mobile phones instead?

    A VC would be foolish to invest in a P2P technology because it was more resilient than another architecture. VCs should invest in P2P approaches because they can enable new kinds of services without much of the burden (read: cost) of centralized approaches. New functionality, less cost. That’s the ticket!

  12. P2P does not mean “crash-free” or “zero-down-time”. It just means peers can connect directly to each other for some aspects of the service.

    As Skype uses P2P for moving voice, it just means that their downtime is not due to their servers being overloaded by the voice data. Obviously there are many other reasons for a web service to crash!

  13. I’d lean more toward this being a problem with Skype’s infrastructure than with Microsoft Update (although the latter can’t be ruled out either).

    I did my massive Patch Tuesday update on Wednesday too, but didn’t notice any issues with Skype until Thursday morning, so at the very least it’s not a local Windows update issue. Still, it is possible that the update knocked out my Skype client’s ability to act as a P2P node, and that this has been happening on many machines over the last two days, eventually killing most of the P2P network.

    My first guess, however, was that the symptoms reminded me of a denial of service attack on Skype’s identity servers. P2P apps are more resilient to such attacks but when you put an identity service on top of the app (such as Skype’s login) you of course introduce a point of vulnerability (if not strictly a single point of failure).

    That, however, does not explain the log-in-then-drop-out problem that is occurring. Unless Skype pings its authentication service on a regular basis and drops you out when reacquiring a lease fails.
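The lease hypothesis above can be made concrete with a small model. This is a hypothetical sketch: nothing is known here about how the Skype client actually handles its login, and the names (Session, lease_seconds, auth_ok) are invented for illustration. It assumes a login grants a time-limited lease that must be renewed against the authentication service once it expires.

```python
from dataclasses import dataclass


@dataclass
class Session:
    """Hypothetical client session holding a time-limited login lease."""
    lease_seconds: float
    expires_at: float = 0.0
    logged_in: bool = False

    def login(self, now: float, auth_ok: bool) -> None:
        # The initial login succeeds only if the authentication service answers.
        if auth_ok:
            self.logged_in = True
            self.expires_at = now + self.lease_seconds

    def tick(self, now: float, auth_ok: bool) -> None:
        # Nothing to do while logged out or while the current lease is still valid.
        if not self.logged_in or now < self.expires_at:
            return
        # The lease has expired: try to renew it, and drop the user if renewal fails.
        if auth_ok:
            self.expires_at = now + self.lease_seconds
        else:
            self.logged_in = False
```

Under this model, a user who logs in just before the authentication service degrades stays connected until the lease runs out and is then dropped. That matches the "logs in, then drops out" symptom without any failure in the P2P voice path.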

  14. It would be good to have a system that lets users know what is going on; this type of problem is statistically inevitable. Why not have a Twitter plan, or something like it, in place to tell users what’s up?

  15. Om, I completely see your point. I think it came across much more clearly in your comment than in your post.

    For a few months I struggled with frequent long-term outages with a large POTS provider, which in turn pushed me toward VoIP for the first time. However, I’ll be the first to volunteer that this is probably not the most common scenario.

    I’m very curious to see how Skype responds to this both in terms of publicity and actual action.

  16. Although Skype is the technology provider, as rightly noted, it seems to be the weakest link. If it breaks down, the peers go down as well. One option is to handle authentication through the peers. Still, there is a big dependency on the OS the application is running on. We have all heard allegations of OEMs intentionally tweaking their products to disrupt competitors (e.g., Google vs. Microsoft on Vista search). Browser-based P2P services are less susceptible; services with a large footprint on the OS, however, are definitely vulnerable. It’s best to work hand in hand with the OEMs; if that’s not an option, then keep a close eye on what’s cooking in the other camp and play catch-up.

  17. Skype is not a true P2P system, since they rely on central authentication servers. Something like this outage was always a possibility.

    In this case though, it sounds like a bug in the software has caused this issue.

    This might be a wake-up call for a lot of Skype users, and they should consider using open standards based internet communications as a backup or add-on to Skype.

    By their very nature, SIP-based services are distributed and therefore much more resilient.

    Kind of like running your own email server: If you have an outage, it doesn’t affect the rest of the internet.

  18. Jay,

    I think it shows that there are weak links in P2P systems as a whole. Let’s say the authentication server failed: that is a problem. If it is Microsoft, then that is a problem too.

    What I am saying here is that the P2P resilience is a bit exaggerated and as a result we need to be cautious – users, creators and financiers – and be prepared for scenarios like this.

    On the POTS side of things, tell me when the phone line was last down for 24 hours or so (unless someone cut the cables). VoIP systems, I agree, can be problematic at times.

  19. Eric Willis

    Did the Facebook outage point to the fragile nature of the entire server-centric computing industry? No. So I think it’s a bit unfair to nail the entire P2P industry to the wall based on a very rare Skype outage that could be related to isolated events unrelated to the actual P2P software running the service. I would be willing to bet there is some issue with the central server and not the FastTrack technology. The Skype system is not completely decentralized.

  20. This does seem to lie on the FUD side of things. With no details on the actual cause of the outage, you proclaim that all P2P-based technologies are on shaky ground. That seems like an awfully big generalization to make, all things considered. Skype in its free form has performed significantly better than the multiple VoIP and POTS providers I’ve been with in terms of call quality and uptime. (See, I can argue using anecdotal evidence too.) Like Dan said above me, outages are not uncommon in communications. I don’t think it’s that big of a deal.

  21. To be fair, your traditional phone company suffers outages too (albeit localized ones). In fact, these can happen very frequently and can take a long time to come back online.

    My father’s office building, in one of the most prestigious neighborhoods in the world (Park Ave in Manhattan), has frequent phone outages due to the phone company’s faulty infrastructure. Since the fiber optics to the building are far more sound, his firm is switching to VoIP. His neighbors include Bear Stearns and Morgan Stanley, who I believe are considering similar moves.