30 Comments

The New York Times today finally got around to noticing that when web sites go down, people are increasingly likely to get mad and generally react the way I might if I drove to my favorite bar and found it closed for a private party. I might be miffed and share a few choice words with members of my party before deciding on a new locale. However, when we write blogs or tweets (if Twitter is up), the inconvenience and our subsequent vitriol are archived forever and transmitted around the world rather than just to our friends. And because millions of other people want to go to that same bar, the chorus of curses grows quickly.

We’ve written about how hard it is to achieve the 99.999 percent uptime championed by the telecommunications industry, but suffice it to say there are a ton of moving parts involved in keeping a site visible to end users; the list begins with the network architecture and ends with the internet connection of a consumer in Austin. Along the way there are software upgrades, server shortages, DNS issues, cut cables, corporate firewalls, carriers throttling traffic and infected machines.
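For a sense of scale, here is a quick bit of back-of-the-envelope arithmetic (my own illustration, not something from the post) translating availability targets into an annual downtime budget:

```python
# Downtime budgets implied by common availability targets.
# Purely illustrative arithmetic; none of these figures come from the post.

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

targets = [
    ("two nines", 0.99),
    ("three nines", 0.999),
    ("four nines", 0.9999),
    ("five nines", 0.99999),
]

for name, availability in targets:
    downtime = MINUTES_PER_YEAR * (1 - availability)
    print(f"{name} ({availability:.3%}): ~{downtime:,.1f} minutes of downtime per year")
```

Five nines leaves roughly 5.3 minutes of downtime per year, which is why every moving part in that list has to behave almost perfectly.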

The Times notes that downtime is more than just inconvenient: As more data is stored online and cloud computing becomes more prevalent for businesses, a site going down is less like a bar closing for a night than a bank closing for a day. But it will never be possible to keep all sites across the entire web up 99.999 percent of the time. Knowing that, architecting for failure would help, as would more services such as downforeveryoneorjustme.com (I would really love a more memorable name for that site) and more helpful 404 pages.
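As one small, hypothetical illustration of what architecting for failure can look like from the client side (this sketch and its placeholder URL are mine, not anything described in the Times piece or the post), here is a fetch that assumes the remote site will occasionally be down and retries with exponential backoff instead of failing on the first error:

```python
import random
import time
import urllib.error
import urllib.request


def fetch_with_retries(url, attempts=4, base_delay=0.5):
    """Fetch a URL, retrying transient failures with exponential backoff.

    Illustrative only: a real system would also cap total wait time,
    distinguish retryable from non-retryable errors, and show the user
    a helpful error page when all attempts fail.
    """
    for attempt in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=5) as response:
                return response.read()
        except (urllib.error.URLError, TimeoutError):
            if attempt == attempts - 1:
                raise  # out of retries; let the caller show a useful error
            # Exponential backoff with jitter so retries don't all pile up at once.
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.25))


# Hypothetical usage (the URL is just a placeholder):
#   body = fetch_with_retries("http://example.com/")
```

The same assumption, that any dependency can vanish for minutes at a time, is what motivates the server-side versions of this idea: redundant instances, health checks and graceful error pages.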

  1. it took the PSTN almost 100 years to get to 99.999% reliability. The Net will take much longer, probably until Cisco becomes as bureaucratic as the old Western Electric (AT&T Network Systems/Lucent).

  2. Five nines is definitely achievable but you need to put the resources and funds behind your infrastructure.

    Ross
    - http://www.hostdisciple.com

  3. Roland Dobbins Sunday, July 6, 2008

    Achieving five-nines in one’s own public-facing infrastructure is most certainly doable, and it is in fact done all the time. The problem is that you’re dependent upon the infrastructure of others (SPs, enterprise networks, mobile networks, the users’ computers/OSes/applications, et al.), over which you’ve no control, to deliver packets to you; and once packets leave your infrastructure, they’re once again dependent upon the infrastructure of others, over which you’ve no control, to reach their destinations unmolested.

    So, the assertion that ‘Five Nines on the Net is a Pipe Dream’ is really missing the point; it’s more like ‘The Definition of Availability on the Internet is Elusive Due to its Very Nature’, or something along those lines.

    Now, a more cogent and useful essay would be one on why so many ‘vital’ Web 2.0 companies simply don’t build their applications and infrastructures so that they can scale, and why, even when they’re wildly successful, they don’t implement the well-known best current practices (BCPs) which would maximize availability and resiliency within their own spans of control.

  4. [...] Read the rest of this post: http://voices.allthingsd.com/20080707/five-nines-on-the-net-is-a-pipe-dream/ Tagged: 99.999% uptime, GigaOm, New York Times, Stacey Higginbotham, Twitter, Voices, Web sites [...]

  5. 99.999 is a pipe dream for one simple reason: it’s unnecessary. There are very few web-based services that truly need that kind of uptime. Precious company resources are better spent elsewhere.

  6. Aman Sehgal Monday, July 7, 2008

    Hi Stacey,
    I agree that achieving five 9s for the Internet is almost impossible. Servers also need some time to rest :) and we cannot get away from that. There is always a time for every site to be down, and no one knows when that D-Time will come. You mentioned that 404 pages should be more informative; I feel the same. If the actual reason the site isn’t working is displayed, then in one sense the site is NOT out of order, i.e. it is working in one respect and not working in another. That information would in fact help the netizen roughly estimate when the site will be back up, running and available for browsing.

  7. “But it will never be possible to keep all sites across the entire web up 99.999 percent of the time.”

    This represents a fundamental misunderstanding of how the Internet works.

  8. David,

    You are being presumptuous in assuming someone doesn’t know how the Internet works. Care to expand? Don’t leave a hanging comment without giving a reason.

  9. @ pwb

    Thank you for your short and sweet comment. Is it doable? Of course it is. Has it been done? Of course not. It is a pipe dream because we are not even close to being done in terms of technology, both hardware and software.

  10. Om,

    What are you talking about? Many companies pull off five nines in a year. Maybe not your latest, greatest Web 2.0 startup, but you’ll be hard pressed to find traditional companies’ websites down. Go start monitoring Merrill Lynch or UBS and see how often their sites go down.

    Ross
    - http://www.hostdisciple.com

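To put rough numbers on the point Roland Dobbins makes in comment 3 above: availabilities of components in series multiply, so even a five-nines origin sitting behind merely good intermediaries ends up far from five nines end to end. A sketch with invented component figures:

```python
# Rough illustration of serial availability; the component figures below
# are made up for the example, not measurements of any real network.

components = {
    "your servers and application": 0.99999,  # five nines within your own span of control
    "your hosting provider / SP": 0.9995,
    "intermediate transit": 0.9999,
    "the user's ISP": 0.999,
    "the user's computer/OS/browser": 0.995,
}

end_to_end = 1.0
for name, availability in components.items():
    end_to_end *= availability

downtime_hours = (1 - end_to_end) * 365 * 24
print(f"End-to-end availability: {end_to_end:.4%}")
print(f"Implied unavailability:  ~{downtime_hours:.1f} hours per year")
```

Even with every component well above 99 percent, the chain lands near 99.3 percent, which is dozens of hours of unavailability a year and exactly the spans-of-control problem that comment describes.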

Comments have been disabled for this post