The New York Times today finally got around to noticing that when web sites go down, people are increasingly likely to get mad and generally react the way I might if I drove to my favorite bar and found it closed for a private party. I might be miffed and share a few choice words with members of my party before deciding on a new locale. However, when we write blogs or tweets (if Twitter is up), the inconvenience and our subsequent vitriol are archived forever and transmitted around the world rather than just to our friends. And because millions of other people want to go to that same bar, the chorus of curses grows quickly.

We’ve written about how hard it is to achieve the 99.999 percent up time championed by the telecommunications industry, but suffice it to say there are a ton of moving parts involved in keeping a site visible to end users; the list begins with the network architecture and ends with the internet connection of a consumer in Austin. Along the way there are software upgrades, server shortages, DNS issues, cut cables, corporate firewalls, carriers throttling traffic and infected machines.
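
To put rough numbers on that chain, here is a back-of-the-envelope sketch, not a measurement. It converts an availability figure into minutes of downtime per year, then multiplies the availabilities of each link, assuming every link fails independently and all of them must work for a user to reach the site. The per-link figures and the names `downtime_minutes_per_year` and `serial_availability` are made up for illustration.

```python
# Back-of-the-envelope availability arithmetic.
# All per-link availability figures below are hypothetical, chosen only
# to illustrate the calculation; they are not measurements.

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_minutes_per_year(availability: float) -> float:
    """Minutes per year a service is down at the given availability (0..1)."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def serial_availability(availabilities) -> float:
    """End-to-end availability when every link must be up at once.

    Assumes the links fail independently, which real networks only
    approximate.
    """
    total = 1.0
    for a in availabilities:
        total *= a
    return total


if __name__ == "__main__":
    # Five nines on its own: roughly 5.26 minutes of downtime a year.
    print(f"five nines: {downtime_minutes_per_year(0.99999):.2f} min/year")

    # Hypothetical chain from the site to a consumer's connection.
    chain = {
        "site infrastructure": 0.99999,
        "DNS": 0.9999,
        "carrier network": 0.9995,
        "consumer connection": 0.999,
    }
    end_to_end = serial_availability(chain.values())
    print(f"end to end: {end_to_end:.5f}")
    print(f"downtime:   {downtime_minutes_per_year(end_to_end):.0f} min/year")
```

Because the figures multiply, the end-to-end number can never be better than the weakest link, no matter how many nines the site itself achieves.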

The Times notes that downtime is more than just inconvenient: As more data is stored online and cloud computing becomes more prevalent for businesses, it’s less like a bar closing for a night than a bank closing for a day. But it will never be possible to keep all sites across the entire web up 99.999 percent of the time. Knowing that, architecting for failure, and more services such as downforeveryoneorjustme.com (I would really love a more memorable name for this site) and helpful 404 pages would be appreciated.

  1. It took the PSTN almost 100 years to get 99.999% reliable. The Net will take much longer, probably until Cisco becomes as bureaucratic as the old Western Electric (AT&T Network Systems/Lucent).

  2. Five nines is definitely achievable but you need to put the resources and funds behind your infrastructure.

    Ross
    - http://www.hostdisciple.com

  3. Roland Dobbins Sunday, July 6 2008

    Achieving five-nines in one’s own public-facing infrastructure is most certainly doable, and it is in fact done all the time. The problem is that you’re dependent upon the infrastructure of others (SPs, enterprise networks, mobile networks, the users’ computers/OSes/applications, et al.), over which you’ve no control, to deliver packets to you; and once packets leave your infrastructure, they’re once again dependent upon the infrastructure of others, over which you’ve no control, to reach their destinations unmolested.

    So, the assertion that ‘Five Nines on the Net is a Pipe Dream’ is really missing the point; it’s more like ‘The Definition of Availability on the Internet is Elusive Due to its Very Nature’, or something along those lines.

    Now, a more cogent and useful essay would be one on why so many ‘vital’ Web 2.0 companies simply don’t build their applications and infrastructures so that they can scale, and why, even when they’re wildly successful, they don’t implement the well-known best current practices (BCPs) which would maximize availability and resiliency within their own spans of control.

  4. [...] http://voices.allthingsd.com/20080707/five-nines-on-the-net-is-a-pipe-dream/ Tagged: 99.999% uptime, GigaOm, New York Times, Stacey Higginbotham, Twitter, Voices, Web sites [...]

  5. 99.999 is a pipe dream for one simple reason: it’s unnecessary. There are very few web-based services that truly need that kind of up-time. Precious company resources are better spent elsewhere.

  6. Aman Sehgal Monday, July 7 2008

    Hi Stacey,
    I agree that achieving five 9s on the Internet is almost impossible. Servers also need some amount of time to rest :) and we cannot get away from that. Every site will be down at some point, and no one knows when that D-Time will come. You mentioned that 404 pages should be more informative. I feel the same. If the actual reason for the site not working is displayed, then in one sense the site is NOT out of order, i.e. it is working in one aspect and not working in another. This information will in fact help the netizen roughly estimate how long it will be before the site is up, running and available for browsing.

  7. “But it will never be possible to keep all sites across the entire web up 99.999 percent of the time.”

    This represents a fundamental misunderstanding of how the Internet works.

  8. David,

    You are being presumptuous in assuming someone doesn’t know how the Internet works. Care to expand? Don’t leave a hanging comment without giving a reason.

  9. @ pwb

    Thank you for your short and sweet comment. Is it doable? Of course it is. Has it been done? Of course not. It is a pipe dream because we are not even close to being done in terms of technology, both hardware and software.

  10. Om,

    What are you talking about? Many companies pull off five nines in a year. Maybe not your latest greatest web 2.0 start up, but you’ll be hard pressed to find traditional companies’ websites down. Go start monitoring Merill Lynch or UBS and see how often their site goes down.

    Ross
    - http://www.hostdisciple.com

  11. Ross,

    Two things:

    1. There is a huge difference in the size/audience/usage/money spent by Wall Street and other enterprises versus regular consumer facing web services. Which is what Stacey was trying to point out.

    2. Secondly, when ML and UBS go down, no one notices and they actively don’t share that information.

    Last point, please stop including your URL in the message for it comes across as too self promotional. Not that there is anything wrong with that.

  12. Stacey Higginbotham Monday, July 7 2008

    Guys, there may be sites that experience five nines (do we count planned outages?), but the point of the post is that that is hard to do because there are so many moving parts, and when it does go down people notice and news spreads quickly. It’s impossible to believe that right now a site will be both up and available to ALL users 99.99 percent of the time.

    And on the web it only takes a few users having problems to damage the brand.

  13. Even Google rarely maintains five 9’s. http://royal.pingdom.com/?p=192
    Granted, the article is almost a year old, but only one country’s Google site was at 99.999%. And Google has an enormous infrastructure investment and can afford the type of redundancy that is required to maintain that kind of availability. Can it be done? Absolutely. Will most smaller organizations have the resources to do it? Probably not, at least until the cost of a massively redundant infrastructure comes down.

  14. Om,

    I guess we have to agree to disagree. People notice when they can’t get to their financials. Let’s pick a better site like Etrade or Ameritrade. These companies need uptime and a single bit of downtime can cost them millions.

    An even better example is ebay and you can check out their track record here:
    http://uptime.pingdom.com/site/month_summary/site_name/www.ebay.com (so far in 2008)

    or

    http://uptime.pingdom.com/site/month_summary/site_name/www.yahoo.com (look at 2007)

    Just because Twitter goes down often doesn’t mean five nines isn’t achievable.

    1. I love that Ross includes a site that only pings every 5 minutes to try to measure the 9’s of a site. 5 9’s is 5.26 minutes a year, so pingdom.com would be unlikely to detect if a 5 9’s site was ever down.

      To even attempt to determine if a site might be 5 9’s you’re going to have to ping a lot more frequently than that. And that’s just getting a ping response. I think most sites would consider availability to include actually being able to deliver a web page, not just a networking ping back.

  15. Five nines is very doable; getting a VC or dot-commer that runs their business off of crappy boxes stuffed in their dorm rooms to pay for the required technology is a pipe dream. Hire infrastructure vendors that play in the communications world and they’ll engineer a 5-9’s system no problem. Continue to get your technology from Open Source and Fry’s and you’ll be lucky to get 3.

  16. austinandrew Monday, July 7 2008

    @ Ross – did you read the article? You write “Go start monitoring Merill Lynch or UBS and see how often their site goes down.” The point is that even if these guys were up 100% of the time, the infrastructure from their server to your home won’t be. That’s why VOIP sucks. Even if the VOIP provider is good, the service is only as good as your cable modem.

  17. Edmund Elkin Tuesday, July 8 2008

    Is 5 9’s necessary? It depends upon the specific service, or the type of service, from that website. For example, if it is a road-traffic condition, then no worries if Site A is down, I just go to Site B, which is especially easy if I use a search engine to initially identify those sites (that keeps the bookmarks under control). On the other hand, if it is where I buy tickets for the big baseball game, or my 401(k) site, then I really want access now, and unavailability is not acceptable.
    So, it would be useful to have a variable mechanism for ensuring QoS. That would save on the total investment / price, and ensure quality is there when I need it.
    Also, remember… 5 9’s refers to “availability” not reliability. Or as the original article correctly referred to it: “up time.”

  18. Interesting post. There are bigger brains than mine in this comment thread, but some thoughts I had while reading:

    As noted, big difference between five nines for a site, versus five nines for end user

    I believe five nines equates to five minutes of down time a year — no, I can’t remember the formula for that number. It’s quite possible there are some businesses that will never need that level of performance

    DNS is often not given enough attention from reliability standpoint — a point no doubt made in David’s commentary with Om. Hopefully DNS’s visibility will rise along with cloud computing. It’s also critical for enterprise VOIP deployments

    Speaking of DNS security, it will be very interesting to see where ICANN sets the technology bar in their RFPs for the brand new TLDs — scuttlebutt suggests the standards will be on the low side

    More informative 404 pages open the door to redirects for advertising purposes. I’m not saying that’s necessarily bad, but it adds complexity and can cause security problems (Earthlink), depending on how it is implemented

  19. [...] GigaOm takes the perspective that Five Nines on the Net is a Pipe Dream. [...]

  20. Stacey, Om

    When I saw your 5 9’s headline, I first tried to think of any DSL or fiber networks that were built for that. Even after I saw you were discussing the other end, the web sites, I pondered that question. Our Verizon DSL line has come close to that the last year, while my Time Warner cable drops often several times a day for a few seconds and they refuse to fix it. FIOS and the other fiber builds are designed for very good reliability, including no active elements in the field.

    I’m very aware that broadband networks absolutely have not been engineered for anything close to the 5 9’s of the traditional telcos, with single points of failure common in DSL networks. The new fiber nets may be superior, but that’s not yet proven.

    But I did realize at least one Internet service that is extraordinarily reliable. The NY ISP Panix has not allowed my email to fail in over five years, and the founder, Alexis Rosen, told me a while back it hadn’t gone down more than a few hours since Panix was founded early in the 1990s as one of the world’s first ISPs.

    Reliability is not impossible in the Internet world.

    Dave Burstein

  21. [...] almost a decade. GigaOm commented recently about the fact that the current movement towards more cloud-based computing is going to require [...]

  22. Quote: “Knowing that, architecting for failure, and more services such as downforeveryoneorjustme.com (I would really love a more memorable name for this site) and helpful 404 pages would be appreciated.”

    Your wish is granted: http://isitfucked.com

    Enjoy!

  23. [...] of YSlow software, which measures web site performance. It’s a nice reminder that there are many links in the chain that create a user’s web experience, and it focuses on what developers and site designers can do to make pages load [...]

  24. [...] Five nines is too expensive for most free, consumer-oriented web services to maintain, and realizing that, we seem to be building out our store of redundant communications. So now, when life offers us power outages, snowstorms and even Gmail failures, we’re able to pick right up and keep blogging, tweeting, texting and posting our thoughts into the ether. [...]

  25. [...] glitches and failures that may seem manageable in a corporate setting, have awesome power when they spread across the enormous number of users on the web. So how is cloud computing like space travel? Small problems can equal a monumental [...]

  26. [...] the meantime, feel free to revisit our post about the difficulty of ensuring total reliability on the Internet, and ponder your backup communication strategies. The timing is somewhat ironic, as it comes one day [...]

  27. [...] — this is a good reminder that these type of hiccups (and six hours is a long hiccup) are bound to happen when relying entirely on the Internet for video [...]

  28. [...] but I think as far as getting business information and services via the web thanks to the movement to the cloud, some sort of user-friendly tool would make a complex process a bit easier to understand. I hope [...]

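
A footnote on the monitoring argument in the comments: whether a probe that fires every five minutes can even see a five-nines outage comes down to simple arithmetic. The sketch below assumes instantaneous probes on a fixed interval and an outage that starts at a uniformly random moment. The names `annual_downtime_budget` and `detection_probability` are illustrative only and are not part of any monitoring product.

```python
# Rough arithmetic on probe intervals versus a five-nines downtime budget.
# Assumptions (not from any real monitoring tool): probes are instantaneous,
# fire on a fixed interval, and an outage begins at a uniformly random time.

MINUTES_PER_YEAR = 365 * 24 * 60  # ignores leap years


def annual_downtime_budget(availability: float) -> float:
    """Minutes of downtime per year allowed at the given availability (0..1)."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def detection_probability(outage_minutes: float,
                          probe_interval_minutes: float) -> float:
    """Chance that at least one probe lands inside a single outage."""
    if outage_minutes <= 0:
        return 0.0
    # An outage at least as long as the interval always contains a probe;
    # a shorter one is hit in proportion to its length.
    return min(1.0, outage_minutes / probe_interval_minutes)


if __name__ == "__main__":
    budget = annual_downtime_budget(0.99999)
    print(f"five-nines budget: {budget:.2f} minutes per year")
    for outage in (0.5, 1.0, 2.0, 5.0):
        p = detection_probability(outage, 5.0)
        print(f"{outage:>4} min outage, 5 min probes: detected with p = {p:.2f}")
```

Under these assumptions a single outage longer than the probe interval is always caught, but a year's budget spent in short blips would mostly go unseen, which is why finer-grained checks are needed to say anything about five nines.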