
Summary:

Many of the Web 2.0 companies that I meet in my job as a venture capitalist lack even the most basic understanding of Internet operations. They’d better figure it out — and fast — because not doing so will only cost them money down the road.

I have a major problem with many of the Web 2.0 companies that I meet in my job as a venture capitalist: They lack even the most basic understanding of Internet operations.

I realize that the Web 2.0 community generally views Internet operations and network engineering as router-hugging relics of the past century desperately clutching to their cryptic, SSH-enabled command line interfaces, but I have recently been reminded by some of my friends working on Web 2.0 applications that Internet operations can actually have a major impact on this century’s application performance and operating costs.

So all you agile programmers working on Ruby-on-Rails, Python and AJAX, pay attention: If you want more people to think your application loads faster than Google and do not want to pay more to those ancient phone companies providing your connectivity, learn about your host. It’s called the Internet.

As my first case in point, I was recently contacted by a friend working at a Web 2.0 company that had just launched its application. They were getting pretty good traction and adoption, adding around a thousand unique users per day, but just as the buzz was starting to build, a distributed denial-of-service (DDoS) attack arrived. The attack was deliberate and malicious, and it completely crushed their site. This was not an extortion-type DDoS attack (where the attacker contacts the site and extorts money in exchange for not taking it offline); it was an extraordinarily harmful performance attack that rendered the site virtually unusable, with pages taking a decidedly non-Google-esque three minutes or so to load.

No one at my friend’s company had a clue how to stop the DDoS attack. The basics of securing a Web 2.0 application against attacks arriving over its host system — the Internet — were completely lacking. With the help of some other friends, ones who combat DDoS attacks on a daily basis, we were able to configure the company’s routers and firewalls to drop inbound ICMP echo requests, block inbound UDP packets aimed at high port numbers and enable SYN cookies. We also contacted the upstream ISP and enabled some IP address blocking. These steps, along with a few more tricks, were enough to thwart the attack until my friend’s company could find an Internet operations consultant to come on board and configure their systems with the latest DDoS prevention software and configurations.
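
For readers who want to see what those host-level mitigations actually look like, here is a minimal sketch of the three firewall-side steps. It is an assumption-laden illustration, not what my friend’s company actually ran: it presumes a Linux box with iptables, root privileges and an existing rule that accepts established connections, and the choice to drive it from Python is purely for readability.

```python
import subprocess

# Illustrative host-level stopgaps for the DDoS scenario above.
# Assumes a Linux host with iptables, run as root, and an earlier
# ESTABLISHED/RELATED accept rule so legitimate reply traffic
# (e.g. DNS responses arriving on high ports) still gets through.
MITIGATIONS = [
    # Drop inbound ICMP echo requests so ping floods never reach the stack.
    ["iptables", "-A", "INPUT", "-p", "icmp",
     "--icmp-type", "echo-request", "-j", "DROP"],
    # Drop unsolicited inbound UDP aimed at high port numbers we don't serve on.
    ["iptables", "-A", "INPUT", "-p", "udp",
     "--dport", "1024:65535", "-j", "DROP"],
    # Enable SYN cookies so half-open connections can't exhaust the backlog.
    ["sysctl", "-w", "net.ipv4.tcp_syncookies=1"],
]

for cmd in MITIGATIONS:
    subprocess.run(cmd, check=True)
```

None of this replaces filtering at the routers or upstream at the ISP; it just keeps the most obvious junk traffic from ever reaching the application.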

Unfortunately, the poor site performance was not missed by the blogosphere. The application has suffered from a stream of bad publicity; it has also missed a major window of opportunity for user adoption, which has dropped off significantly since the DDoS attack and shows no sign of recovering. So if the previous paragraph read like alphabet soup to everyone at your Web 2.0 company, it’s high time you start looking for a router-hugger, or soon your site will be loading as slowly as AOL over a 19.2 Kbps modem.

Another friend of mine was helping to run Internet operations for a Web 2.0 company with a sizable amount of traffic — about half a gigabit per second. They were running this traffic over a single gigabit Ethernet link to an upstream ISP run by an ancient phone company providing them connectivity to their host, the Internet. As their traffic steadily increased, they consulted the ISP and ordered a second gigabit Ethernet connection.

Traffic increased steadily and almost linearly until it reached about 800 megabits per second, at which point it plateaued and refused to rise any further, well short of the two gigabits of capacity the company was now paying for. The Web 2.0 company began to worry that either their application was hitting a performance limit or their users had suddenly started using it differently.

On a hunch, my friend called me up and asked that I take a look at their Internet operations and configurations. Without going into a wealth of detail, the problem was that while my friend’s company had two routers, each with its own gigabit Ethernet link to the ISP, the BGP routing configuration was done horribly wrong and resulted in all traffic using a single gigabit Ethernet link, never both at the same time. (For those interested: both gigabit Ethernet links terminated on the same upstream eBGP router at the ISP, which meant that exactly the same AS-path lengths, MEDs and local preferences were being sent to my friend’s routers for every prefix. With every attribute tied, BGP fell through to its final tie-breaker and picked the eBGP peer with the lowest IP address for all prefixes, and therefore for all traffic.) Fortunately, a temporary fix was relatively easy: I configured each router to accept only half of the prefixes from each upstream eBGP peer, and then worked with the ISP to give my friend some real routing diversity.
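
To make that failure mode concrete, here is a toy model of the relevant slice of the BGP decision process. Everything in it is illustrative: the addresses and prefixes are RFC 5737 documentation ranges, not the company’s real ones, and the tie-break list is heavily simplified. The point is simply that two sessions carrying identical attributes will always collapse onto the peer with the lowest address.

```python
from ipaddress import ip_address

# Two eBGP sessions that both terminate on the same upstream router, so
# local-pref, AS-path length and MED are identical for every prefix.
# (Neighbor addresses are documentation ranges, purely illustrative.)
paths = [
    {"neighbor": "192.0.2.1", "local_pref": 100, "as_path_len": 3, "med": 0},
    {"neighbor": "192.0.2.5", "local_pref": 100, "as_path_len": 3, "med": 0},
]

def best_path(candidates):
    # Simplified BGP decision process: highest local-pref, then shortest
    # AS path, then lowest MED, and finally the lowest neighbor address.
    return min(
        candidates,
        key=lambda p: (-p["local_pref"], p["as_path_len"], p["med"],
                       ip_address(p["neighbor"])),
    )

for prefix in ["198.51.100.0/24", "203.0.113.0/24", "192.0.2.0/24"]:
    print(prefix, "->", best_path(paths)["neighbor"])
# Every prefix chooses 192.0.2.1, so the second gigabit Ethernet link
# never carries any outbound traffic.
```

Give the two sessions genuinely different attributes, or different sets of prefixes, and that final tie-breaker stops deciding everything.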

The traffic to my friend’s Web 2.0 company is back on a linear climb – in fact, it jumped to over a gigabit as soon as I was done configuring the routers. While the company now has its redundancy and connectivity worked out, it did pay its ancient phone company ISP for more than four months for a second link that was essentially worthless. I will leave that negotiation up to them, but I’m fairly sure the response from the ISP will be something like, “We installed the link and provided connectivity; sorry you could not use it properly. Please go pound sand, and thank you for your business.” Only by using some cryptic command line interface was I able to get their Internet operations to scale with their application and get the company some value for the money it was spending on connectivity.
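
For completeness, the temporary prefix split can be pictured the same way. The post says only that each router was configured to take half of the prefixes from each peer; the parity rule below is a purely hypothetical way to model such a split, again using made-up documentation prefixes.

```python
from ipaddress import ip_network

NEIGHBORS = ("192.0.2.1", "192.0.2.5")  # illustrative peer addresses

def preferred_neighbor(prefix):
    # Toy partition: use the parity of the third octet of the network
    # address to decide which eBGP peer should carry this prefix.
    net = ip_network(prefix)
    return NEIGHBORS[(int(net.network_address) >> 8) % 2]

for prefix in ["198.51.100.0/24", "198.51.101.0/24",
               "203.0.113.0/24", "192.0.2.0/24"]:
    print(prefix, "->", preferred_neighbor(prefix))
# Roughly half of the prefixes now favor each router, so both gigabit
# Ethernet links finally carry traffic.
```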

Web 2.0 companies need to get a better understanding of the host entity that runs their business, the Internet. If not, they need to find someone who does, preferably someone they bring in at inception. Failing to do so will inevitably cost these companies users, performance and money.

  1. Why bother? Isn’t Amazon EC2/S3 around?

    Emil

  2. “it’s high time you start looking for a router-hugger, or soon your site will be loading as slowly as AOL over a 19.2 Kbps modem.”

    great line.

  3. @Emil – thank you for articulating my point in two words :) If you think relying on Amazon’s outsourced infrastructure will let you build a highly scalable Web2.0 application without any knowledge of Internet operations, then I predict your business will encounter Internet operations issues that cost you more money than you realize as you scale. Don’t get me wrong – Amazon runs a good operation – but a lack of understanding of the host infrastructure that your business relies on to make money is going to be an issue. What happens when your employees sitting in an office in Indiana have connectivity issues reaching Amazon’s service just as you launch yours this month?

    @Jon – thanks.

  4. allan, unfortunately not every startup has the co-author of a cisco router book on board with real tech chops still working (even the most tech savvy vc’s i’ve seen haven’t touched code in at least 10 years, sorry)…but per emil, this is where the opportunity gets real for amazon, cloudfs and others – of course my real question for you is what are you gonna do with that first company that floundered and lost traction after the access debacle?

    there’s no cure for this stuff. startups in this space would be well served to spend a little more time following the activities of the IETF and examine how they’re thinking about these problems in a more vendor-neutral way…

  5. @dave – I’m not an investor in either company that I mentioned. So, for the company that floundered, I did my best to get them a consultant in a timely manner and it’s now up to them. Also, please don’t get me wrong – I think that moving services to the cloud can be the right way to go for some Web2.0 applications, but when you don’t understand the basics of the technology that allows your business to operate, well….

    Has Web2.0 really killed the network engineer as I wrote about last year? http://gigaom.com/2007/04/10/web-20-death-of-the-network-engineer/

  6. On a far smaller scale, I just discovered that a small startup I support (non-IT – medical devices, to be specific) has been paying $140 per month for 2Mbps DSL. Expensive? Well, it would be: most of that money was actually for the bundled webhosting, which they had never used or indeed known they had!

    Allan: Absolutely. Emil, maybe your startup scales well, so that as your traffic builds up you can ratchet up through 10, 100 Amazon servers – but sooner or later, it’ll come back to bite you, either when you hit a bottleneck you hadn’t spotted or when someone else more efficient comes along and eats your lunch with a quarter of your costs for the same service!

    I’ve always felt that trying to build any kind of Internet service without understanding the structure you’re building on is a bad idea. There are things you should bear in mind which you simply won’t understand otherwise – as in this case: why two separate peer links to a single ISP, rather than dual-homing (connecting to two ISPs) or a simple bonded link between the two routers? Maybe in this particular case there were good reasons for this particular setup, but the company should have had someone thinking this sort of thing through before spending lots of money committing to one option!

    I’ve seen painfully slow solutions built on what should be a lightning-fast CDN, thanks to poor implementation – and far faster sites on one small server on the far side of the planet with a well-tuned setup. You can all guess which one cost more – and yet it’s the other one which provided the better experience for end-users!

  7. There’s a meta issue at work here and it has to do with the kinds of activities that are typically recognized and rewarded in technology companies, particularly startups. Product releases and (in some cases) sales are everything. Tactical execution, on the other hand, isn’t recognized much at all — not by investors, executives, or users.

    It is quite unsurprising that this would happen in a world in which most VCs will only fund companies founded by kids who are only a few years out of college with little or no operational experience in running a web site.

  8. @James – thanks for the comments – thankfully the extra costs were limited to $140/month. When you’re talking about a GigE link to a Tier-1 ISP, you’re looking at a few orders of magnitude more expense.

    @Jeffrey – I don’t think that we’re not one of those VCs :) http://www.panoramacapital.com/portfolio.shtml

  9. @Jeffrey – Grrr….that should have read: I don’t think we’re one of those VCs :)

  10. Allan,

    I’m glad you’ve realized your post about the “death of the network engineer” was highly exaggerated (and totally ridiculous).

    This post, however, is great. :-)

