Web 2.0, Please Meet Your Host, the Internet


I have a major problem with many of the Web 2.0 companies that I meet in my job as a venture capitalist: They lack even the most basic understanding of Internet operations.

I realize that the Web 2.0 community generally views Internet operations and network engineering as router-hugging relics of the past century desperately clutching to their cryptic, SSH-enabled command line interfaces, but I have recently been reminded by some of my friends working on Web 2.0 applications that Internet operations can actually have a major impact on this century’s application performance and operating costs.

So all you agile programmers working on Ruby-on-Rails, Python and AJAX, pay attention: If you want more people to think your application loads faster than Google and do not want to pay more to those ancient phone companies providing your connectivity, learn about your host. It’s called the Internet.

As my first case in point, I was recently contacted by a friend working at a Web 2.0 company that had just launched its application. They were getting pretty good traction and adoption, adding around a thousand unique users per day, but just as the buzz was starting to build, the distributed denial-of-service (DDOS) attack arrived. The DDOS attack was deliberate, malicious and completely crushed their site. This was not an extortion-type DDOS attack (where the attacker contacts the site and extorts money in exchange for not taking it offline); it was an extraordinarily harmful site performance attack that rendered the site virtually unusable, taking a non-Google-esque time of about three minutes to load.

No one at my friend’s company had a clue as to how to stop the DDOS attack. The basics of securing the Web 2.0 application against attacks from its host system — the Internet — were completely lacking. With the help of some other friends, ones who combat DDOS attacks on a daily basis, we were able to configure the routers and firewalls at the company to drop inbound ICMP echo requests, block inbound high-port UDP packets and enable SYN cookies. We also contacted the upstream ISP and enabled some IP address blocking. These steps, along with a few more tricks, were enough to thwart the DDOS attack until my friend’s company could find an Internet operations consultant to come on board and configure their systems with the latest DDOS prevention software and configurations.
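
For the curious, the router-side piece of that first aid looked roughly like the sketch below. It is an illustrative IOS-style configuration, not the company’s actual one (the ACL name, interface and permitted services are my assumptions), and the SYN cookies themselves are a host-side setting on the web servers (on Linux, net.ipv4.tcp_syncookies=1), not something the router does.

ip access-list extended EDGE-IN
 remark drop inbound ICMP echo requests (ping floods)
 deny icmp any any echo
 remark let DNS replies back in, then drop unsolicited high-port UDP
 permit udp any eq 53 any
 deny udp any any gt 1023
 remark web traffic, plus replies to connections the servers initiated
 permit tcp any any eq 80
 permit tcp any any eq 443
 permit tcp any any established
 deny ip any any
!
interface GigabitEthernet0/0
 ip access-group EDGE-IN in

The source-address blocking happened upstream at the ISP, which is where the bulk of any serious DDOS mitigation has to live anyway, since a site’s own pipe is usually the first thing to fill up.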

Unfortunately, the poor site performance was not missed by the blogosphere. The application has suffered from a stream of bad publicity; it’s also missed a major window of opportunity for user adoption, which has dropped significantly since the DDOS attack and shows no sign of recovering. So if the previous paragraph read like alphabet soup to everyone at your Web 2.0 company, it’s high time you start looking for a router-hugger, or soon your site will be loading as slowly as AOL over a 19.2 Kbps modem.

Another friend of mine was helping to run Internet operations for a Web 2.0 company with a sizable amount of traffic — about half a gigabit per second. They were running this traffic over a single gigabit Ethernet link to an upstream ISP run by an ancient phone company providing them connectivity to their host, the Internet. As their traffic steadily increased, they consulted the ISP and ordered a second gigabit Ethernet connection.

Traffic increased steadily and almost linearly until it reached about 800 megabits per second, at which point it plateaued, never rising above a single gigabit even with the second link in place. The Web 2.0 company began to worry either that their application was limited in its performance or that users were suddenly using it differently.

On a hunch, my friend called me up and asked that I take a look at their Internet operations and configurations. Without going into a wealth of detail, the problem was that while my friend’s company had two routers, each with a gigabit Ethernet link to their ISP, the BGP routing configuration was done horribly wrong and resulted in all traffic using a single gigabit Ethernet link, never both at the same time. (For those interested, both gigabit Ethernet links went to the same upstream eBGP router at the ISP, which meant that the exact same AS-path lengths, MEDs and local preferences were being sent to my friend’s routers for all prefixes. So BGP picked the eBGP peer with the lowest IP address for all prefixes and traffic.) Fortunately, a temporary solution was relatively easy (I configured each router to only take half of the prefixes from each upstream eBGP peer), and I worked with the ISP to give my friend some real routing diversity.
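
For the router-huggers, the stopgap looked something like the sketch below on one of the two routers. It is illustrative only (the AS numbers and peer address are placeholders), and router B carried the mirror-image prefix-list permitting 128.0.0.0/1 le 32. Each router accepts only half of the IPv4 table from its eBGP peer, so each half of the outbound traffic exits over a different gigabit link, while the interconnect and shared default route between the two routers cover whatever the other one is carrying.

ip prefix-list LOWER-HALF seq 5 permit 0.0.0.0/1 le 32
!
router bgp 64496
 neighbor 192.0.2.1 remote-as 64511
 neighbor 192.0.2.1 prefix-list LOWER-HALF in

The cleaner long-term fix is the real routing diversity mentioned above: ideally the two links terminate on different upstream routers, so that the normal BGP tie-breakers actually have something meaningful to choose between.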

The traffic to my friend’s Web 2.0 company is back on a linear climb – in fact, it jumped to over a gigabit as soon as I was done configuring the routers. While the company now has its redundancy and connectivity worked out, it did pay its ancient phone company ISP for over four months for a second link that was essentially worthless. I will leave that negotiation up to them, but I’m fairly sure the response from the ISP will be something like, “We installed the link and provided connectivity, sorry if you could not use it properly. Please go pound sand and thank you for your business.” Only by using some cryptic command line interface was I able to enable their Internet operations to scale with their application and get the company some value for the money it was spending on connectivity.

Web 2.0 companies need to get a better understanding of the host entity that runs their business, the Internet. If not, they need to find someone who does, preferably someone they bring in at inception. Failing to do so will inevitably cost these companies users, performance and money.

79 Comments

Tom Davis

I agree with you in principle, but I think you take the idea a bit too far. For instance, I find it poor judgment for a start-up to run servers in its basement and deal directly with an ISP in the first place, and the only time you would truly need to bone up on your router, bandwidth, etc. knowledge is if you’re doing that. My company uses unmanaged hosting, but the servers are still connected through a very high-quality, time-tested network staffed by people who have far more knowledge of networks than I ever wish to have. I need to know how to properly secure, back up, maintain and set up web servers, but beyond that… the vast font of low-level networking knowledge is left to the experts.

Kevan

Very cool post, probably the single most interesting thing to hit my RSS reader this week. Thanks Mr. Leinwand. :)

David Mullings

Great post.

My questions are simple:

(1) How does a startup with limited capital find one of those “router-hugging relics of the past century desperately clutching to their cryptic, SSH-enabled command line interfaces” willing to come on board as an advisor until the money shows up?

(2) Where is a good source to find quality managed hosting service providers?

Thanks

Allan Leinwand

@bernardlunn – Agreed – don’t hire lots of infrastructure folks. But don’t expect to scale your Web2.0 application dramatically without using the services of someone who understands Internet operations.

@Emil – Of course you can do a startup without learning scary words like BGP, DDOS and SYN cookies! But once you get out of the garage and want to make money, you need to understand Internet operations (or at least have someone around who does).

Emil

@Allan: You sound like one can’t do a startup unless they know at least half the scary words you just threw around. I’m sure startups can figure out infrastructure later. That’s when people come in to optimize it.

No way two guys in a garage should bother with that. The company you’re writing about was probably too slow to resolve its issues.

bernardlunn

If Amazon, Google, Sun et al. cannot figure this stuff out, then I doubt a little, underfunded start-up can. And if Amazon, Google, Sun et al. cannot figure this stuff out, then there must be a great opportunity for entrepreneurs to add products/services to those hosting ecosystems to satisfy the real hunger to deal with “plumbing” as a totally outsourced variable cost. Sorry, hiring lots of infrastructure guys internally seems like a retrograde step to me.

Allan Leinwand

@Andrew Mulheirn – Thanks for the comments. No, he won’t lose half of the Internet if an eBGP peer goes down, as the routers are interconnected and share a default route via their IGP. More details offline if you’re interested :)

@SteveR – contact me offline and I’ll provide you with a few resources – they won’t be cheap….

SteveR

We have been lucky enough to slip under the radar and never get hit by a DDOS, even though we have 300,000 users a day. Is there a good outsourcing contact to handle DDOS attacks?

Andrew Mulheirn

D’oh.

Just noticed the “two routers” part of your article. Still, I guess my last suggestion (advertising half the IP space plus the whole IP space up each link) works…

Andrew Mulheirn

All of the above said – they’ve got a gigabit of outbound traffic, yet they’re using a single router and homed to only one provider edge (PE) router?

Sounds like they need more diversity than that – two datacentres, two routers, two providers would be my prescription…

Andrew Mulheirn

I like the article – thanks Allan.

I sometimes feel like we router-huggers are a bit like highway maintenance people – no-one cares when it is all working. What people don’t realise is the level of maintenance that goes on in the background to keep them and the services they use online 24/7.

“I configured each router to only take half of the prefixes from each upstream eBGP peer”

Was just thinking about this: I appreciate it is a temporary solution, but wouldn’t it be better to configure eBGP multihop (TTL=3) and peer between loopback addresses?

As it is currently configured, your friend will lose half of his IP space if one of the links goes down, won’t he?

Alternatively, advertise half up each gig link, and the whole block up each link as well. The provider will then route on longest prefix match when both links are up, but use the shorter prefix when one link is down.
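
In Cisco-style terms, with a made-up 203.0.112.0/23 allocation and a placeholder AS number, a rough sketch of that would be the following – the provider follows the longer /24s while both links are up and falls back to the /23 from the surviving router if one link dies:

! Router A – announce the whole block plus the lower /24
ip route 203.0.112.0 255.255.254.0 Null0
router bgp 64496
 network 203.0.112.0 mask 255.255.254.0
 network 203.0.112.0 mask 255.255.255.0
!
! Router B – announce the whole block plus the upper /24
ip route 203.0.112.0 255.255.254.0 Null0
router bgp 64496
 network 203.0.112.0 mask 255.255.254.0
 network 203.0.113.0 mask 255.255.255.0
!
! (each /24 must already be in that router's own table, via the IGP or a
! connected route, for its network statement to take effect)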

Best regards, Andrew

ericabiz

Great post. It’s about time more people recognized the importance of a good hosting company. I used to run a hosting company and I could regale you with countless stories of complete ignorance on the part of customers… but what it typically comes down to is this:

“I know [insert some programming language here], so I don’t need to pay for managed services.”

“Managed services are too expensive; I don’t need them because my friend Bob knows this stuff.” (a parallel to the first quote)

So you sell these folks an unmanaged dedicated server, and then of course you get the screaming OH MY GOD MY RAID JUST FAILED AND THEN BOTH DRIVES DIED AND I DIDN’T HAVE ANY BACKUP AND MY ENTIRE SITE IS DOWN AND I AM LOSING THOUSANDS OF DOLLARS. AREN’T YOU SUPPOSED TO DO BACKUPS AND MONITOR MY DRIVES FOR ME??????

Um, no. I’m sorry to hear that, but it is an UNMANAGED server…

We lost customers with problems like that on a regular basis. Usually they went to another unmanaged host — setting the clock for when it would happen again.

The first comment is an excellent example of the sort of customers we got on a regular basis. (Even Amazon has limits: a 200 Mbit transfer limit per instance, for one.) Knowledgeable about tech. Think they know a lot about hosting. In reality, have no idea how to manage a server, keep it up to date, keep it from getting hacked, handle a DDoS, check that the RAID is operational, or run simple backups.

It got so bad at one point that I was seriously considering throwing in the towel on unmanaged services and going all super-high-end-managed (like Rackspace was smart enough to do).

I am really glad I am out of the industry. I am much more sane now!

-Erica

Allan Leinwand

@John – I do think this was a DDOS attack, as there were multiple source IPs. If you really want to know more details, let’s chat offline. On the BGP solution – I was waiting for the router-huggers to give me alternatives – I picked the most expedient fix given my lack of faith in the competence of the upstream ISP ;)

Allan Leinwand

@David Ulevitch – thanks. I did think my post last year was somewhat facetious and was struck by how many took it literally ;)

@elliotross & Daniel Golding – I agree that outsourcing your hosting makes sense – but you still need to understand how infrastructure works. In my second example, the 2 GigE links could have been from a colo cage at a hosting provider, cross-connected to the ISP via a switch – with the same result.

@Jeffrey – I look forward to hearing about your startup :)

John

Your description of how you thwarted a DDOS attack doesn’t make much sense.

As a router-hugging relic from days gone by, I know that if you had an actual DDOS attack, no amount of filtering of UDP high port numbered traffic, SYN-cookie detection, or ICMP filtering would have helped you.

Distributed denial of service is just that: hundreds, if not thousands, of hosts hitting your server at the same time. Eventually, the host falls down from load, and blocking at the router doesn’t help much because there are too many hosts hitting the small pipe feeding your site. You have to take the blocking upstream to your provider and try to block there, where you’ve got a better chance of mitigating the load.

It sounds more like you had a basic DOS attack combined with a poor configuration and misconceived security.

I do appreciate your article, though. Too many people are reliant on Amazon to save them, or think that by setting up a single server in co-lo they’re going to be able to scale.

Also, one last thought. You’ve said: “Fortunately, a temporary solution was relatively easy (I configured each router to only take half of the prefixes from each upstream eBGP peer), and I worked with the ISP to give my friend some real routing diversity.”

You’re partially right about this, but there are tricks for load balancing BGP that can be used.

Chinmay

Well… to put things simply, I don’t see the point of this article. Yes, it is true that your friends faced problems connecting to their ISP’s backbone, but how many people actually run their own servers unless the scale justifies it? (In other words, isn’t it really dumb to do that?)

You’ll more likely go with a VPS provider and, God willing, upgrade to some blade at Rackspace someday – both of which are *managed* (so you don’t have to configure BGP and punch holes in firewalls).

And, if the scale does justify it, you might run your own servers. I presume any such company would hire the “router huggers” it should.

The Internet (and computing in general) has grown because of a clean separation of concerns. (To the router hugger: think of the layered TCP/IP architecture.)

To the others: why stop at hugging a router? Isn’t electricity a part of “the host infrastructure that your business relies on”? Why not learn all about lead-acid battery I-V characteristics? They will be useful once power to your server room fails!

Daniel Golding

This is why you should use a hosting company. The idea of any web 2.0 startup hosting internally on a T-1, DSL, or Ethernet loop is ludicrous. You need multihomed, reliable and scalable bandwidth. The point about EC2/S3 in the first comment is certainly simplistic, but there is a certain truth in the idea that this should not be the web 2.0 company’s problem.

Most reasonably sized managed hosting firms have crack network engineering teams that understand BGP and Internet architecture quite well. They order Internet transit in 10-gigabit chunks, and the largest also peer at Internet Exchange Points (IXPs). The idea that “network engineering is dead” is foolish – network engineering is alive and kicking. It’s just that network engineering has become professionalized and is no longer the realm of “Jim, the sysadmin, who knows Cisco” – Jim never really knew “Cisco,” and he always did a marginal job. Real network engineers work for carriers, hosting companies and CDNs, as well as large financials.

There is no way on earth that you’ll get a real understanding of Internet architecture at Web 2.0 firms. I appreciate the sentiment – it is important – but leaving the underlying infrastructure to hosting providers is the way to go.

elliotross

A copy of every Cisco book won’t help you. True router CLI huggers are important – but expensive. Are you going to hire one for a few hours’ work per month? A few minutes with a Cisco text will not make even the most die-hard coder a BGP expert (let alone bonding links).

That should be the outsourced domain of the ISP or another third party. If the ISP were a little more aware, that would be a service they would provide. They have the staff and 24×7 operations to manage it properly – rather than just shipping a box for the customer to plug in.

David Ulevitch

Allan,

I’m glad you’ve realized your post about the “death of the network engineer” was highly exaggerated (and totally ridiculous).

This post, however, is great. :-)

Jeffrey

There’s a meta issue at work here and it has to do with the kinds of activities that are typically recognized and rewarded in technology companies, particularly startups. Product releases and (in some cases) sales are everything. Tactical execution, on the other hand, isn’t recognized much at all — not by investors, executives, or users.

It is quite unsurprising that this would happen in a world in which most VCs will only fund companies founded by kids who are only a few years out of college with little or no operational experience in running a web site.

James

On a far smaller scale, I just discovered that a small startup I support (non-IT – medical devices, to be specific) has been paying $140 per month for 2 Mbps DSL. Expensive? Well, it would be – except that most of that money was actually for the bundled web hosting, which they had never used or indeed known they had!

Allan: Absolutely. Emil, maybe your startup scales well, so that as your traffic builds up you can ratchet up through 10, 100 Amazon servers – but sooner or later, it’ll come back to bite you, either when you hit a bottleneck you hadn’t spotted or when someone else more efficient comes along and eats your lunch with a quarter of your costs for the same service!

I’ve always felt that trying to build any kind of Internet service without understanding the structure you’re building on is a bad idea. There are things you should bear in mind which you simply won’t understand otherwise – as in this case: why two separate peer links to a single ISP, rather than dual-homing (connecting to two ISPs) or a simple bonded link between the two routers? Maybe in this particular case there were good reasons for that setup, but the company should have had someone thinking this sort of thing through before spending lots of money committing to one option!

I’ve seen painfully slow solutions built on what should be a lightning-fast CDN, thanks to poor implementation – and far faster sites on one small server on the far side of the planet with a well-tuned setup. You can all guess which one cost more – and yet it’s the other one which provided the better experience for end-users!

Allan Leinwand

@dave – I’m not an investor in either company that I mentioned. So, for the company that floundered, I did my best to get them a consultant in a timely manner and it’s now up to them. Also, please don’t get me wrong – I think that moving services to the cloud can be the right way to go for some Web2.0 applications, but when you don’t understand the basics of the technology that allows your business to operate, well….

Has Web2.0 really killed the network engineer as I wrote about last year? http://gigaom.com/2007/04/10/web-20-death-of-the-network-engineer/

dave

allan, unfortunately not every startup has the co-author of a cisco router book on board with real tech chops still working (even the most tech savvy vc’s i’ve seen haven’t touched code in at least 10 years, sorry)…but per emil, this is where the opportunity gets real for amazon, cloudfs and others – of course my real question for you is what are you gonna do with that first company that floundered and lost traction after the access debacle?

there’s no cure for this stuff. startups in this space would be well served to spend a little more time following the activities of the IETF and examine how they’re thinking about these problems in a more vendor-neutral way…

Allan Leinwand

@Emil – thank you for articulating my point in two words :) If you think relying on Amazon’s outsourced infrastructure will enable you to build a highly scalable Web2.0 application without any knowledge of Internet operations, then I predict your business will encounter Internet operations issues that will cost you more money than you realize as you scale. Don’t get me wrong – Amazon runs a good operation – but a lack of understanding of the host infrastructure that your business relies on to make money is going to be an issue. What happens when your employees sitting in an office in Indiana have trouble connecting to Amazon’s service when you launch this month?

@Jon – thanks.

Jon

“it’s high time you start looking for a router-hugger, or soon your site will be loading as slowly as AOL over a 19.2 Kbps modem.”

great line.
