Even in Web 2.0 Scale & Size Matter

Yet another update at the bottom of the page

Updated: Many times in the past I have pointed out that most of the new Web 2.0-type start-ups have business plans that only go so far! I am actually glad to see Jeremy Wright take a stand on this issue.

I’ve spent about 10 hours this month working with really, really high-profile Web 2.0-ish companies, nearly yelling at them about their lack of true infrastructure. If your business depends on your website being up, look at your code, look at your infrastructure and, for your users’ sake, figure out what you actually need and build the damn thing properly!


I think most start-ups actually don’t take that into account, and pay the price as they grow really big really fast. Early hiccups at TypePad were a case in point. Having had the financial resources, Six Apart has largely overcome those problems. Others are not thinking along those lines, because we are in the early stages of what might be a bubble, as per Tristan Louis.

During a bubble, attention to boring details like capacity planning, infrastructure management, security, etc. sometimes takes a back seat to new feature introduction. Those details, however, are a substantial portion of what makes a company survive.

As part of my virtual visit to Les Blogs, I recorded a video which discussed these issues briefly. What worries me the most is “the me-too nature of the companies”: features posing as products, and eventually as companies, and what not. These are signs of what could be excessive froth, as Louis points out. (Steven Borsch has a good post on this as well!)

However, the lack of planning for scale is a clear sign that we are living in a “built to flip” age. No one is thinking (or planning) about long-term business models!

Updated: 37signals’ David Heinemeier Hansson offers an alternate point of view, though I suspect he is conflating “planning and architecting for the future” with 99.999% uptime. He says spending $3 million on servers in Web 1.0 was a sign of a bubble. I say not having a long-term plan, including a scalable architecture, shows that you are waiting to be bought out. I think the issue is not about blowing millions on servers, which in these days of ultra-commoditization is difficult; the issue is: what is your end goal? Ramana Kovi has some thoughts, and I wish he would elaborate further.

Further thoughts from Steven Borsch: “It’s not just servers and bandwidth that are required for scale. It’s dealing with latency over an increasingly fragmented and geographically dispersed base of people consuming web applications.”

More thoughtful entries on this subject from Danny Ayers and The Stalwart: Danny disagrees and The Stalwart agrees. Danny says:

Definitions of Web 2.0 vary, but I think two key aspects are The Web as Platform and The Architecture of Participation. For applications that exploit these paradigms to function, there must certainly be some forward-looking design locally, for example in setting up a distributed database on the company’s servers.

But Danny, isn’t that what I was saying? People are thinking short term, and thus not taking into account what you so eloquently write about.

58 Comments

Tom Lee

You need to do all the homework before you start up your business on the web; if you fail to plan, you plan to fail…

Robert Eckert

Brilliant blog you have. I like the comments and topics you discuss here. Although this is not the information I was hoping to find with my search, I believe it’s great when you come across a genuine subject that makes sense. Good luck in all your endeavors. If you have the chance, maybe you could stop by my new web site. How to get free internet advertising.

note

Just reading through the post:
“1) Hire really freaking smart people who won’t do retarded things
2) keep your architecture as simple as possible. add complexity when you actually need it.
3) optimize when needed.”
I think this is the way.

strategic planning software

Tis the season! I was searching the web and found your entry. I really like your site and found it worth the time to read through the post. I am looking to publish a comprehensive site covering many types of historical needlework. All those interested in this area will find this article of interest, as it is written from many perspectives. Please feel free to take a look at my blog at process strategic planning and add anything you want.

RAA

I doubt any startup company ever looks forward to being the next web icon, especially in the face of vicious competition. Traffic doesn’t spike overnight. You accommodate the demand as the need arises. For example, what sense does it make to write code that handles a cluster of servers when you only get a few hundred visits a day in the beginning?

phil swenson

“I think you folks are missing the point. I think Ramana is spot on. It is the right architecture, and the ability to think through the what-if problem. It is not throwing servers at it that is the issue; it is an issue of building a scalable architecture, and that is what a lot of companies are not thinking through.”

Spoken like an old-school architect. I suggest reading about agile methodologies and the shared-nothing architecture. Architects almost always get it wrong. Requirements (and possibly the entire direction of the business) change, and the architecture often becomes obsolete… so all the up-front work is wasted.

I remember the old days, walking around the Exodus data centers looking at the millions of dollars of equipment (Sun E10Ks, etc.) for startups that had zero users. Most of those companies are dead now. Maybe some of them would still be around if they hadn’t shot their VC load on “building a scalable architecture” for an empty, zero-traffic site. Perhaps they should have honed in on a good service that provided real value…

here’s my take:
1) Hire really freaking smart people who won’t do retarded things
2) keep your architecture as simple as possible. add complexity when you actually need it.
3) optimize when needed.

If you have good coders and keep it simple, you’ll be able to make the changes for unforeseen business requirements/load.

Nik Cubrilovic

At Omnidrive we are preparing for our public release by having both a large hardware supplier and a hosting company that owns co-lo centers as early investors and as members of our board or advisory committee.

With a good business plan and product, companies in these industries are open to such involvement – I can’t believe it doesn’t happen more often with Web 2.0 companies, considering that the supply of hardware and rack space are very important parts of the business.

I don’t agree with the 37signals approach, nor do I believe that close-to-100% uptime costs hundreds of thousands of dollars (actually, I know it doesn’t).

Dennis Howlett

Ironic, as the URL is an MT-TP account, so who knows what you’ll see.

I believe today’s 6A outage has lit a fuse on a gasoline-soaked segment of the high-tech industry – one that has been doing a lot of good but which has been shown up for the immature child it really is.

That immaturity manifests itself in a failure to recognise the scaling issue. It is why the IBMs of this world earn billions of dollars trying to hold this fragile beast we call the Internet together – for the sake of those in the real world who have to earn money buying and selling real goods and services that people physically consume in their real lives, not virtually.

This time around, large corporations that were teetering on the edge of immersing themselves in this ‘stuff’ may well shrink back.

If I’m remotely correct, it will be a sad day.

Cedric

Scalability is neither a Web 2.0 nor a Web 1.0 problem. It is a basic business issue.

Companies can only grow their profits by increasing their revenue or decreasing their costs. Web servers are a fixed cost, and as more users join a service they can quickly absorb all of a company’s capital.

One alternative is to look at a different architecture where the processing power and disk space are distributed instead of centralised. And I’m not talking Ajax here, but a truly distributed architecture using the web as a platform and the user’s machine for processing and hosting data.

That’s what we do at AllPeers, which means that we can scale infinitely, since each new user joining brings his computing power with him.

Jeff Clavier

Les blogs’ website is actually lesblogs.typepad.com.

One of the issues Web 2.0 companies have in building solutions on the cheap is that they don’t plan for real scalability of their infrastructure, which means that they melt down whenever their traffic/audience grows faster than their ability to add servers/gear. And at some point, the whole thing needs to be rebuilt anyway to plan for the next stage in scalability.

Mat Atkinson

Common sense – 1. Plan to scale. 2. Get users. 3. Scale.

The key seems to be to factor scalability into your initial planning. That doesn’t mean you have to get the chequebook out until you need to.

The hardest step is step 2: creating a business that drives customers and revenue is the toughest nut to crack. Step 1 needs to be done well, but it is not where the business risk truly lies.

Markus

I see what you’re talking about now. Hypercubes and the like cost tens of millions of dollars. As for the one that The Lord of the Rings was rendered on, AMD basically gave them the hardware for free for bragging rights, as is commonly done by both Intel and AMD.

I thought you meant this was an actual solution for Web 2.0 companies, but you’re talking about massive corporations doing massive data processing, not a transaction-processing environment. In other words, you’re starting with a fixed data set and rendering something out of it. On a website you’re working with a dataset that is constantly changing.

I was part of a team a few years ago that went in and optimized what was, according to Microsoft, the world’s largest commercial SQL Server implementation at the time. Hypercubes work great for math problems or rendering of movies, etc., but fail miserably when it comes to online multithreaded transaction processing. The main reason being that no company has spent hundreds of millions trying to build a database server; Oracle, MySQL and SQL Server are the only products with the features required for OLTP.

Ramana Kovi

That is a very fast read. I am not sure what you read; it is called “multi-join queries in PARALLEL RELATIONAL databases”. The paper talks about queries across PARALLEL RELATIONAL databases, i.e., multiple database servers. It is up to you where to place the database servers: on a single computer under a single OS, or spread across multiple computers (CPUs).

Good luck with your approach. Bye!

Markus

I read that paper and it only talks about distributing a query across multiple CPUs within a SINGLE server. This is how every DB on the market currently functions, and every DB has options which limit how many CPUs, how much RAM, etc. a query can use.

As far as I know there isn’t a single database out there that recommends running queries between machines, because there is no way to have it scale. The OS needs to lock the pages in RAM and maintain index locks for queries.

The articles you reference talk about a computer with N drives and X CPUs all running under a single operating system – basically a custom-built supercomputer.

No Web 2.0 company is going to spend $20 million on a supercomputer when they can cluster together $5,000 off-the-shelf PCs and have a great database cluster.
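As a rough illustration of that clustering approach (a sketch added here, not from the original thread), the usual trick is application-level sharding: hash each key to one of several commodity database boxes so that every query stays on a single server and no cross-machine join is ever needed. The host names below are hypothetical placeholders, and Python is used purely for illustration.

```python
# Minimal sketch of hash-based sharding across cheap commodity database servers.
# Host names are made up; the routing logic is the only point being illustrated.
import hashlib

SHARD_HOSTS = ["db1.example.com", "db2.example.com", "db3.example.com"]

def shard_for(key: str) -> str:
    """Deterministically map a key (e.g. a user id) to one database host."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return SHARD_HOSTS[int(digest, 16) % len(SHARD_HOSTS)]

if __name__ == "__main__":
    for user_id in ("alice", "bob", "carol"):
        # Every read and write for this user lands on the same box,
        # so no query ever has to span machines.
        print(user_id, "->", shard_for(user_id))
```

Adding capacity means adding another cheap box and remapping keys, rather than buying one ever-larger machine.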

Ramana Kovi

Markus,

Here are a couple of good papers from the ACM/IEEE transactions:

P.M.G. Apers, A.R. Hevner, and S.B. Yao, “Optimization Algorithms for Distributed Queries,” IEEE Transactions on Software Engineering, vol. 9, no. 1, 1983.

Jaideep Srivastava, “Optimizing Multi-Join Queries in Parallel Relational Databases,” Proceedings of the Second International Conference on Parallel and Distributed Information Systems, San Diego, California, January 1993.

A lot of research has gone into this area in the last 25 years.

Regarding real-life examples, ask anybody who is running terabyte-size databases and working with directed-graph problems. IC design placement and routing, web search engines, and security agencies are good places to start.

Thrashing has nothing to do with a multi-database query engine. That is a network issue of waiting for query results from multiple database servers.

Markus

“If you want to scale, spread your data onto multiple databases and design multi-database query engine to cross-join across multiple database servers.”

That would severely degrade your performance due to blocking, and the whole thing would crash more quickly than if you had it on a single machine.

The only thing that works at that level is to replicate part of the database to another server and then run queries off that. I’ve never heard of a single major company attempting to chain together DB servers and then run queries across them. Every major social-networking company or dating site replicates the DB to other servers and runs queries against it.
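A minimal sketch of that read-replica pattern (added here for illustration, with hypothetical host names): writes go to a single primary, while read queries are rotated across replicas that hold a copy of the data, instead of chaining live cross-server joins.

```python
# Sketch of read/write splitting against replicated database servers.
# Host names are made up; only the routing decision is illustrated.
import itertools

class ReplicaRouter:
    def __init__(self, primary, replicas):
        self.primary = primary
        self._replicas = itertools.cycle(replicas)  # round-robin over read replicas

    def host_for(self, sql):
        """Send SELECTs to the next replica in rotation, everything else to the primary."""
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._replicas) if is_read else self.primary

if __name__ == "__main__":
    router = ReplicaRouter(
        "db-primary.example.com",
        ["db-replica1.example.com", "db-replica2.example.com"],
    )
    for stmt in ("SELECT * FROM matches", "UPDATE profiles SET age = 30", "SELECT * FROM users"):
        print(stmt, "->", router.host_for(stmt))
```

The trade-off is replication lag: reads may be slightly stale, which is usually acceptable for the search and browse queries this pattern is used for.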
