Blog Post

Even in Web 2.0 Scale & Size Matter

Yet another update at the bottom of the page

Updated: Many times in the past I have pointed out that most of the new Web 2.0-type start-ups have business plans that only go so far! I am actually glad to see Jeremy Wright take a stand on this issue.

I’ve spent about 10 hours this month working with really, really high profile Web 2.0-ish companies nearly yelling at them about their lack of true infrastructure. If your business depends on your website being up, look at your code, look at your infrastructure and for your users’ sake figure out what you actually need and build the damn thing properly!

I think most start-ups don’t take that into account, and they pay the price when they grow really big really fast. Early hiccups at TypePad were a case in point. Having had the financial resources, Six Apart has largely overcome those problems. Others are not thinking along those lines, because we are in the early stages of what might be a bubble, as per Tristan Louis.

During a bubble, attention to boring details like capacity planning, infrastructure management, security, etc… sometimes take a back seat to new feature introduction. Those, however, are a substantial portion of what makes a company survive.

As part of my virtual visit to Les Blogs, I recorded a video which discussed these issues briefly. What worries me the most is “the me-too nature of the companies,” features posing as products and eventually companies and what not. These are signs of what could be excessive froth, as Louis points out. (Steven Borsch has a good post on this as well!)

However, the lack of planning for scale is a clear sign that we are living in a “built to flip” age. No one is thinking (or planning) about long-term business models!

Updated: 37signals’ David Heinemeier Hansson offers an alternate point of view, though I suspect he is conflating “Planning and architecting for the future” with 99.999% uptime. He says spending $3 million on servers in Web 1.0 was a sign of a bubble. I say that not having a long-term plan, including a scalable architecture, shows that you are waiting to be bought out. I think the issue is not about blowing millions on servers, which in these days of ultracommoditization is difficult. The issue is: what is your end goal? Ramana Kovi has some thoughts, and I wish he would elaborate further.
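
To put the uptime percentages in this debate in perspective, here is a back-of-the-envelope sketch in Python. It assumes a 365-day year, and the helper name `downtime_hours_per_year` is purely illustrative. The 99.999% level is the one mentioned above; 98% and 99.9% are the levels quoted in the comments.

```python
# Illustrative helper (not from any library): turn an uptime percentage
# into the amount of downtime it permits over one year.
HOURS_PER_YEAR = 365 * 24  # assumption: non-leap year, 8760 hours


def downtime_hours_per_year(uptime_percent: float) -> float:
    """Return the hours of downtime per year allowed at a given uptime level."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    return HOURS_PER_YEAR * (100 - uptime_percent) / 100


for level in (98.0, 99.9, 99.999):
    hours = downtime_hours_per_year(level)
    print(f"{level}% uptime: {hours:.2f} hours ({hours * 60:.1f} minutes) of downtime per year")
```

Under these assumptions, 98% uptime leaves room for roughly 175 hours of downtime a year, 99.9% for just under 9 hours, and 99.999% for a little over 5 minutes. That gap is why planning for scale and promising "five nines" are not the same commitment.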

Further thoughts from Steven Borsch: “It’s not just servers and bandwidth that are required for scale. It’s dealing with latency over an increasingly fragmented and geographically disbursed base of people consuming web applications.”

More thoughtful entries on this subject from Danny Ayers and The Stalwart. Danny disagrees and The Stalwart agrees. Danny says:

Definitions of Web 2.0 vary, but I think two key aspects are The Web as Platform and The Architecture of Participation. For applications that exploit these paradigms to function, there must certainly be some forward-looking design locally, for example in setting up a distributed database on the company’s servers.

But Danny, isn’t that what I was saying? People are thinking short term, and thus not taking into account what you so eloquently write.

58 Responses to “Even in Web 2.0 Scale & Size Matter”

  1. Let’s back up and first ask “What IS Web 2.0?”

    It turns out that “Web 2.0” is hype and that the correct way of achieving scale is what preceded “Web 2.0”, namely the principles of REST (REpresentational State Transfer) which were used to design the WWW:

    Do scale and size matter? Certainly if you’re Google. Probably not if you have little traffic.

    But in any case all discussion of “Web 2.0” is pure advertising and a waste of time. Each vendor selling “Web 2.0” has a different definition of it and none is worth buying. Best to let these firms die while they’re young!

  2. Fact is, services do not fail because their operators can’t scale them. They fail due to the inability to attract paying customers. In the scheme of things, scaling a service is relatively easy but getting customers and making money is very difficult.

  3. Jim Dermitt

    Sony 2.0|666 “built to flip”
    SunComm’s MediaMax? That other copy protection system Sony BMG has shipped on some of its music CDs in an effort to cut down on piracy?
    It can cost you thousands or even millions of dollars. MediaMax is a different copy protection system than the “rootkit” DRM that has been drawing all the attention.

    Sony opens up over another CD security hole, the Register reported today.
    “other severe problems with MediaMax discs, including: undisclosed communications with servers Sony controls… undisclosed installation of over 18 MB of software regardless of whether the user agrees to the End User License Agreement; and failure to include an uninstaller with the CD.”
    The Texas Attorney General reportedly filed the lawsuit on Monday against Sony BMG Music Entertainment under anti-spyware laws.

    RIAA thought the problem was the user. This seems like a real problem. I wonder if Sony will have some downtime.

  4. Jason Friend thinks that going from 98% to 99.9% uptime costs tens of thousands of dollars.

    This is yet another misconception in the scalability arena. System and network vendors want you to design with load balancing and clustering, but the best solution to the uptime and scalability problem is to design a distributed application that partitions its data across multiple databases. The basic question is: “Do you want to scale by infrastructure, or scale by application design through distributed computing?” If you are an online business, you must drive toward high scalability and uptime. If you approach scalability through clustering and load balancing, then yes, it can cost you thousands or even millions of dollars. Take an online social networking website as an example. It is a lot easier to write a SQL JOIN against a single database to find relationships, but when the data gets large, that single database becomes a bottleneck. If you want to scale, spread your data across multiple databases and design a multi-database query engine that can cross-join across multiple database servers. Recently SAP bought Callixa to do just that.

    Hope that helps
    Ramana Kovi

  5. Number of servers is not an issue; how efficiently you can scale your application is more important, e.g.
    Google search server farm has over 400,000 servers with very little clustering, they do scale don’t they.

    I don’t get it.
    Way too technical for me I guess.
    I bet they have a bunch of code I’d never figure out. 400,000 is a lot of servers. I wonder what the electric bill is. Once we have more wifi, we’ll need less server farming just to find stuff on other servers. Maybe you’ll just be able to store stuff in the air. More room to grow corn and spuds.

  6. Clustering and load balancing are infrastructure scalability solutions, not application scalability solutions. Clustering and load balancing do provide high availability but may not provide much scalability. Scalability and security must be designed into the applications. The number of servers is not the issue; how efficiently you can scale your application is more important. For example:

    1. Google’s search server farm has over 400,000 servers with very little clustering, and they do scale, don’t they?
    2. Inktomi got buried by Google because it used clustering and load balancing instead of a distributed application architecture.

    Ramana Kovi

  7. I know everyone has already chimed in on this, but seeing as this is something I deal w/ a lot, I figured I would take a stab at it. One of the main reasons that web2.0 companies don’t scale is that the guys writing the code have no experience w/ mySQL clusters and other enterprise-level services (something relatively few people have). The second is that it is nuts for someone giving away their nifty Ajax product to pay through the nose for extra unused servers. Unless you have a revenue stream (which most don’t), you need to focus more on your business model/plan than on whether or not your 40 load-balanced Xeon servers are clustering.

    If you aren’t running a service like basecamp, then it makes sense to stagger your user growth (it’s not like having a million users and no one getting access is going to make you money if you give everything away for free).

    BTW finding good resources on clustering/redundancy is easier said than done. Unless you’ve got an IT degree, most of it is incomprehensible.

  8. A funny story.
    The local news folks have been running stories about the new xbox 360. Some of the stories are about overheating and crashing problems and ask if you should wait or buy an xbox 360 this Christmas. The problems seem pretty isolated and the thing has a strong warranty, so it doesn’t really matter. The big story is that you can’t find the things in a local store at list price. I found one for $1,300.00 online. I’m not buying one because it could be defective. It’s the sticker shock I can’t handle.

    I saw a 360 ad on TV last night. I think they are spending the budget on marketing while manufacturing has some problems scaling. Microsoft has huge financial resources and you can’t find their hot new product. Size doesn’t always matter. Xbox 360 is a big deal. You would think that they could make them fast enough to live up to the marketing hype. The thing would almost sell itself if it wasn’t missing in action.
    They could have made a ton of cash off of the xbox 360. They spent a ton of it on marketing with store kiosks and the works. It was a big deal.

  9. Walmart has the credit card industry supporting their card processing network. They don’t need to worry about it going down. It’s redundant, secure and scalable beyond just about anything. The average person can do the same thing as Walmart from a desktop with Paypal or a similar payment website.

    There’s always competition. With no inventory costs, you can price things lower than Walmart. That’s why people shop online. Shipping is certainly cheap enough, but UPS scales and wants your business. You can always work or shop at Walmart if that’s your kind of gig. The idea of less competition in a network environment seems kind of off base.

  10. A sw developer’s perspective:
    In software development, there is an unwritten rule not to worry about performance/scalability at the beginning, because it’s easy to overengineer. Usually, performance/scalability problems are welcome because they mean that the software/service is becoming popular. That said, this also differentiates great developers from average ones: their design is simple, robust and scalable (e.g., Unix).

  11. Jason is wrong about uptime/downtime. If Wal-Mart is down for half an hour I’ll come back later, because there’s really no competition at those prices. If Technorati is down I’ll immediately start looking elsewhere, maybe Google blogsearch, Yahoo blogsearch, Sphere, PubSub, Feedster, etc. There’s no shortage of alternatives to try out, with absolutely $0.00 switching costs. To make matters worse, I might even like one of those other search engines, and remain a loyal user forever.

    Even if I went to a Wal-Mart competitor for a day, it’s unlikely I’d be a permanently lost customer.

  12. I would still say scalability should start with code design. Things like database slave replication (mySQL seems to support this just fine), faster code invocation (say, the FastCGI way), fewer libraries, less code bloat, and extensible code design are your cheapest and easiest ticket to a scalable web service.

    Immediately investing in a lot of hardware at the start is not just overkill, it’s suicide. As Jason F mentioned, why invest in something that’s not yet there?

    I posted my dev rants on making a scalable web app on my website.

  13. Part of the reason scalability matters is that the web IS the business for most Web 2.0 companies. As a result, a failure to scale will become a major impediment to growth and could result in a growth curve that deflates. Technorati and SixApart narrowly averted disaster, but it cost them in reputation (in Technorati’s case) and real dollars (in the case of SixApart). Google Analytics got mostly bad word of mouth because its systems failed to scale… etc, etc… In Web 1.0, we made some of the mistakes too, but on the other side, by overestimating demand, which is almost as bad because it forces you to spend more than you should on your infrastructure.

  14. Jim Dermitt

    Paul Graham asks Does “Web 2.0” mean anything?
    This is at
    “The conference itself didn’t seem very grassroots. It cost $2800, so the only people who could afford to go were VCs and people from big companies.”
    The whole thing is worth reading. I thought it was anyway. Web 2.0 had a $2800 toll booth, which eliminates a great deal of talent from being involved. Personally I’d rather spend $2800 on other stuff. From the 2.0 site “Why attend? The Internet is a critical component of the strategy and infrastructure of every successful company today. At its most disruptive, it redefines markets and creates entirely new opportunities. More than 50 thought leaders and entrepreneurs are slated to present in an interactive format stressing audience participation.”

    For $2800 you can be a Web 2.0 thought leader. Follow the thought leader or follow your own lead.
    If you missed Web 2.0, there’s always something new to explore. Web 2.0 is over. We are now at Web 2.1

    Web 2.1: A BrainJam for the rest of us
    “The event raised over $1,000 for the Internet Archive and we are donating more than $100 to the Creative Commons fundraising drive. In our efforts to be a transparent organization, we posted an overview of the event financials for everyone to see who is interested. My personal wrap up is here and you can still visit the event Wiki, the original event site and the Insytes Blog for more details.”

  15. Jim Dermitt

    All of the freeware on this page is unsupported by SGI.

    A better way to go could be that all freeware is supported. SGI could easily offer a support link and charge a small fee for tech help with the freeware. SGI could also provide links to businesses that do support this freeware. The little company running freeware today, could be the Google of the future.

  16. How many companies actually have scalability problems that are hardware-related? 99% of the time the problem is that the company has clueless developers who don’t know how to scale a product, and no amount of money you toss at it will change that.

    Before I built my site I worked at a good many companies during the end of the bubble years. It was amazing to sit down, tune the database for 20 minutes, and get rid of the need for 80% of the infrastructure.

    I built my site and it gets 10 million pageviews a day now. Top 10 site in Canada in terms of pageviews and #83 on Monday in the USA according to The entire site has 1 web server, and 1 DB server doing 99% of the work. Site also has a mail server, and 1 image server.

    I run the only “web 2.0” company in my space and every single competitor my size has around 200 servers and a support staff of 20-40 people. I just don’t get it. In short, if you have scalability issues and fewer than 5 million pageviews a day, you’ve got issues with the knowledge level of your staff.

  17. At PodTech, when I started the company, all my dollars went into infrastructure (no money went into marketing). The fact that PodTech never went down was the marketing. When podcasters were dropping like flies after iTunes and then Yahoo Podcasts came out, PodTech stood proud. As a small company, that was our proudest moment. High availability and clustering are key if your business is about 2.0.

  18. We’ve been dealing with these issues quite closely with Akismet and The best advice I can give someone starting from scratch is to design for federated data if possible with your application. This is the single best decision we made with

    Second, high quality hardware and load balancing setups are so cheap these days that it’s really criminal not to invest in it if you care about your users. I would never want to be running an app where users didn’t care if it was down for a couple of minutes a day, or slow.

    We’ve made mistakes, but you don’t have to: replicate your database, back up several times a day, always have a hot spare, keep DNS timeouts low, and TEST TEST TEST. There are open source tools today that do the exact same thing the 50k+ load balancers do.

    I’m on a budget, but I’m incredibly thankful that (a) so many users want to use our service that scalability is something to worry about and (b) we’ve invested with heavy spikes and growth in mind. A few months ago if a hard drive burned out the service just would have been down, likely for at least half a day. When it happened a few days ago it just meant a slowdown until we were able to shift our application load.

  19. Yeah. Totally. Scalability is the new ‘blog’.

    This is ALL I did at Rojo. Scalability 24/7… MySQL cluster design, distributed filesystem design, memcached, caching, web app performance.

    BTW if anyone needs a kickass scalability consultant they should send me a private email :)


  20. i think you folks are missing the point. i think ramana is spot on. it is the right architecture, and ability to think through the what if problem. it is not throwing servers which is an issue, it is an issue of building a scalable architecture. and that a lot of companies are not thinking through.

  21. Jason is right on. There is little reason to spend much energy on “getting scalability right” at the outset. The energy is much better spent on figuring out how to get customers and make them pay. Scalability is fairly easy to provide when necessary. Most startups would kill to have scalability problems!!

  22. Scalability is not just about adding another box to the rack and letting a DNS round-robin scheme take care of the rest. Scalability must be designed into the application architecture. If scalability is designed correctly into the applications, there are huge financial savings you can reap. Distributed multi-database SQL queries are “a must” for any web service application that wants to scale.

  23. I write OSS server automation software, and I was at the Web 2.0 in hopes of learning how to apply some of these nifty ideas to my own development, and even though I went to learn I ended up feeling a lot more like a vendor. Very few people had any kind of infrastructure or really any idea how to successfully run web services.

    People need to realize that managing and scaling the services are different skills than creating them; either develop both skills or get to hiring, but don’t think that those machines will just run themselves.

  24. slashdoc, i think the issue is pretty clear… if you are going to be doing business, well, then you might as well prepare for the eventual issues of scale and size. the fact that google analytics did not scale is bad planning on their part, and inexcusable. it is even more critical for start-ups to capture the users who come their way. of course they can stay in controlled beta for as long as they want. sphere is clearly doing that.

  25. I have to disagree. While it’s embarrassing for Google to be overwhelmed by Analytics users, I think there is much more risk for small web 2.0 “companies” investing tens of thousands of dollars in infrastructure before figuring out how to turn a profit. That’s more like Bubble 1.0 if anything.

    Then again, I’m not quite sure what the business model behind companies like really is, other than hoping to be bought before burning through the VC money? That might be the real problem.

  26. This infrastructure discussion is the “dirty little secret” of all the hyperbole surrounding Web 2.0. Just wait until there is global scaling required!

    I did a post about this dirty little secret right after the Web 2.0 Conference since this missing part of the discussion was so glaringly left out.
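
Several of the comments above (4, 6, 18 and 22) argue for spreading data across multiple databases and doing the cross-database work in the application. Below is a minimal sketch of that idea in Python, not anyone's production design. It assumes three in-memory SQLite databases standing in for separate database servers, a made-up `friends` table, and hypothetical helper names (`shard_for`, `friends_of`, `friends_of_friends`, `followers_of`).

```python
import sqlite3
import zlib

NUM_SHARDS = 3  # assumption: three databases stand in for three DB servers


def shard_for(person: str) -> int:
    """Pick a shard with a stable hash (crc32) so the mapping survives restarts."""
    return zlib.crc32(person.encode("utf-8")) % NUM_SHARDS


def make_shards():
    """Create one in-memory SQLite database per shard, all with the same schema."""
    shards = []
    for _ in range(NUM_SHARDS):
        conn = sqlite3.connect(":memory:")
        conn.execute("CREATE TABLE friends (person TEXT, friend TEXT)")
        shards.append(conn)
    return shards


def add_friend(shards, person, friend):
    """Store the relationship on the shard that owns `person`."""
    shards[shard_for(person)].execute(
        "INSERT INTO friends VALUES (?, ?)", (person, friend))


def friends_of(shards, person):
    """Single-shard lookup: only the owning database is queried."""
    rows = shards[shard_for(person)].execute(
        "SELECT friend FROM friends WHERE person = ?", (person,))
    return sorted(row[0] for row in rows)


def friends_of_friends(shards, person):
    """Application-level 'join': follow each friend to whichever shard owns them."""
    direct = friends_of(shards, person)
    second_hop = set()
    for friend in direct:
        second_hop.update(friends_of(shards, friend))
    second_hop.discard(person)
    return sorted(second_hop - set(direct))


def followers_of(shards, friend):
    """Scatter-gather: `friend` is not the shard key, so every shard is asked."""
    found = []
    for conn in shards:
        rows = conn.execute(
            "SELECT person FROM friends WHERE friend = ?", (friend,))
        found.extend(row[0] for row in rows)
    return sorted(found)


shards = make_shards()
for person, friend in [("ann", "bob"), ("bob", "cat"), ("bob", "dan"), ("cat", "ann")]:
    add_friend(shards, person, friend)

print(friends_of_friends(shards, "ann"))  # ['cat', 'dan']
print(followers_of(shards, "ann"))        # ['cat']
```

The trade-off is visible in the sketch. Lookups by the shard key touch one database, while anything keyed differently has to fan out to all of them. That is the work a "multi-database query engine" would take off the application's hands.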