
Summary:

Common scalability-related failures include thinking scalability is just about technology, inappropriate use of databases, and more. Here are the 10 most common scalability killers that we see, particularly in early-stage startups, and the ones we believe are the most important to avoid.

Compare the recent sale of Friendster for a reported $26.4 million with Facebook’s projected 2010 revenues of $1 billion, and we have a stark reminder of how the inability to scale can kill a startup. “All they had to do was keep the damned servers up and running,” Matt Cohler, a former Facebook executive and general partner at Benchmark Capital, says in Adam L. Penenberg’s book “Viral Loop.” But Friendster failed to scale, and the cost was enormous.

So what should Internet startups avoid in order to grow? As former tech executives and consultants to hundreds of startups, we’ve seen how some companies scale and others fail, and we’ve assembled this knowledge in our recently released book “The Art of Scalability.” Take a look at our list of the top 10 scalability killers.

1. Thinking Scalability Is Just About Technology

This is really the reason we wrote our book. We started our firm as a consulting company focused on helping companies scale their technical platforms. Soon we realized that we were finding as many problems with organizations and processes as with technology. People ultimately are the ones who make mistakes in designing systems or overlook certain design elements that would allow a system to scale cost-effectively. Experience and culture are critically important in designing systems to scale.

2. Overuse of Synchronous Calls

This really shouldn’t come as a surprise to anyone familiar with scalable Internet architectures, but we still find an overabundance of synchronous calls within architectures. There are times when you need a synchronous call or when the development of an asynchronous solution will take too much time. However, it’s important that you build the right questions into your development processes to challenge synchronous implementations early.
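
As a minimal sketch of what challenging a synchronous implementation can look like, the Python below moves a slow side effect off the request path by handing it to a queue that a background worker drains. The names handle_signup, send_welcome_email and email_queue are hypothetical, and the in-process queue is only a stand-in for whatever message bus a real system would use.

```python
import queue
import threading

# In-process queue standing in for a real message bus (assumption).
email_queue = queue.Queue()
sent_emails = []


def send_welcome_email(address):
    """Hypothetical slow side effect; here it only records the address."""
    sent_emails.append(address)


def handle_signup(address):
    """Request path: enqueue the work and return without waiting for it."""
    email_queue.put(address)
    return "signup accepted"


def email_worker():
    """Background worker: drain the queue until a None sentinel arrives."""
    while True:
        address = email_queue.get()
        try:
            if address is None:
                return
            send_welcome_email(address)
        finally:
            # Mark every item, including the sentinel, as processed.
            email_queue.task_done()


def start_worker():
    """Start the worker on a daemon thread and return the thread handle."""
    worker = threading.Thread(target=email_worker, daemon=True)
    worker.start()
    return worker
```

Calling start_worker() once and then handle_signup("a@example.com") returns immediately; the email is sent whenever the worker gets to it. The trade-off is that the caller no longer learns whether the email succeeded, which is the kind of question worth raising early in design.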

3. Failure to Weed or Seed Soon Enough

We’ve written about how to hire, fire and mentor and why to remove underperformers quickly for superior teams. Our message is simply that you can never eliminate underperformers soon enough and that you should always be looking for superior talent. Superior people make excellent technology and develop appropriate processes.

4. Inappropriate Use of Databases

Databases are expensive and often monolithic nightmares of congestion that create single points of failure for architectures. Use them when you need to rely on the ACID properties of a database to resolve issues of atomicity, consistency, isolation and durability during high-volume transactions with read and write conflicts. If you’re simply writing something once and reading it many times, as is often the case with pictures and PDF files, store them in less costly infrastructure alternatives.
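
One way to picture the write-once, read-many case is the sketch below: the file bytes go to cheap storage and the database holds only a small metadata row. It assumes a local directory as the "less costly" store and SQLite as the database; the uploads table and the functions open_metadata_db, save_upload and load_upload are hypothetical names chosen for illustration.

```python
import hashlib
import sqlite3
from pathlib import Path


def open_metadata_db():
    """Create an in-memory database that stores metadata only, not file bytes."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE uploads ("
        "filename TEXT PRIMARY KEY, blob_key TEXT NOT NULL, size_bytes INTEGER NOT NULL)"
    )
    return db


def save_upload(db, blob_dir, filename, data):
    """Write the bytes to the blob directory and record a pointer in the database."""
    blob_key = hashlib.sha256(data).hexdigest()  # content-addressed file name
    (Path(blob_dir) / blob_key).write_bytes(data)
    db.execute(
        "INSERT OR REPLACE INTO uploads (filename, blob_key, size_bytes) VALUES (?, ?, ?)",
        (filename, blob_key, len(data)),
    )
    db.commit()
    return blob_key


def load_upload(db, blob_dir, filename):
    """Look up the pointer in the database, then read the bytes from the blob store."""
    row = db.execute(
        "SELECT blob_key FROM uploads WHERE filename = ?", (filename,)
    ).fetchone()
    if row is None:
        raise KeyError(filename)
    return (Path(blob_dir) / row[0]).read_bytes()
```

In a real deployment the blob directory would more likely be an object store or a CDN origin, but the division of labor is the same: the database answers "where is it?" and never serves the file itself.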

5. Cesspools Instead of Swim Lanes

Network architectures have long had the notion of fault isolation through collision domains. Scalable Internet architectures should have fault isolation such that failures in certain components don’t impact other zones of functionality. We refer to these fault isolation zones as “swim lanes.”
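
The containment idea behind swim lanes can be sketched in a few lines of Python. This is only an in-process illustration under stated assumptions: real swim lanes would also separate hosts, databases and network paths, and the names SwimLane, render_page, search_handler and checkout_handler are hypothetical.

```python
class SwimLane:
    """One fault isolation zone with its own handler and failure handling."""

    def __init__(self, name, handler):
        self.name = name
        self.handler = handler

    def call(self, request):
        # Contain any failure inside the lane so callers get a degraded
        # result instead of an exception that takes down the whole page.
        try:
            return {"lane": self.name, "ok": True, "body": self.handler(request)}
        except Exception as exc:
            return {"lane": self.name, "ok": False, "error": str(exc)}


def render_page(lanes, request):
    """Call every lane independently; one lane failing does not stop the rest."""
    return {name: lane.call(request) for name, lane in lanes.items()}


def search_handler(request):
    """Hypothetical lane whose backing service is currently failing."""
    raise ConnectionError("search index unreachable")


def checkout_handler(request):
    """Hypothetical lane that is healthy."""
    return f"cart for {request['user']}"
```

With one lane built from search_handler and another from checkout_handler, render_page reports the search lane as failed while checkout still returns its cart, which is the behavior the swim lane metaphor describes.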

6. Reliance on Vertical Scale

This raises its ugly head in many of our engagements, especially in early-stage companies. You should almost always design for horizontal scale. Certainly there are times when you expect growth in a given area to be minimal, and such small growth might be served more cost-effectively by vertical scale. Such a financial decision can be sound and appropriate. But when you believe you’ll grow aggressively in any given area, you should design your architecture to allow you to be in control of your own destiny through horizontal scale.
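
To make horizontal scale concrete, here is a minimal sketch of hash-based partitioning, in which each key is routed to one of several independent shards. It assumes plain dictionaries as stand-ins for separate database servers; shard_for and ShardedStore are hypothetical names.

```python
import hashlib


def shard_for(key, shard_count):
    """Map a key to a shard index using a hash that is stable across processes."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


class ShardedStore:
    """Key-value store split across several independent shards."""

    def __init__(self, shard_count):
        # Each dict stands in for a separate server (assumption).
        self.shards = [dict() for _ in range(shard_count)]

    def put(self, key, value):
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key):
        return self.shards[shard_for(key, len(self.shards))].get(key)
```

Adding capacity here means adding shards rather than buying a bigger machine. Note that simple modulo routing remaps most keys when the shard count changes; schemes such as consistent hashing are commonly used to limit that movement.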

7. Failure to Learn from History

Santayana’s warning, often paraphrased as “Those who cannot learn from history are doomed to repeat it,” is true of young technical organizations as well. In the engineering and operations world, an inability to look to the past and find the most commonly repeated mistakes is a failure to maximize shareholder value and grounds for dismissal. The best and easiest way to improve our future performance is to track our past failures, gather them into groups of causation, and treat the root cause rather than the symptoms. Perform postmortems of projects and site incidents and review them quarterly for themes.
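
Grouping failures by causation can be as simple as the sketch below, which counts incidents by recorded root cause and keeps the causes that recur. The incident records are hypothetical sample data, and the field names and the recurring_themes function are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical incident log; in practice this would come from postmortem records.
SAMPLE_INCIDENTS = [
    {"id": 1, "symptom": "site down", "root_cause": "untested config change"},
    {"id": 2, "symptom": "slow checkout", "root_cause": "missing database index"},
    {"id": 3, "symptom": "login errors", "root_cause": "untested config change"},
    {"id": 4, "symptom": "site down", "root_cause": "disk full"},
    {"id": 5, "symptom": "timeouts", "root_cause": "untested config change"},
    {"id": 6, "symptom": "slow search", "root_cause": "missing database index"},
]


def recurring_themes(incidents, min_count=2):
    """Return (root_cause, count) pairs seen at least min_count times, most frequent first."""
    counts = Counter(incident["root_cause"] for incident in incidents)
    return [(cause, n) for cause, n in counts.most_common() if n >= min_count]
```

On the sample data, three different symptoms trace back to the same untested config change, which is the kind of theme a quarterly review is meant to surface.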

8. Changing Development Methodologies to Fix Problems

CIOs and CTOs see repeated problems such as missed dates or dissatisfied customers and blame their product development life cycle (PDLC). Often they move too quickly to change the process without addressing root causes. A lack of involvement from the business tops the list of problems. In the Scrum model there needs to be consistent involvement from the business or product owner. Another common problem is an incomplete understanding of, or training in, the existing methodology. Everyone in the organization should have a working knowledge of the entire process and their roles. Change the PDLC if there are valid reasons such as a better cultural fit, but don’t alter it before addressing the core issues.

9. Too Little Caching, Too Late

Caching is your friend. If you’re writing once and reading often, and if that data has a common usage pattern, you should make aggressive use of caching. Consider content delivery networks outside of your facilities and, inside your network, page, image, object and application caches, along with any other cache solution you can find!
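
For the write-once, read-often case, a small application cache might look like the sketch below: a read-through cache with a time-to-live, so repeated reads skip the expensive lookup until the entry expires. TTLCache and its loader callback are hypothetical names, and the in-memory dictionary stands in for whatever shared cache tier a production system would use.

```python
import time


class TTLCache:
    """Read-through cache whose entries expire after a fixed number of seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries = {}   # key -> (expires_at, value)

    def get_or_load(self, key, loader):
        """Return the cached value, calling loader(key) only on a miss or expiry."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        value = loader(key)
        self._entries[key] = (now + self._ttl, value)
        return value
```

A caller would wrap its slow lookup, for example cache.get_or_load(user_id, load_profile_from_db), where load_profile_from_db is whatever function currently hits the database. The time-to-live bounds how stale a read can be, which matters less when data is written once.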

10. Overreliance on Third Parties to Scale

Every vendor has a quick fix for your scale issues. If you’re a hyper-growth SaaS site, or hope to be, you don’t want to be locked into a vendor for your future business viability. You want to make sure that your site’s scalability is built into your architecture, not your technology. This isn’t to say that after you design your system to scale horizontally you won’t rely upon some technology to help you, such as the caching solutions discussed above. Once you define how you can horizontally scale your database and application, you may want to use any of a number of different commodity systems to meet your needs.

Marty Abbott and Michael Fisher are partners with AKF Partners.

Image courtesy of Flickr user Italianjob17.

  1. Must read article!

  2. Very important tips on scalability. If anyone has more oops, aha and hmm experience on scalability, share your #lbl

    http://littlebookoflearning.com/site-scalability/

  3. Just took two print-outs (one for my desk and one for my home office). Must-read commandments.

  4. Friendster’s fatal mistake was deciding to rewrite rather than scale its existing infrastructure.

  5. Great article!

    I’m very happy that we didn’t fail when we needed to scale mytaskhelper.com.
    It was a very small startup a few years ago; then people liked it, usage grew by up to 10,000 times, and a lot of improvements were made.

    But remember to think about all of this BEFORE you start your next big project. Some projects just can’t be scaled after a year or so of development.

    Thanks,
    Igor
    CEO at mytaskhelper.com

  6. Brilliant post, thank you.

  7. Such a timely article! My former employer made ALL of these mistakes, and it’s costing them dollars and customers.

    I run a blog about scalability issues, check it out at http://www.roadtofailure.com : I have articles such as “Social Media Kills the Database” :)

  8. Scalability killer # 10 is going to get you in trouble with the cloud police.

  9. excellent article – and reminds me of some of the very same things that have come up (for example) in detailed drupal case studies when major firms are willing to explain how they use drupal to build and scale for volume like whitehouse.gov (now on drupal), fastcompany.com, popsugar (and all entities)…though they don’t quite hit on this level of insight (particularly as it relates to teams, learning from history etc)…you guys are great (but i already knew that from robin ;)

