Compare the recent sale of Friendster for a reported $26.4 million with Facebook’s projected 2010 revenues of $1 billion, and we have a stark reminder of how the inability to scale can kill a startup. “All they had to do was keep the damned servers up and running,” Matt Cohler, a former Facebook executive and general partner at Benchmark Capital, says in Adam L. Penenberg’s book “Viral Loop.” Friendster failed to scale, and the cost was enormous.
So what should Internet startups avoid in order to grow? As former tech executives and consultants to hundreds of startups, we’ve seen how some companies scale and others fail, and we’ve assembled this knowledge in our recently released book “The Art of Scalability.” Take a look at our list of the top 10 scalability killers.
1. Thinking Scalability Is Just About Technology
This is really the reason we wrote our book. We started our firm as a consulting company focused on helping companies scale their technical platforms. Soon we realized that we were finding as many problems with organizations and processes as with technology. People ultimately are the ones who make mistakes in designing systems or overlook certain design elements that would allow a system to scale cost-effectively. Experience and culture are critically important in designing systems to scale.
2. Overuse of Synchronous Calls
This really shouldn’t come as a surprise to anyone familiar with scalable Internet architectures, but we still find an overabundance of synchronous calls within architectures. There are times when you need a synchronous call or when the development of an asynchronous solution will take too much time. However, it’s important that you build the right questions into your development processes to challenge synchronous implementations early.
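As a minimal sketch of what challenging a synchronous implementation can look like, the Python example below hands a slow step to a background worker through a queue, so the caller returns without waiting for it. The names `register_user`, `send_welcome_email` and `email_queue` are hypothetical, and the in-process queue is an assumption made for brevity; a production system would more likely use a message broker.

```python
import queue
import threading

# Hypothetical in-process work queue (assumption: stands in for a message broker).
email_queue = queue.Queue()
sent = []  # records what the worker processed, for illustration only


def send_welcome_email(address):
    """Stand-in for a slow downstream call (hypothetical)."""
    sent.append(address)


def worker():
    """Drain the queue in the background until a None sentinel arrives."""
    while True:
        address = email_queue.get()
        if address is None:
            break
        send_welcome_email(address)


def register_user(address):
    """Enqueue the slow step and return to the caller right away."""
    email_queue.put(address)
    return "registered"


# Usage: the caller gets its answer without waiting for the email step.
background = threading.Thread(target=worker)
background.start()
status = register_user("a@example.com")
email_queue.put(None)  # tell the worker to stop once the queue is drained
background.join()
```

The trade-off is that the caller no longer learns whether the slow step succeeded, so failures have to be handled by the worker (retries, alerts) rather than by the request path.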
3. Failure to Weed or Seed Soon Enough
We’ve written about how to hire, fire and mentor and why to remove underperformers quickly for superior teams. Our message is simply that you can never eliminate underperformers soon enough and that you should always be looking for superior talent. Superior people make excellent technology and develop appropriate processes.
4. Inappropriate Use of Databases
Databases are expensive and often monolithic nightmares of congestion that create single points of failure for architectures. Use them when you need to rely on the ACID properties of a database (atomicity, consistency, isolation and durability) to resolve conflicts between reads and writes at high transaction volumes. If you’re simply writing something once and reading it many times, as is often the case with pictures and PDF files, store them in less costly infrastructure alternatives.
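One way to picture the write-once, read-many case is a small content-addressed file store, sketched below in Python. This is an illustration under stated assumptions, not a recommendation of a specific product: the local file system stands in for whatever cheaper storage tier you choose, and `store_blob` and `load_blob` are hypothetical helper names.

```python
import hashlib
import tempfile
from pathlib import Path


def store_blob(root, data):
    """Write bytes once under a key derived from their content; return the key."""
    key = hashlib.sha256(data).hexdigest()
    path = root / key[:2] / key  # fan out into subdirectories by key prefix
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)
    return key


def load_blob(root, key):
    """Read the bytes back; no database is involved on the read path."""
    return (root / key[:2] / key).read_bytes()


# Usage: a temporary directory stands in for the storage tier (assumption).
root = Path(tempfile.mkdtemp())
pdf_bytes = b"%PDF-1.4 example"
key = store_blob(root, pdf_bytes)
```

In a design like this, the database only needs to hold the short key alongside the record that owns the file, while the bytes themselves live on the cheaper tier.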
5. Cesspools Instead of Swim Lanes
Network architectures have long had the notion of fault isolation through collision domains. Scalable Internet architectures should have fault isolation such that failures in certain components don’t impact other zones of functionality. We refer to these fault isolation zones as “swim lanes.”
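The idea can be sketched in a few lines of Python. In this hedged example each lane owns its own worker pool and converts its own failures into a local error, so a broken lane cannot take the others down. The `SwimLane` class and the `search` and `checkout` handlers are hypothetical, and real swim lanes would usually also separate servers, databases and network paths, which this in-process sketch does not show.

```python
from concurrent.futures import ThreadPoolExecutor


class SwimLane:
    """One fault isolation zone with its own dedicated worker pool."""

    def __init__(self, name, handler, workers=2):
        self.name = name
        self.handler = handler
        self.pool = ThreadPoolExecutor(max_workers=workers)

    def call(self, request, timeout=1.0):
        """Run the handler in this lane's pool; contain errors and timeouts."""
        future = self.pool.submit(self.handler, request)
        try:
            return future.result(timeout=timeout)
        except Exception as exc:
            return f"{self.name} unavailable: {type(exc).__name__}"


def search(query):
    """Hypothetical handler that simulates a failed component."""
    raise RuntimeError("search index down")


def checkout(cart):
    """Hypothetical handler in a healthy lane."""
    return f"order placed for {cart}"


lanes = {
    "search": SwimLane("search", search),
    "checkout": SwimLane("checkout", checkout),
}
search_result = lanes["search"].call("shoes")
checkout_result = lanes["checkout"].call("cart-42")

for lane in lanes.values():
    lane.pool.shutdown()
```

Here the search lane reports itself unavailable while checkout keeps taking orders, which is the behavior the swim lane design is after.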
6. Reliance on Vertical Scale
This rears its ugly head in many of our engagements, especially in early stage companies. You should almost always design for horizontal scale. Certainly there are times when you expect growth in a given area to be minimal, and such small growth might be more cost-effectively served by vertical scale. Such a financial decision can be sound and appropriate. But when you believe you’ll grow aggressively in any given area, you should design your architecture to allow you to be in control of your own destiny through horizontal scale.
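A minimal sketch of one horizontal scaling technique, assuming you split data by a key such as a user ID: the Python function below maps each key to one of several nodes with a stable hash, so capacity grows by adding nodes rather than by buying a bigger machine. The node names and the `shard_for` helper are hypothetical.

```python
import hashlib


def shard_for(key, shards):
    """Map a key to one shard using a hash that is stable across processes."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


# Hypothetical node names; each would be a separate database or app server.
shards = ["db-0", "db-1", "db-2"]
placement = {f"user-{n}": shard_for(f"user-{n}", shards) for n in range(1000)}
```

Simple modulo placement like this moves most keys when the number of shards changes. If you expect to add nodes often, a scheme such as consistent hashing or a lookup table is worth considering instead.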
7. Failure to Learn from History
George Santayana’s warning, often paraphrased as “Those who cannot learn from history are doomed to repeat it,” is true of young technical organizations as well. In the engineering and operations world, an inability to look to the past and find the most commonly repeated mistakes is a failure to maximize shareholder value and grounds for dismissal. The best and easiest way to improve our future performance is to track our past failures, gather them into groups of causation, and treat the root cause rather than the symptoms. Perform post mortems of projects and site incidents and review them quarterly for themes.
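The quarterly review can start very simply. As a sketch, the Python snippet below groups post mortem records by root cause and ranks the themes by frequency. The incident records are made-up illustrative data, and the field layout is an assumption.

```python
from collections import Counter

# Illustrative, made-up post mortem records: (incident id, root cause).
post_mortems = [
    ("INC-1", "missing capacity plan"),
    ("INC-2", "untested rollback"),
    ("INC-3", "missing capacity plan"),
    ("INC-4", "single point of failure"),
    ("INC-5", "missing capacity plan"),
    ("INC-6", "untested rollback"),
]

# Count incidents per root cause and list the most frequent themes first.
themes = Counter(cause for _, cause in post_mortems)
ranked = themes.most_common()
```

The top entries in `ranked` are the root causes that deserve attention first, because fixing them removes the largest number of repeat incidents.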
8. Changing Development Methodologies to Fix Problems
CIOs and CTOs see repeated problems such as missed dates or dissatisfied customers and blame their product development life cycle (PDLC). Often they move too quickly to change the process without addressing root causes. A lack of involvement from the business tops the list of problems. In the Scrum model there needs to be consistent involvement from the business or product owner. Another common problem is an incomplete understanding of, or insufficient training on, the existing methodology. Everyone in the organization should have a working knowledge of the entire process and their roles. Change the PDLC if there are valid reasons such as a better cultural fit, but don’t alter it before addressing the core issues.
9. Too Little Caching, Too Late
Caching is your friend. If you’re writing once and reading often, and if that data has a common usage pattern, you should make aggressive use of caching. Consider content delivery networks outside of your facilities and, inside your network, page, image, object and application caches, plus any other cache solution you can find!
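For the object-cache case, a minimal read-through cache with a time-to-live might look like the Python sketch below. The `ReadThroughCache` class and the `load_profile` loader are hypothetical, and the in-memory dictionary is an assumption made for brevity; in practice a shared cache tier would usually sit in this position.

```python
import time


class ReadThroughCache:
    """Serve repeated reads from memory; call the loader only on a miss."""

    def __init__(self, loader, ttl_seconds=60.0, clock=time.monotonic):
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expiry time, value)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # fresh hit: the backing store is not touched
        value = self._loader(key)  # miss or expired: go to the source
        self._entries[key] = (now + self._ttl, value)
        return value


loads = []  # records every trip to the backing store, for illustration


def load_profile(user_id):
    """Hypothetical expensive read, for example a database query."""
    loads.append(user_id)
    return {"id": user_id}


cache = ReadThroughCache(load_profile)
first = cache.get("u1")
second = cache.get("u1")  # served from the cache
```

The time-to-live bounds how stale a cached value can get, which is usually acceptable for data that is written once and read often.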
10. Overreliance on Third Parties to Scale
Every vendor has a quick fix for your scale issues. If you’re a hyper-growth SaaS site, or hope to be, you don’t want to be locked into a vendor for your future business viability. You want to make sure that your site’s scalability is built into your architecture, not dependent on a particular vendor’s technology. This isn’t to say that after you design your system to scale horizontally you won’t rely upon some technology to help you, such as the caching solutions discussed above. Once you define how you can horizontally scale your database and application, you may want to use any of a number of different commodity systems to meet your needs.
Marty Abbott and Michael Fisher are partners with AKF Partners.
Image courtesy of Flickr user Italianjob17.