I’ve been thinking about distributed systems a lot lately. While the last decent code I wrote was in Logo, it suddenly dawned on me that I deal with a particular type of distributed system every day: start-ups. Let’s examine a few of the similarities.
Modern web-scale applications like Google, Twitter, Netflix, LinkedIn, etc. are implemented as distributed systems as opposed to single monolithic codebases. This means that they are composed of tens or hundreds of services that communicate asynchronously with each other, ultimately delivering a response to the end-user.
Google.com is composed of more than 200 services, according to a talk by Jeff Dean in 2010. Some of these services are external-facing, such as the spell checker and instant search, while others are only for internal consumption and not visible to users. Not to get too meta, but most of these individual services are in fact distributed systems themselves.
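To make the idea of many services answering one request a bit more concrete, here is a minimal sketch in Python. It assumes a front end that calls a handful of back-end services concurrently and gives each one a deadline. The service names, latencies, and the `call_service` and `handle_request` helpers are all hypothetical stand-ins, not a description of how Google or anyone else actually does it.

```python
import asyncio

# Hypothetical services and simulated latencies in seconds (illustration only).
SERVICES = {"spell_checker": 0.01, "instant_search": 0.01, "slow_ads": 2.0}


async def call_service(name: str, latency: float) -> str:
    """Stand-in for a network call to a back-end service."""
    await asyncio.sleep(latency)
    return f"{name}: ok"


async def handle_request(timeout: float = 0.3) -> dict:
    """Call every service concurrently; a slow one degrades, not blocks, the page."""

    async def guarded(name: str, latency: float):
        try:
            reply = await asyncio.wait_for(call_service(name, latency), timeout)
        except asyncio.TimeoutError:
            reply = f"{name}: timed out"
        return name, reply

    pairs = await asyncio.gather(*(guarded(n, l) for n, l in SERVICES.items()))
    return dict(pairs)


if __name__ == "__main__":
    print(asyncio.run(handle_request()))
```

In this sketch the response is assembled from whatever came back in time, so one slow dependency costs at most the timeout rather than holding up the whole page.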
Philosophical and physical distribution rules at startups
One of the reasons the service-oriented approach has become dominant is that each service can be developed, managed, and scaled independently. It also allows teams to iterate and update individual services at their own pace.
Here, there is a direct parallel to the functional structure of start-ups and large companies. Engineering, marketing, sales, and finance are their own disciplines, have their own teams, and move with their own cadence. Actual organizational power, decision-making and relationships are complex and seldom map to the org chart. Similarly, what a web-scale application really looks like is a messy set of interconnections, seldom mapping to the 3-tier architecture diagram.
Start-ups are also becoming more physically distributed. Think engineering in Israel, sales in the U.S., or the rock-star developer who lives in Boulder/Tahoe/Iowa because he or she can. This is so common as to barely merit mention. Overcoming the challenges of physically distributed teams is easier than it has ever been, assuming a commitment to communication.
Skype, Chatter, Jive, IRC, IM, etc. can all enable distributed companies to scale. Overcommunicate, but also know the limits of technology and be realistic about what needs to be done face-to-face. A great read on this topic from Basho, who are experts in distributed systems and whose company is itself a distributed system, can be found here.
Everything’s federated: From social buttons to SaaS
Most web pages now contain content that comes from a variety of third-party entities. Examples of this behavior include content delivery networks, performance monitoring tools, and ad networks. Start-ups are very similar here, as well.
There are a variety of third-party SaaS services required to run their businesses, ranging from GitHub to Salesforce.com to Marketo. Failures in any of these internal or external business services can impact the business in much the same way that a third-party ad network can slow down your page load times.
Scaling: It’s hard for sys admins and for startups
Large-scale distributed systems like Twitter, Google, or Facebook have evolved their software architectures significantly to deal with the challenges of scaling. The path from hosting a site in your bedroom, to Amazon Web Services, to a colocation space, to a data center you own requires bringing in new skills along the way that were not previously relevant.
Mirroring that, the organizational structure of a start-up will change significantly as it grows. From the early days of product development, through learning how to sell and adding a customer-facing team to replicate early wins, each stage will require organizational changes and evolution. In both cases, changes need to be made on the fly and require careful planning and execution. Even then this can be painful, hence the often-used “change the engine while the airplane is in flight” analogy.
Herding cats might be more predictable
Both start-ups and distributed systems can have unique, unpredictable, even Byzantine failure modes. In each scenario, because of the complexity of interactions, there are bound to be situations that can’t be tested or even anticipated.
This is further complicated by a complex set of dependencies on actors you don’t directly control, such as the third-party services mentioned above. Even though services are developed independently, connecting them all together often results in failures that have a way of cascading. This is easy to see in the service interruption of a highly complex distributed system like Amazon’s EBS a few years back.
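As a rough illustration of how a single failure can cascade, the sketch below walks a dependency map and lists every service that sits upstream of a failed one. The map, the service names, and the `impacted_by` helper are hypothetical. The model is also deliberately pessimistic: it assumes no fallbacks, so anything that depends on a failed service is treated as impacted.

```python
from collections import deque

# Hypothetical dependency map: each service lists the services it calls.
DEPENDS_ON = {
    "web_frontend": ["search", "ads"],
    "search": ["storage"],
    "ads": ["third_party_ad_network"],
    "storage": [],
    "third_party_ad_network": [],
}


def impacted_by(failed: str, depends_on: dict) -> set:
    """Return every service that directly or indirectly depends on `failed`."""
    # Invert the edges so we can walk from a service to its callers.
    dependents = {name: [] for name in depends_on}
    for service, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(service)

    # Breadth-first walk outward from the failed service.
    impacted, queue = set(), deque([failed])
    while queue:
        current = queue.popleft()
        for caller in dependents.get(current, []):
            if caller not in impacted:
                impacted.add(caller)
                queue.append(caller)
    return impacted


if __name__ == "__main__":
    print(sorted(impacted_by("storage", DEPENDS_ON)))
```

Even in this toy map, a failure in a low-level service that no user ever sees reaches the front end, which is the pattern the paragraph above describes.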
In start-ups, the “failures” or “outages” can be things like a product release slipping, a disgruntled employee negatively impacting culture, sales ramping slower than expected, or fundraising taking longer than expected. Many of these can be recovered from; some cannot. I’ll cover start-up failure modes, disaster porn, and how to avoid them in a subsequent article.
Scaling distributed systems and organizations is hard. My dime-store organizational behavior observations aside, I’d love to hear about other similarities and differences that come to mind.
Alex Benik is a principal at Battery Ventures who invests in enterprise and web infrastructure start-ups. You can find him on Twitter at @abenik