Scale breaks everything. Scale even breaks your assumptions about how best to store and query data. Scale does not care about your personal engineering preferences, or about SQL vs. NoSQL. The demands of rapid growth and ever-higher expectations for availability, performance, and cost efficiency force you to re-evaluate and re-imagine what you need, what is possible, and how to best achieve your business goals. This is the context in which non-relational databases like Dynamo, BigTable, Memcache, and Membase were conceived and built. However, even when relational databases are used to build large-scale services, they are unrecognizable as relational. Instead, they look almost exactly like a NoSQL database.
This perspective is rare outside of the companies forced to embrace it. The explosion of open-source databases, including those from major online services, presents a great opportunity to see how things look when engineers are faced with the demands of enormous scale. Let’s embrace that opportunity, as those companies have, by examining a production data storage service and see what SQL really looks like at scale.
Dissecting the Bird
Twitter began as a monolithic relational database accessed by a monolithic Rails application. Facebook began as a monolithic relational database accessed by a monolithic PHP application. Amazon began as a monolithic relational database accessed by a monolithic C++ application. From these humble beginnings, through many painful lessons in growth, all have developed the tools they need to thrive at enormous scale. The tools encode those lessons, so examining them can be instructive.
We’ll take as our example a system that is both used in production at large scale and is open source: FlockDB, the storage service that maintains the Twitter social graph. While billed as a graph database, FlockDB is better described as a set database: it stores sets of adjacencies and supports a small number of operations over those sets. FlockDB is typical of storage systems at large, online services; the interface is extremely narrow, and clients are very loosely coupled to the service. This structure is common, because broad, complex interfaces and synchronous dependencies, like transactions, are hard to scale. If you’ve ever seen lock pile-ups in a relational database, you have some idea of this already. At scale, such things are lethal.
FlockDB solves a specific problem for Twitter: storing and retrieving the billions of direct connections between their users, with low latency, high availability, and simple, horizontal scaling. The FlockDB interface amounts to two calls: Execute and Select. Execute adds and removes connections, or edges, between users. Select retrieves and manipulates sets of edges: intersection, union, and difference to identify followers in common, followers not in common, etc. Further, edge management operations are asynchronous and may not execute until long after success is returned to a client, so this critical service can continue operating during failures. There’s no support for general, multi-hop graph walking or analysis (hence my calling it a set database), nor should there be. Such things aren’t required to deliver Twitter services.
The underlying storage system is a simple, database sharding layer called Gizzard. Shard implementations expose a simpler, narrow interface similar to that other the entire service. These shards are independent of each other. Operations that cross shard boundaries retrieve data from each required shard and performing the various set operations in FlockDB. The SQLShard implementation, used for Twitter’s production deployment on MySQL, has just two tables (edges and metadata) and a single index. Stripped to essentials like this, a relational database and a NoSQL database are hard to tell apart.
One last aspect of FlockDB will surprise most RDBMS users. Should a write operation fail because a shard is currently inaccessible, Gizzard does not return failure to the client. Instead, the write operation is pushed to a Kestrel queue to be re-driven later when the shard is available. This trade-off prioritizes availability over consistency, keeping the service up even during partial failures of the underlying components.
This is SQL at scale: radically simple schema, extremely narrow interface, asynchronous writes, and application-layer management of data distribution and query aggregation. These are also the properties of many non-relational databases. At this scale, most of the advantages of a relational database — ACID semantics and complex, ad-hoc queries — are traded for other advantages: operational simplicity, linear performance scaling, geographic distribution, and extreme fault tolerance.
Joining the Flock
I have a confession: I don’t like NoSQL. More than that, I don’t like the concept of NoSQL as somehow the antithesis of SQL. Having spent far too much time and energy debating endlessly on that point, I’ve realized I was wrong: the source of the conflict isn’t a clash of database paradigms; it’s a clash of contexts. In most environments, the choice to use a relational store like MySQL or a document database like Riak or a column-oriented database like Cassandra is one of preference, legacy infrastructure, tooling and personnel. Without the same pressures and concerns that created the non-relational alternatives, an endless debate is inevitable, because scale is absent.
That’s the real lesson here. Lost in all the debates about SQL vs. NoSQL, ACID vs. BASE, CAP, and all the rest is simply this: focus on the process to build a great company and great products. That’s what every successful technology company has done in order to reach the scale where things like BigTable and FlockDB are required. They didn’t become successful because they built these systems. They built these systems because they became successful. If your success drives you to the scale where relational databases are no longer efficient, don’t be afraid to look beyond them. That might mean adopting one of the existing NoSQL systems, or even building your own to meet your exact needs. This is what happens at scale. Embrace it.
Code and information for the mentioned projects is available at Github:
Benjamin Black is the co-founder of fast_ip. Previously he built large-scale infrastructure for Internap, Amazon, and Microsoft. His Twitter handle is @b6n.
Related content from GigaOM Pro (sub req’d):