18 Comments

Summary:

Even when relational databases are used to build large-scale services, they are unrecognizable as relational. Instead, they look almost exactly like a NoSQL database. This perspective is rare outside of the companies forced to embrace it, but we can use Twitter as an example.

whale

Scale breaks everything. Scale even breaks your assumptions about how best to store and query data. Scale does not care about your personal engineering preferences, or about SQL vs. NoSQL. The demands of rapid growth and ever-higher expectations for availability, performance, and cost efficiency force you to re-evaluate and re-imagine what you need, what is possible, and how to best achieve your business goals. This is the context in which non-relational databases like Dynamo, BigTable, Memcache, and Membase were conceived and built. However, even when relational databases are used to build large-scale services, they are unrecognizable as relational. Instead, they look almost exactly like a NoSQL database.

This perspective is rare outside of the companies forced to embrace it. The explosion of open-source databases, including those from major online services, presents a great opportunity to see how things look when engineers are faced with the demands of enormous scale. Let’s embrace that opportunity, as those companies have, by examining a production data storage service and see what SQL really looks like at scale.

Dissecting the Bird

Twitter began as a monolithic relational database accessed by a monolithic Rails application. Facebook began as a monolithic relational database accessed by a monolithic PHP application. Amazon began as a monolithic relational database accessed by a monolithic C++ application. From these humble beginnings, through many painful lessons in growth, all have developed the tools they need to thrive at enormous scale. The tools encode those lessons, so examining them can be instructive.

We’ll take as our example a system that is both used in production at large scale and is open source: FlockDB, the storage service that maintains the Twitter social graph. While billed as a graph database, FlockDB is better described as a set database: it stores sets of adjacencies and supports a small number of operations over those sets. FlockDB is typical of storage systems at large, online services; the interface is extremely narrow, and clients are very loosely coupled to the service. This structure is common, because broad, complex interfaces and synchronous dependencies, like transactions, are hard to scale. If you’ve ever seen lock pile-ups in a relational database, you have some idea of this already. At scale, such things are lethal.

FlockDB solves a specific problem for Twitter: storing and retrieving the billions of direct connections between their users, with low latency, high availability, and simple, horizontal scaling. The FlockDB interface amounts to two calls: Execute and Select. Execute adds and removes connections, or edges, between users. Select retrieves and manipulates sets of edges: intersection, union, and difference to identify followers in common, followers not in common, etc. Further, edge management operations are asynchronous and may not execute until long after success is returned to a client, so this critical service can continue operating during failures. There’s no support for general, multi-hop graph walking or analysis (hence my calling it a set database), nor should there be. Such things aren’t required to deliver Twitter services.

The underlying storage system is a simple, database sharding layer called Gizzard. Shard implementations expose a simpler, narrow interface similar to that other the entire service. These shards are independent of each other. Operations that cross shard boundaries retrieve data from each required shard and performing the various set operations in FlockDB. The SQLShard implementation, used for Twitter’s production deployment on MySQL, has just two tables (edges and metadata) and a single index. Stripped to essentials like this, a relational database and a NoSQL database are hard to tell apart.

One last aspect of FlockDB will surprise most RDBMS users. Should a write operation fail because a shard is currently inaccessible, Gizzard does not return failure to the client. Instead, the write operation is pushed to a Kestrel queue to be re-driven later when the shard is available. This trade-off prioritizes availability over consistency, keeping the service up even during partial failures of the underlying components.

This is SQL at scale: radically simple schema, extremely narrow interface, asynchronous writes, and application-layer management of data distribution and query aggregation. These are also the properties of many non-relational databases. At this scale, most of the advantages of a relational database — ACID semantics and complex, ad-hoc queries — are traded for other advantages: operational simplicity, linear performance scaling, geographic distribution, and extreme fault tolerance.

Joining the Flock

I have a confession: I don’t like NoSQL. More than that, I don’t like the concept of NoSQL as somehow the antithesis of SQL. Having spent far too much time and energy debating endlessly on that point, I’ve realized I was wrong: the source of the conflict isn’t a clash of database paradigms; it’s a clash of contexts. In most environments, the choice to use a relational store like MySQL or a document database like Riak or a column-oriented database like Cassandra is one of preference, legacy infrastructure, tooling and personnel. Without the same pressures and concerns that created the non-relational alternatives, an endless debate is inevitable, because scale is absent.

That’s the real lesson here. Lost in all the debates about SQL vs. NoSQL, ACID vs. BASE, CAP, and all the rest is simply this: focus on the process to build a great company and great products. That’s what every successful technology company has done in order to reach the scale where things like BigTable and FlockDB are required. They didn’t become successful because they built these systems. They built these systems because they became successful. If your success drives you to the scale where relational databases are no longer efficient, don’t be afraid to look beyond them. That might mean adopting one of the existing NoSQL systems, or even building your own to meet your exact needs. This is what happens at scale. Embrace it.

Code and information for the mentioned projects is available at Github:

Benjamin Black is the co-founder of fast_ip. Previously he built large-scale infrastructure for Internap, Amazon, and Microsoft. His Twitter handle is @b6n.

Related content from GigaOM Pro (sub req’d):

  1. Brian Bulkowski Saturday, November 6, 2010

    We at Citrusleaf agree completely about building great products, and a great company, instead of focusing on the buzzwords regarding NoSQL. We believe in highly available lateral scale. Check us out.

    1. FWIW, your comment comes across as “I’m going to ignore my lack of content just to reply and work my company name in.”

      1. Exactly my thoughts as I read through his comment: “We at [my company] agree, check us out.” :|

  2. Excellent article!

    As someone who has had to scale several projects in the past to face severe load requirements, I can definitely relate to the there’s no “one size fits all” solution for db solutions, as this article points out.

    Would-be entrepreneurs, or newbie start-ups, focus so heavily on the technology aspect, and neglect.. or completely FORGET everything else!

    Bottom-line, business is about providing value. Always keep that in mind when making a decisions from a programming, or managerial perspective, and you won’t go wrong.

    If you’re a programmer for example.. don’t think about use-cases like a programmer, evaluate them from the perspective of the end user. Try to remember, they could generally care less about how it works. As long as it gets the job done, quickly, they’re happy.

    Once again, great article. Enjoyed it!

  3. Amen brother, this is the most lucid explanation of the “why” NoSQL exists. People seem to forget it all the time.

  4. Ragu Kattinakere Saturday, November 6, 2010

    Thanks Block.

  5. Or, you could just use Clustrix.

    1. Benjamin Black sam Sunday, November 7, 2010

      Clustrix is certainly an interesting product. For large-scale sites it has a number of drawbacks, however. First, geographic distribution for a system based on extremely low latency InfiniBand connectivity is pretty tough. Second, Clustrix requires you use their hardware, their OS, etc. If you are trying to fit that into an infrastructure based on thousands (or more) of standard, white box servers, all running the same OS and managed in the same way, then appliances can be a real source of trouble. Third, there is probably a reason Clustrix doesn’t show what happens to their performance as large numbers of nodes are added. The graphs they have published show growth going sublinear on the number of nodes past about 16 (and they stop at 20). Finally, those same graphs show performance with up to 180 million rows. That is 2 orders of magnitude below what Twitter handles right now.

      Fantastic stuff, just not a great fit for this sort of environment.

      1. Clustrix is designed to be a single instance database in a single data center. Geographic distribution is done through replication, if desired. The graph we present is linear up to 20 nodes. We didn’t go any bigger because we didn’t have more hardware to dedicate to the task at the time. The size of the table there was to illustrate increasing working set size, not necessarily total data size. We have customers with billions of rows in a single table today.

        Our philosophy at Clustrix is all about giving the application developers the maximum amount of flexibility, fault tolerance, and ease of use. We offer full relational semantics but you can still implement schemas that are simple key/value pairs if that’s appropriate for the app. We offer full transactional support for atomic updates but you can still modify items one at a time if you want. Writes are guaranteed durable and consistent so apps don’t have to worry about the artifacts that show up with eventual consistency. We build appliances so we can guarantee performance, reliability, and durability without requiring the customer to configure or tune anything. I have a few blog entries on this sort of stuff at: http://www.clustrix.com/resources/blog/ that goes into some of these things in more detail.

        Our goal is to be a good fit for exactly this sort of environment and over time we’ll be able to prove that in the market.

        Aaron Passey
        CTO
        Clustrix

  6. Top Posts — WordPress.com Sunday, November 7, 2010

    [...] NoSQL Is for the Birds Scale breaks everything. Scale even breaks your assumptions about how best to store and query data. Scale does not care [...] [...]

  7. You want to know what I am tired of?

    People who continuously confuse the Relational Model and SQL, both logical/mathematical concepts, with some sort of physical implementation and problems thereof, like the above Twitter DB explanation.

    Yes, traditional DBMSes have legacy engines that are poorly-suited to the modern world of the few sites that manage billions of users, but that doesn’t mean you throw away the baby with the bathwater…

  8. I think it’s about preference and anticipated scale. SQL is so ubiquitous that using it for something like an Android app or desktop program is trivial; it’s there, it’s easy to use, and it’s going to work. Not only that, SQL is old so it has the benefit of reliability (meaning, few if any bugs.)

    On the other hand, while I’ve been extremely apprehensive of NoSQL for the past year (multiplied by Digg’s issues with Cassandra), I’ve been really sipping on the CouchDB kool-aid. A schemaless document-store which uses JSON and communicated over HTTP is just plain awesome, and throws LAMP and its derivatives on their heads. Additionally, you can even attach files to the database to make a “CouchApp” – an application/website served directly from the database. Basically, you only have to know HTML and Javascript for it to work; no server side language like Rails or PHP, and no query language like MySQL. The more I learn, the more it seems like the future of the web. Doesn’t hurt CouchDB is immensely replicable.

    1. “multiplied by Digg’s issues with Cassandra”

      This is an unfortunate bit of FUD. Here are 2 Digg engineers disputing that claim: http://www.quora.com/Is-Cassandra-to-blame-for-Digg-v4s-technical-failures . I hope people will learn to take articles (even mine!) from various industry sites with a grain of salt. A bit of investigation on your own is worth the effort.

      My personal experience with Cassandra is that it takes careful planning and operational discipline to use it at scale. What database requires otherwise, though?

      “A schemaless document-store which uses JSON and communicated over HTTP is just plain awesome”

      I agree completely. For a natively distributed alternative to CouchDB, have a look at Riak: http://wiki.basho.com/display/RIAK/Riak

      1. “This is an unfortunate bit of FUD. Here are 2 Digg engineers disputing that claim: http://www.quora.com/Is-Cassandra-to-blame-for-Digg-v4s-technical-failures . I hope people will learn to take articles (even mine!) from various industry sites with a grain of salt. A bit of investigation on your own is worth the effort.”

        Of course. But I’m not exactly in the position of choosing database systems for a large company with a huge amount of data, my interest only peaks into the server realm when I get the itch. So my knowledge about any other NoSQL dbase aside from CouchDB is limited. (I only looked at CouchDB because it was recommended by a friend, and I was so enthralled by what I found I jumped in head first.)

        Thanks for that link. I had seen that Digg downplayed Cassandra’s roles in their troubles, which I of course expected. Still it’s nice to see an in-depth technical reasoning behind their rebuking.

        “My personal experience with Cassandra is that it takes careful planning and operational discipline to use it at scale. What database requires otherwise, though?”

        Naturally. But at least in comparison to RDBMS’s I find Couch much much more “relaxing,” which I guess is the whole point. :P

        “I agree completely. For a natively distributed alternative to CouchDB, have a look at Riak: http://wiki.basho.com/display/RIAK/Riak

        I will, thanks. But for the moment I’m way too hooked on the idea of CouchApps.

    2. ““CouchApp” – an application/website served directly from the database. Basically, you only have to know HTML and Javascript for it to work”

      Anything with an HTTP interface can do this, including Riak. For example, Sean Cribbs created this chat app as a demo for a presentation on Riak I gave at Velocity: https://github.com/seancribbs/yakriak .

  9. Great article – nosql, or sql, or some other data store – thinking about the problem you are trying to solve and what your customers want is the heart of good engineering.

  10. I don’t like SQL, because I’m fed up with Oracle, at work. I wish we’d used a file based storage instead. Then we’d have had versioning for free, by simply placing the files in e.g. a Git repository. (Our database is fairly small; it’d worked all right with files.)

  11. 8 Cloud Companies to Watch in 2011: Cloud Computing News « Wednesday, January 5, 2011

    [...] NoSQL Is for the Birds [...]

Comments have been disabled for this post