
Summary:

Two weeks ago, a post quoting Michael Stonebraker, who questioned the relevance of MySQL and Facebook’s use of it, sparked an overwhelmingly negative response. The true state of the database market appears to be that while SQL has its place, ideal uses are fading fast.

Two weeks ago, I wrote a post that sparked a pretty overwhelming response. The gist of the post, derived from an interview with database pioneer Michael Stonebraker, was that legacy SQL databases, including MySQL, are relics and no longer relevant with regard to today’s web applications. Stonebraker cited Facebook’s renowned MySQL-plus-memcached architecture as an example of how much effort it takes to make such databases keep up with applications that store lots of data and serve high rates of transactions.


By and large, the responses weren’t positive. Some singled out Stonebraker as out of touch or as just trying to sell a product. Some pointed to the popularity of MySQL as evidence of its continued relevance. Many asked how Stonebraker dared to question the wisdom of Facebook’s top-of-the-line database engineers.

They’re all fair-enough statements, but they also somewhat missed the point. Stonebraker wasn’t calling out Facebook, nor was he suggesting (as far as I can tell) that it abandon MySQL tomorrow. Yes, he has a product, VoltDB, to sell, but that shouldn’t blur the overall message: Whatever database technology someone might choose to use for a new web application, anyone who hopes to achieve even a fraction of Facebook’s traffic should not go down the same path as Facebook did.

Facebook’s implementation is a sign of the times in which it was built, but the evidence suggests that if Facebook could do it over again with today’s database options, it wouldn’t go down the same path. Sharding MySQL thousands of times, operating thousands of memcached servers and paying a team of crack engineers to keep it scaling is nobody’s idea of fun.

First, Facebook

Nobody denies that Facebook’s MySQL team is supremely smart or that it does a great job innovating to ensure that the database is able to keep up with the site’s transactions.


Jim Starkey, the founder and CTO of NimbusDB and a man with some serious relational database and MySQL credentials, puts it well. “You either scale to where your customer base takes you or you die,” he said, and Facebook has been able to do with MySQL what others would not have been able to do. It has “absolutely skilled” engineers, he added, but they don’t exist everywhere, and Facebook has the added benefit of being able to pay them.

Paul Mikesell, the founder and CEO of Clustrix, echoed that sentiment, telling me that Facebook has done great work to make its site scalable. Clustrix sells a “NewSQL” database that is compatible with MySQL. Interestingly, Jonathan Heiliger, the soon-to-be-former VP of technical operations at Facebook, sits on Clustrix’s advisory board.

No, it’s not so much Facebook’s MySQL implementation that’s the problem. By and large, it does what it’s designed to do, which is to keep up with the myriad status updates and other data that populate users’ profiles. Rather, it’s that Facebook had to expend so much money and so many man-hours to get there.

Facebook has declined numerous requests for comment, save for this snippet from a spokesperson: “[Our] philosophy is to build infrastructure using the best tools available for the job and [we] are constantly evaluating better ways to do things when and where it matters.”

Indeed it is. As I noted in the original post, as Facebook has rolled out new applications, it has increasingly utilized newer database technologies better suited for those tasks. Inbox search within Facebook is powered by the Cassandra NoSQL database that it created, while Facebook Messages and some other new applications use HBase. It looks like Facebook is onto something.

Actually, MySQL isn’t the problem . . .


According to database industry analyst Curt Monash, Stonebraker makes a valid point in citing Facebook’s complex MySQL situation, because Facebook isn’t using MySQL for its relational capabilities. MySQL might be a fine database choice for a low-end application that requires full relational capabilities, but sharded MySQL plus memcached is not. You lose a lot of those capabilities as soon as you begin sharding, he explained, and the application actually communicates directly with memcached for data that resides in that layer. It’s that architecture that’s the problem.
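To make Monash’s point concrete, here is a minimal Python sketch of the general sharded-database-plus-cache pattern. It is not Facebook’s code: plain dictionaries stand in for the MySQL shards and the memcached tier, and every name in it (`NUM_SHARDS`, `shard_for`, `read_user`, `users_in_city`) is hypothetical. It shows the application, not the database, choosing a shard and checking the cache, and it shows how a query that isn’t keyed by user ID has to be assembled in application code.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical shard count, chosen only for illustration

# Stand-ins for real servers: each shard is a dict of user_id -> row,
# and the cache is a single dict playing the role of a memcached tier.
shards = [dict() for _ in range(NUM_SHARDS)]
cache = {}


def shard_for(user_id: int) -> int:
    """Pick a shard by hashing the key (stable across runs)."""
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS


def write_user(user_id: int, row: dict) -> None:
    """Write to the owning shard, then invalidate the cached copy."""
    shards[shard_for(user_id)][user_id] = row
    cache.pop(user_id, None)


def read_user(user_id: int):
    """Cache-aside read: the application checks the cache itself first."""
    if user_id in cache:
        return cache[user_id]
    row = shards[shard_for(user_id)].get(user_id)
    if row is not None:
        cache[user_id] = row
    return row


def users_in_city(city: str) -> list:
    """A query not keyed by user_id must visit every shard and merge the
    results in application code; no single database sees all the rows."""
    matches = []
    for shard in shards:
        matches.extend(uid for uid, row in shard.items() if row["city"] == city)
    return sorted(matches)
```

In a single relational database, `users_in_city` would be one `SELECT` with a `WHERE` clause. Here the routing, caching and merging logic all live in the application, which is the overhead the analysts are describing.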

Monash believes there are two timelines for when a technology runs its course, depending on the situation: when you shouldn’t use it to start a new project, and when you should upgrade. For new projects that might have to scale massively, he said, you wouldn’t choose MySQL plus memcached.

As for the sharding, Starkey said, “The only thing sharding has going for it is the absence of alternatives.” He noted that although it’s difficult to find anything he and Stonebraker agree on, they do both agree that traditional SQL databases aren’t easy to scale. Because scaling them is so complex, Starkey — who, like Stonebraker, has a horse in the NewSQL race with NimbusDB — thinks all legacy databases will be irrelevant in a few years. All except low-end MySQL, that is.

Monash said there are several possible options for companies that want to retain MySQL features while still being able to scale, including Clustrix, TokuDB, ScaleDB and Schooner MySQL with Active Cluster. Clustrix’s Mikesell noted that several of its customers were very happy to be done sharding after they made the switch, while others saved lots of human and capital resources by never having to shard in the first place.

There also are startups, such as dbshards and ScaleBase, that make sharding transparent to applications, saving developers from having to write applications that can handle a sharded database.
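As a rough illustration of what “transparent” means here, the Python sketch below hides shard routing behind a single object, so the calling code never names a shard. It is an assumption-laden toy, not the dbshards or ScaleBase API: `ShardingProxy` and its methods are hypothetical, and dictionaries stand in for database servers.

```python
import zlib


class ShardingProxy:
    """Presents several backends as one store; callers never pick a shard."""

    def __init__(self, num_shards: int):
        # Each dict stands in for a separate database server.
        self._backends = [dict() for _ in range(num_shards)]

    def _route(self, key: str) -> dict:
        # crc32 gives a deterministic hash of the key bytes.
        return self._backends[zlib.crc32(key.encode()) % len(self._backends)]

    def put(self, key: str, value) -> None:
        self._route(key)[key] = value

    def get(self, key: str, default=None):
        return self._route(key).get(key, default)

    def count(self) -> int:
        # Fan out to every backend and combine; hidden from the caller.
        return sum(len(backend) for backend in self._backends)


store = ShardingProxy(num_shards=3)
store.put("user:7", {"name": "example"})
print(store.get("user:7"))
```

The application talks to `store` as if it were one database. The sharding still exists, but the code that has to know about it lives in one layer instead of being spread through the application.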

… always

However, if you don’t need relational features and/or ACID compliance, Monash says there are many possibilities, of which VoltDB, NimbusDB and the other NewSQL databases might not even be the best options. Monash actually takes a pretty harsh stance when it comes to VoltDB.

Even Starkey acknowledges this, explaining that you only really need ACID if you have valuable data. Google has a relational database for its revenue-related information, he said, but uses NoSQL tools like BigTable elsewhere. If a company has plans for its web application to scale and start driving a lot of traffic, Starkey said, he can’t imagine why it would build that new application using MySQL.

But Facebook isn’t a greenfield environment, which makes matters more complicated. Given Facebook’s reliance on memcached and use of it as a key-value store, though, Monash said Membase Server, a NoSQL database, might actually be a good replacement if Facebook were to transition from MySQL. That’s because Membase has memcached built in and is designed to mimic it in many ways, only in a single tier.

James Phillips, the co-founder and senior VP of products at Couchbase (the new corporate home for Membase Server), said the vast majority of Membase deployments are for new applications, but large sites switching to it from a MySQL-plus-memcached environment isn’t unheard of. In fact, Zynga recently made the switch.

Also, Netflix recently transitioned from an Oracle database to SimpleDB on Amazon Web Services and Cassandra. For a detailed explanation of how and why, check out this presentation by Sid Anand, its cloud data architect.

Based on what he knows of Facebook’s architecture, some of which likely was gleaned from Facebook Director of Engineering Robert Johnson, who sits on Couchbase’s advisory board, Phillips thinks it would be possible, although not necessarily easy, for Facebook to make a switch.

Furthermore, most NoSQL databases and a number of NewSQL databases have open-source and/or free versions, so developers concerned with cost or flexibility aren’t without options.

In closing

Monash sums it up nicely: “Are there undesirable aspects to the Facebook architecture? Absolutely. Are they as serious as [Stonebraker] makes them out to be? Absolutely not.”

That’s because Facebook has the engineering talent to do what it pleases, whether that’s sticking with MySQL or eventually transitioning to something else. But not everyone has that luxury, and if they don’t really need a relational database, or if they really need a relational database that can scale, there’s a strong case to be made that MySQL is no longer the most desirable option.

Image courtesy of Flickr user mandiberg

  1. Derrick,

    Great article. However, I think this article should have mentioned that outside of the small realm of California start-ups and social media/gaming companies, the “real” world of IT uses SQL-based solutions. In fact, if you stack up database vendors by number of new installs (including 2011), Microsoft SQL Server leads the pack, followed by Oracle. Similarly, there are 100s of 1000s of DBAs that support SQL Server and Oracle compared to a handful of experts in new emerging technologies like Cassandra or VoltDB. Just search for “Oracle DBA” and “Cassandra DBA” on any job site to see this disparity (1000s of jobs vs a dozen postings). The new technologies are promising but they are not replacing good old Microsoft SQL Server or Oracle any time soon.

    1. Derrick Harris Thursday, July 21, 2011

      Dev,

      Totally agree. In fact, I think that case is presented in the post: it’s about what any given application needs. Depending on the circumstances, it could be MySQL, NewSQL, NoSQL, whatever. But if scale — either in size or load — is anticipated, there are options out there designed for that purpose.

    2. Caveat: I represent NimbusDB.

      The pervasiveness of SQL (technology, skills, tools, business process, etc) is exactly the point of NewSQL. The (SQL) rails are laid; the challenge has been to build the train that runs on the old rails but delivers radical new capabilities.

      That train now exists. It is called NewSQL. Certainly it will go through the usual technology adoption cycles, but Mike Stonebraker is quite right to sound a wake-up call. The NewSQL train is a-coming.

      1. Caveat: I represent ScaleBase

        Legacy technology has a lot of things going for it, and as history taught us – it really is difficult to convince customers to move away from their relational database (even adopting a new programming language is hard, and adopting a new database is nearly impossible).
        It was the same case with the OODBs of the ’90s: a lot of promising new technologies, and at the end of the day – the database providers will improve, some ecosystem tools that assist existing databases will rise up, and 5 years from now most people will still use MySQL, SQL Server and Oracle for web or enterprise applications.

        1. Derrick Harris Friday, July 22, 2011

          Liran,

          I think you’re correct, especially in the enterprise, which is why tools like yours that help improve upon existing SQL databases are critical. But I would suggest that a shift toward new architectures has to start at some point, right?

          I’m reminded of a few years ago when many lambasted cloud computing, only to change their tunes once it established itself as the real deal. But, still, companies are prudent, opting to run standard types of applications in the cloud using MySQL, Oracle and other DBs, for example, and only experimenting with PaaS and newer ways of doing things for brand new, ad hoc applications. Like all things cloud, the new breed of database technologies will have to mature some before they garner mainstream attention.

          If the analogy is accurate, legacy databases will be around for a while and keep improving — and never totally vanish — but a shift might have started, if only for new apps with new requirements.

  2. Interesting article; thank you! First a disclaimer: I work for Oracle, doing MySQL support, including sometimes with Facebook. (And, of course, conversely, I only speak for myself, not for Oracle.) The one thing I don’t get in this article (and the ones before it) is in the following sentence: “paying a team of crack engineers to keep it scaling is nobody’s idea of fun”. Maybe not most people’s idea of fun… :) But these articles seem to imply that this is some great overhead. I’ve never known a critical database system that didn’t require DBAs and system admins. Even after years, I am constantly amazed at how very tiny the support team is for such a huge and popular application. Vast amounts of data, millions of concurrent users, and very little downtime. Yes, Facebook has hired from the best of the best, but that’s in line with it being the biggest of the big. They have grown massively, and still work with a tiny team to scale and keep things running smoothly. What system would not require support to grow and perform? Kudos to Facebook and its team (and to MySQL) for doing it with so few people. (I won’t mention numbers as it is not my place to do so.)

    Ben Krug

    1. Derrick Harris Friday, July 22, 2011

      Ben,

      That’s a very fair point. Perhaps a poor choice of words on my part. Maybe it should have read “is not within everyone’s capabilities.” Of course every critical database has a team supporting it; doing what FB is doing just requires that they be the best at what they do.

  3. Razi Sharir Sunday, July 24, 2011

    Caveat: I represent Xeround.

    Very good and balanced coverage Derrick and great commentary by the great respected gurus of DB land.
    I agree with Barry Morris (a respected competitor). In fact, when I talked to Mass Aslett a while back about Xeround, I told him that we combined the best of NoSQL under the hood with SQL on the top; called it NewSQL and urged him to go talk to Nimbus, who seemed to take a comparable NewSQL track.
    One size does not fit all – Our beta experience shows that SQL is here, live-n-kicking; the need for transactional relational DB is not going away. Having said that, there’s also a clear need for NoSQL mostly around DW/BI use cases.
    On top and beyond, it is not only about elastic scalability which we already have in live production, cloud is also about HA, distribution and multi-tenancy in a cloud agnostic GTM where the on-demand, pay-per-use significantly affect CAPEX/OPEX.

    1. Razi Sharir Sunday, July 24, 2011

      Typo correction; I apologize – it is Matt Aslett from the 451 group

