
Summary:

According to database pioneer Michael Stonebraker, Facebook is operating a huge, complex MySQL implementation equivalent to “a fate worse than death.” It’s actually a predicament all too common among web startups, for which the solution might be a class of databases referred to as NewSQL.


According to database pioneer Michael Stonebraker, Facebook is operating a huge, complex MySQL implementation equivalent to “a fate worse than death,” and the only way out is “bite the bullet and rewrite everything.”

Not that it’s necessarily Facebook’s fault, though. Stonebraker says the social network’s predicament is all too common among web startups that start small and grow to epic proportions.

During an interview this week, Stonebraker explained to me that Facebook has split its MySQL database into 4,000 shards in order to handle the site’s massive data volume, and is running 9,000 instances of memcached in order to keep up with the number of transactions the database must serve. I’m checking with Facebook to verify the accuracy of those numbers, but Facebook’s history with MySQL is no mystery.
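The memcached tier described above follows what is commonly called the cache-aside pattern: reads check the cache first and only fall through to a MySQL shard on a miss. Below is a minimal Python sketch of that general pattern, not Facebook's actual code. `CountingDB` and `CacheAside` are hypothetical names, and a plain dict stands in for memcached.

```python
class CountingDB:
    """Hypothetical stand-in for one MySQL shard; counts queries that reach it."""
    def __init__(self, rows):
        self.rows = dict(rows)
        self.queries = 0

    def select(self, key):
        self.queries += 1
        return self.rows.get(key)

    def update(self, key, value):
        self.queries += 1
        self.rows[key] = value


class CacheAside:
    """Cache-aside reads and invalidate-on-write, with a dict playing memcached."""
    def __init__(self, db, cache=None):
        self.db = db
        self.cache = {} if cache is None else cache

    def get(self, key):
        if key in self.cache:          # cache hit: the database is never touched
            return self.cache[key]
        value = self.db.select(key)    # cache miss: read from the database...
        if value is not None:
            self.cache[key] = value    # ...and populate the cache for later readers
        return value

    def set(self, key, value):
        self.db.update(key, value)     # the database stays the system of record
        self.cache.pop(key, None)      # invalidate so the next read refetches


db = CountingDB({"user:1": "Alice"})
cache = CacheAside(db)
for _ in range(100):
    cache.get("user:1")
print(db.queries)                        # 1: only the first read reached the database
cache.set("user:1", "Alicia")
print(cache.get("user:1"), db.queries)   # Alicia 3
```

Every cache hit is a query the database never sees, which is why a site can place thousands of cache instances in front of its shards. The cost is invalidation: the sketch deletes the cached copy on every write, and doing that correctly across many servers is one source of the operational complexity the article describes.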

The oft-quoted statistic from 2008 is that the site had 1,800 servers dedicated to MySQL and 805 servers dedicated to memcached, although multiple MySQL shards and memcached instances can run on a single server. Facebook even maintains a MySQL at Facebook page dedicated to updating readers on the progress of its extensive work to make the database scale along with the site.
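Running several shards on one server usually means separating logical shards (how the data is partitioned) from physical hosts (where each partition lives). The following is a minimal sketch assuming a simple modulo-and-round-robin scheme, which is purely hypothetical. The 4,000 shards and 1,800 servers are two figures quoted in the article from different years, so combining them here is only for illustration.

```python
from collections import Counter


def build_shard_map(num_shards, hosts):
    """Assign logical shards to physical hosts round-robin (hypothetical scheme)."""
    return {shard: hosts[shard % len(hosts)] for shard in range(num_shards)}


def locate(user_id, shard_map):
    """Map a user to (logical shard, host); the shard depends only on user_id."""
    shard = user_id % len(shard_map)
    return shard, shard_map[shard]


def move_shard(shard_map, shard, new_host):
    """Rebalance by moving one whole logical shard; no user is rehashed."""
    shard_map[shard] = new_host


def shards_per_host(shard_map):
    return Counter(shard_map.values())


hosts = [f"db{i:04d}" for i in range(1800)]
shard_map = build_shard_map(4000, hosts)
print(sorted(set(shards_per_host(shard_map).values())))  # [2, 3]
print(locate(123457, shard_map))  # (3457, 'db1657')
move_shard(shard_map, 3457, "db-new")
print(locate(123457, shard_map))  # (3457, 'db-new')
```

Because a user's logical shard never changes, shifting load between servers means updating the map, not rehashing every user.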

The widely accepted problem with MySQL is that it wasn’t built for webscale applications or those that must handle excessive transaction volumes. Stonebraker said the problem with MySQL and other SQL databases is that they consume too many resources for overhead tasks (e.g., maintaining ACID compliance and handling multithreading) and relatively few on actually finding and serving data. This might be fine for a small application with a small data set, but it quickly becomes too much to handle as data and transaction volumes grow.

This is a problem for a company like Facebook because it has so much user data, and because every user clicking “Like,” updating his status, joining a new group or otherwise interacting with the site constitutes a transaction its MySQL database has to process. Every second a user has to wait while a Facebook service calls the database is time that user might spend wondering if it’s worth the wait.

Not just a Facebook problem

In Stonebraker’s opinion, “old SQL (as he calls it) is good for nothing” and needs to be “sent to the home for retired software.” After all, he explained, SQL was created decades ago before the web, mobile devices and sensors forever changed how and how often databases are accessed.

But products such as MySQL are also open-source and free, and SQL skills aren’t hard to come by. This means, Stonebraker says, that when web startups decide they need to build a product in a hurry, MySQL is the natural choice. But then they hit hockey-stick growth like Facebook did, and they don’t really have the time to re-engineer the service from the database up. Instead, he said, they end up applying Band-Aid fixes that solve problems as they occur but never really fix the underlying problem: an inadequate data-management strategy.

There have been various attempts to overcome SQL’s performance and scalability problems, including the buzzworthy NoSQL movement that burst onto the scene a couple of years ago. However, it was quickly discovered that while NoSQL might be faster and scale better, it did so at the expense of ACID consistency. As I explained in a post earlier this year about Citrusleaf, a NoSQL provider claiming to maintain ACID properties:

ACID is an acronym for “Atomicity, Consistency, Isolation, Durability” — a relatively complicated way of saying transactions are performed reliably and accurately, which can be very important in situations like e-commerce, where every transaction relies on the accuracy of the data set.
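To make the definition concrete, here is a small example using Python's built-in `sqlite3` module. SQLite stands in for MySQL only because it ships with Python. A hypothetical "Like" writes to two tables, and atomicity means either both changes land or neither does. The table and function names are illustrative assumptions.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, like_count INTEGER NOT NULL)")
    conn.execute(
        "CREATE TABLE likes (user_id INTEGER, post_id INTEGER, PRIMARY KEY (user_id, post_id))"
    )
    conn.execute("INSERT INTO posts (id, like_count) VALUES (1, 0)")
    conn.commit()
    return conn


def like(conn, user_id, post_id):
    """Record a like and bump the counter as one atomic transaction."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "INSERT INTO likes (user_id, post_id) VALUES (?, ?)", (user_id, post_id)
            )
            cur = conn.execute(
                "UPDATE posts SET like_count = like_count + 1 WHERE id = ?", (post_id,)
            )
            if cur.rowcount != 1:
                raise ValueError(f"no post with id {post_id}")
        return True
    except (sqlite3.IntegrityError, ValueError):
        return False


conn = make_db()
print(like(conn, 10, 1))    # True
print(like(conn, 10, 1))    # False: duplicate like violates the primary key
print(like(conn, 11, 999))  # False: post 999 does not exist, so the insert is undone
print(conn.execute("SELECT COUNT(*) FROM likes").fetchone()[0])  # 1
```

The third call inserts into `likes` before it discovers the post does not exist. Because both statements run in one transaction, the rollback removes that orphaned row as well.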

Stonebraker thinks sacrificing ACID is a “terrible idea,” and, he noted, NoSQL databases end up being only marginally faster because they require writing certain consistency and other functions into the application’s business logic.
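Here is one example of the consistency logic that moves into the application. Without multi-statement transactions, two clients incrementing the same counter can overwrite each other, so the application typically adds version checks and retries. The sketch assumes a hypothetical key-value store whose only safe write is a conditional put. Many real stores offer something similar, but the class and method names here are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Versioned:
    value: int
    version: int


class KVStore:
    """Hypothetical key-value store: plain reads plus a conditional write, no transactions."""
    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key, Versioned(value=0, version=0))

    def put_if_version(self, key, new_value, expected_version):
        # Succeeds only if nobody has written the key since the caller read it.
        if self.get(key).version != expected_version:
            return False
        self._data[key] = Versioned(new_value, expected_version + 1)
        return True


def increment(store, key, max_retries=5):
    """Application-side read-modify-write with optimistic retries."""
    for _ in range(max_retries):
        current = store.get(key)
        if store.put_if_version(key, current.value + 1, current.version):
            return True
    return False  # gave up after repeated conflicts


store = KVStore()
a = store.get("likes:42")  # two clients read the same version...
b = store.get("likes:42")
print(store.put_if_version("likes:42", a.value + 1, a.version))  # True
print(store.put_if_version("likes:42", b.value + 1, b.version))  # False: stale version
print(increment(store, "likes:42"), store.get("likes:42").value)  # True 2
```

In a real distributed store the version check must happen atomically on the server. The single-threaded dict here makes that trivially true, which a production system cannot assume.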

Stonebraker added, though, that NoSQL is a fine option for storing and serving unstructured or semi-structured data such as documents, which aren’t really suitable for relational databases. Facebook, for example, created Cassandra for certain tasks and also uses the Hadoop-based HBase heavily, but it’s still a MySQL shop for much of its core needs.

Is ‘NewSQL’ the cure?

But Stonebraker, an entrepreneur as much as a computer scientist, has an answer for the shortcomings of both “old SQL” and NoSQL. It’s called NewSQL (a term coined by 451 Group analyst Matthew Aslett), or scalable SQL, as I’ve referred to it in the past. Pushed by companies such as Xeround, Clustrix, NimbusDB, GenieDB and Stonebraker’s own VoltDB, NewSQL products maintain ACID properties while eliminating most of the other functions that slow legacy SQL performance. VoltDB, an online transaction processing (OLTP) database, uses a number of methods to improve speed, including running entirely in memory instead of on disk.
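The article names in-memory operation as one of VoltDB's speedups. A related idea from Stonebraker's H-Store research project, which VoltDB grew out of, is to partition data across cores and run each partition's transactions one at a time, so no locks are needed. The toy sketch below illustrates only that general idea. It is not VoltDB's API, all names are hypothetical, and it ignores durability, replication and transactions that span partitions.

```python
from collections import deque


class Partition:
    """Toy in-memory partition: queued procedures run serially, never concurrently."""
    def __init__(self):
        self.data = {}
        self.queue = deque()

    def submit(self, procedure, *args):
        self.queue.append((procedure, args))

    def run_all(self):
        # Each procedure runs to completion before the next starts, so it sees
        # no interleaved writes and needs no locks.
        results = []
        while self.queue:
            procedure, args = self.queue.popleft()
            results.append(procedure(self.data, *args))
        return results


def like_post(data, post_id, user_id):
    """Hypothetical stored procedure: record a like at most once per user."""
    likers = data.setdefault(("likers", post_id), set())
    if user_id in likers:
        return False
    likers.add(user_id)
    data[("count", post_id)] = data.get(("count", post_id), 0) + 1
    return True


NUM_PARTITIONS = 4
partitions = [Partition() for _ in range(NUM_PARTITIONS)]


def partition_for(post_id):
    # All state for one post lives on exactly one partition.
    return partitions[post_id % NUM_PARTITIONS]


p = partition_for(7)
for user in (1, 2, 1):
    p.submit(like_post, 7, user)
print(p.run_all())  # [True, True, False]
```

Because a partition never runs two procedures at once, the check-then-update inside `like_post` cannot be interleaved with another writer, which is the property locks would otherwise have to provide.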

It would be easy to accuse Stonebraker of tooting his own horn, but NewSQL vendors have been garnering lots of attention, investment and customers over the past year. There’s no guarantee they’re the solution for Facebook’s MySQL woes — the complexity of Facebook’s architecture and the company’s penchant for open source being among the reasons — but perhaps NewSQL will help the next generation of web startups avoid falling into the pitfalls of their predecessors. Until, that is, it, too, becomes a relic of the Web 3.0 era.

Feature image courtesy of Flickr user jimw; error image courtesy of Flickr user rubenerd.

  1. It is tough to accept Stonebraker as much more than a successful troll at this point. He makes outlandishly unsubstantiated claims about systems that he has no stated inside knowledge on, all to garner attention for his own products.

    If he actually could point to some failure on the part of Facebook (performance, keeping up with growth, etc), perhaps there would be some footing for his observations, but as is they seem baseless and a bit ridiculous.

    1. Facebook’s implementation of MySql is “a fate worse than death”? OldSQL is “good for nothing”? How does this clown have any credibility at all with ridiculous statements like that? Is it just me, or does Facebook seem to be doing OK? Sure, they may have to rework some aspects of their current implementation, but in the mean time, they’re freaking taking over the world. Somewhat less than a fate worse than death, I’d say.

      1. http://en.wikipedia.org/wiki/Michael_Stonebraker

        Yea – he really sounds like the kind of type that would need to do that. It’s just a product. It’s not becoming to freak out when your pet product gets ripped. Even MySQL’s founder openly acknowledges its shortcomings.

      2. @Scott -

        Stonebraker is a well respected legend, and has made incredible contributions to the database field. However over the past couple of years his attention-seeking rhetoric has simply gotten out of hand. And it only seems to be getting worse.

        He wants some attention for his various contenders-to-the-throne, and it seems like only absurd hysterics will get it now. No thanks.

    2. It’s “glory days” syndrome (cue Springsteen song) and it seems particularly acute among database pioneers like Stonebraker and Starkey. These are people who really did make great contributions Back In The Day, they remember what it felt like, and when they see those contributions becoming less relevant they become absolutely *desperate* to recapture that feeling. Some of them still have the chops to do it with technology, or at least hire others to do it for them. Failing that, they resort to shameless pimping of their latest brain-farts along with bashing everything that might keep it irrelevant. Sad, really.

      1. Tejaswi Nadahalli Saturday, July 9, 2011

        :-(

        Really sad.

  2. Nothing more than an advertisement disguised as an article.

    1. I agree its just an ad in disguise. And the more we comment the better the advert runs, quite ingenius! Well done.

  3. So Facebook should just throw everything in the bin and start again. Just like the Bolsheviks in 1917.

    Maybe not… I hear Fusion-io aren’t doing that bad as a result.

    ;)

  4. What Dennis said. Why is it that when a fringe case comes up, and sites like Facebook are fringe cases in terms of software development, it negates the use of things people have been using for decades? I’ve yet to work on a project that was too big for SQL, and I probably never will. Most of us never will. And I’ve worked on some pretty big stuff (like parts of MSDN on a “little” site like microsoft.com).

    One of the problems with self-labeled computer science people is that they tend to want to solve all kinds of problems before they’re problems. That’s a waste of time, especially for the business signing your checks. That the “noSQL” movement is often referred to as a movement should tell you something. The objective of business is not usually to start a religion or social upheaval, it’s to make money by solving problems.

    1. While I understand what you are saying, in the data warehousing world it is not uncommon to come across data sets that are too large for relational databases like Oracle and SQL Server to handle. I know of several current examples where SQL Server and even Oracle were unable to scale up to handle the load. That’s why companies like Teradata exist (still relational, but different semantics). Of course, that also makes the case against the in-memory part :).

    2. Jeff Putz, you say:

      “One of the problems with self-labeled computer science people…”

      I have to point out that Stonebraker isn’t a “self-labeled” computer scientist; he actually *is* a computer scientist. According to Wikipedia and some other sources, Stonebraker:

      …has a PhD in Computer Information and Control Engineering from the University of Michigan.

      …received the IEEE John von Neumann Medal and the first SIGMOD Edgar F. Codd Innovations Award; was inducted as a Fellow of the Association for Computing Machinery; and was elected a member of the National Academy of Engineering.

      …was a Professor of Computer Science at University of California, Berkeley, for twenty-nine years. He is currently an adjunct professor at MIT.

      Of course, none of that means that he’s right in this case, or means that the whole thing isn’t self-serving. All his accomplishments make it all the more disappointing if he is just trying to get attention for his own products. He’s made real contributions to the field. With his background, why not either keep contributing, teach (which is also a real contribution), or just rest on your laurels?

      1. Perhaps contribution is not fiscally rewarding, and many academics come to realize this eventually. Why should all the morons get the rewards? Are you saying that someone with a PhD who is an academic should not also be able to get financially rewarded? I don’t see why an academic doesn’t deserve a share of the fiscal pie while CEOs steal all the money based off other people’s creations. I say screw the CEOs and go academics.

        Welcome to the world created by greedy business that likes to steal everything and pretend to have created it. Stop blaming academia and start blaming gold-digging moron CEOs.

  5. This guy clearly has no idea how Facebook uses MySQL. Watch any of the presentations that Facebook has given at QCon and you’ll quickly see they are far more sophisticated than Stonebraker gives them credit for.

    More information: http://www.infoq.com/facebook/

    1. Who cares exactly how Facebook is using MySQL? His point is a valid one. We’ve been using stone-age technology to solve problems that didn’t exist 30 years ago. A new approach is long overdue, and NoSQL or Hadoop ain’t it either.

      1. After NoSQL, there’s Opa:
        http://opalang.org

      2. I hear Facebook is still using transistors! Those idiots should upgrade from that ancient tech to light-based computing.

  6. Shards:
    What if the combined size of MySQL databases in Facebook was 100TB?
    Would you run 1 Oracle database on that data?

    Number of systems:
    Google utilizes 4000 servers for Google Maps. So what? They are not running a small mom-and-pop shop.

    I believe FB has 4 DBAs, I know 1000 shops that have more Oracle DBAs. What if FB DBAs hate managing/administering appliances (Clustrix) and Java (VoltDB)? Xeround is a cloud database; why it even gets mentioned in this context is fascinating.

    At FB’s scale, extreme familiarity with software systems that are critical to business is required. If there is a bug or a workaround, their personnel need to find and fix it ASAP. Or is VoltDB without bugs?

    VoltDB has nothing unique, there was a failed start-up “ANTs software” that got digested into 4Js and Sybase, which boasted of a non-blocking database engine 10 years ago.

    So what is new: Many people claim that Ingres was better than Oracle in the 80s and early 90s, but who is the market leader today?

    All in all it sounds like someone (MySQL hater) is looking for attention.

  7. I recommend that you use GeneXus for the migration.

  8. I agree with all three previous comments. In addition, I found the following line in the article ridiculous:

    “The widely accepted problem with MySQL is that it wasn’t built for webscale applications or those that must handle excessive transaction volumes.”

    Huh? If it was widely accepted that MySQL wasn’t built for web applications, then why do so many startups keep using it? They couldn’t all be stupid, right? In addition to Facebook, Craigslist, Wikipedia, and TicketMaster use MySQL. And those are not low traffic, nor low transaction sites. YouTube (pre-Google) also used MySQL (and maybe they still do).

    By the way, I noticed GigaOm.com uses WordPress. Guess what database WordPress uses? MySQL. So either (1) GigaOm is using a crap database or (2) MySQL is probably a pretty good solution for the web. It’s the latter.

    1. It would be difficult for me to imagine a database so huge that it’s truly beyond SQL-type databases. Walmart and the federal government have enormous databases and often use SQL-based relational databases; I’ve thrown huge data pulls at Oracle, MySQL, Sybase, etc., and if the queries are well programmed and the DBAs have set things up well, there are usually no long waits, bottlenecks, or crashes.

  9. >> The widely accepted problem with MySQL is that it wasn’t built for
    >> webscale applications or those that must handle excessive transaction volumes.

    Really? I guess Facebook, Craigslist, and Wikipedia aren’t webscale applications and must have low traffic. YouTube (pre-Google) also used MySQL (they still might). Lastly, I noticed GigaOm.com uses WordPress. And guess what database WordPress uses? MySQL.

    1. Facebook has a massive layer of Memcached servers that are used as an in-memory database, since the MySQL servers can’t take the read load of the live site. Facebook basically lives in memory chips. Also, for the inbox, the part that takes the bigger load, it uses HBase/Hadoop. The database is just a “safety vault”.

      Craigslist migrated to MongoDB, a NoSQL solution, a few months ago. (The same thing happened with Foursquare, btw.)

      YouTube and the other Google websites (with some exceptions) work on top of Google Bigtable, something similar to a NoSQL database that’s built on top of GFS.

      Wikipedia is another heavy user of Memcached, since too many concurrent reads and writes would simply block the database. The database is just a “safety vault”.

      Twitter, another heavy MySQL user, uses a homegrown sharding server, but most reads happen on memcached. HBase and Hadoop are used for the search, just like Facebook. Again, the database is just a “safety vault”.

      From your examples, one could say that it’s next to impossible to run a site having only MySQL as a backend.

      1. I don’t think people using a cache negates the argument that mysql scales. Obviously a cache will be used in addition to the db.

    2. I think many of you are right about him trying to sell something, but the same people who are right are so, so wrong in other ways. He’s dropping the Facebook bomb because it’s something everyone can relate to. The whole point of what he’s trying to say is that IT IS TOO LATE FOR FACEBOOK TO SWITCH. MySQL works for them because they have plenty of resources to make it so.

      The exact thing he’s describing happened to my company. A lot of enterprise software nowadays isn’t provided as a web service by the software vendor–you buy the product and install it in-house on your own server’s instance of the DBMS it was written for. What, you’re going to give a start-up all of your data to handle on their servers? Yeah, OK. Anyway, we chose MySQL.

      As a start-up, you expect that your software might have a thousand end-users, maybe two tops, and MySQL is a perfect choice. You can’t just sell your product for crazy amounts out of the gate, so you want a DBMS that is easy to install and manage on the customer side, and more importantly, easy to code for on the dev side. MySQL is extremely well-documented; it’s not at all a black box. Being community-supported to a large extent is ideal for start-ups.

      When you distribute your software and it actually catches on, soon enough larger businesses want to use it. Now you have to support 20-100k end-users per installation, out of the box, in order to work your way into larger accounts and sales.

      If you started with MySQL, you’re kinda screwed until you rewrite a lot of code. DBMS isn’t easy to just switch when you have hundreds of customers each with their own installation. It’s practically impossible. We had to do our best with what we had and optimize the crap out of it, like Facebook is doing, but obviously to a much lesser extent. And once you rewrite, you have to keep supporting the legacy systems, or it pisses people off. It costs money to continue maintaining old code and support teams to keep existing installations limping along. If you don’t, then you instantly alienate the customers that created extra demand in the first place, as well as the ones who don’t necessarily need the extra scalability. It just costs a lot of money to start over with a new approach.

      The bottom line is that he’s completely right in at least one aspect (the most important one for this topic): if you aren’t scalable to start, you have to basically make a next-generation version of your product from scratch which actually uses a highly-scalable DBMS. The problem is that there isn’t as much demand for that, so there’s more risk to investing in it. This is a big reason why enterprises are stuck with slow/unstable software unless they hire a bunch of DBAs to take their database systems seriously. You’d be surprised how many companies just have one or two MySQL or MSSQL instances to work with, and not more than one DBA designated to manage them. Otherwise, they’d have the IT wherewithal and budget to write basically your product in-house from the ground up with their expected load in mind from the start.

      There’s not a lot of middle-ground in IT today–either your business model is data-heavy and you already have the ability to write the software with the right DBMS you need/already have, or your electronic data and IT needs are growing and scalability needs to be built-in.

      I hope I made it a little bit clearer why starting scalable would be amazing for start-ups. It’s just not realistic at all right now because of cost; that’s where academia comes in. Lowering the entry barriers of running ridiculously-sized databases would eliminate so many problems in getting applications started that it would seriously make the world a much better place for software/IT innovation, not just for Facebook. They’re totally invested in MySQL, and hey, they can actually make it work for them. More power to ’em.

  10. I’ve worked as a DBA for some 25 years, starting out on Unify; as data volumes grew, we had to convert to Oracle. I spent 8 years at Sun Microsystems, where I was exposed to MySQL. I’m currently a MySQL DBA for IBM and think MySQL is very scalable IF it is tuned properly. If you have performance issues and have exhausted all your options, then I’d look into the alternatives listed in the article. But remember, proprietary code may not work well with database upgrades, and I have seen a lot of applications break when an upgrade occurs. Oracle was on the ball when they bought all those middleware companies in the ’90s, oh, and now they have MySQL.

  11. Mixed reactions as always to Stonebraker’s vision, which is often clouded by the fact that he always thinks his latest product is better than all others for any reason. I like the ego, but it doesn’t serve the greater good in cases like this.

    If I update my LinkedIn recommendations, and it takes some of their servers a little longer to show my changes than others, is it really a big problem? No. And actually, achieving complete consistency across a network of that size would require a huge investment in hardware, code and maintenance, which eats away at the business’s bottom line. Is that always necessary? No. Therefore, using just this one well-known example, it’s easy to show there are in fact excellent reasons and use cases for CAP-style eventual consistency over ACID.

    Stonebraker is correct that many notable “web scale” players today are holding their systems together with chewing gum and Band-Aids, and will eventually need some level of rewrite… but suggesting that an ACID database (like the one he’s selling) is the only solution does not serve the market very well. In any case, developers are smart. If they need ACID, they’ll get it. If they need key-value lookups, they’ll do that. Many-to-many relationships (like Facebook and LinkedIn)? Graph databases might help solve those problems, or even run alongside another data store or component in a polyglot application architecture (which is also more common today). Anyway… Stonebraker is a really REALLY smart technologist, but a bit egoistic in his own marketing, and definitely always thinks his baby is the most beautiful.

  12. I smell a little dislike for MySQL. Certainly there are challenges with the growth rate. I can’t help but think this is biased without real data to support the claim. I can’t imagine any database engine would be smooth sailing with the sheer volume of data that Facebook is pushing. Just my thoughts.

    1. So it is now against the law to dislike MySQL? When a database engine can’t even finish a simple single nested query on 20k rows and must be killed after nine minutes, while another database engine on the same machine gives me a result set in three quarters of a second, you’re damn right I dislike MySQL. For the right type of workload MySQL is stupid fast. For anything else it’s so overrated as to be laughable.

      1. lol so you can’t optimize your query and you blame MySQL? Your statement doesn’t even mention which storage engine or average row size; it just sounds like a kid whining… Oh, web “developers” these days…

  13. Interesting perspective, but I think that sql and traditional databases still have a place. http://mikemainguy.blogspot.com/2011/07/deciding-on-nosql-vs-rdms.html

  14. MySQL can be made to scale. It scales well to a point then you partition/shard. This introduces challenges, largely maintenance, but it isn’t insurmountable like the article implies. The choice of NewSQL companies seemed odd: Xeround (cloud service), Clustrix (MySQL emulation not 100% compatible, doesn’t work with MySQL tools), GenieDB (a messaging layer for databases), VoltDB (also not MySQL compatible). You might consider ScaleDB which is a MySQL pluggable storage engine, meaning it works with existing MySQL apps AND MySQL tools, while delivering elasticity, scalability and high-availability. In fact, it makes MySQL work like Oracle RAC.

  15. Unlikely Bret Taylor trapped.

    Maintaining ACID properties is the focus of questions from Redmond’s SQL R&D architects attending http://www.meetup.com/Seattle-Hadoop-HBase-NoSQL-Meetup

  16. Derrick Harris Thursday, July 7, 2011

    Thanks for all the comments. As I acknowledge in the post, Stonebraker certainly has a vested interest in companies moving away from “old SQL.” But it’s not like he’s alone in looking to improve DB speed.

    We saw companies, including Oracle, build and buy in-memory databases and data grids several years ago, then there was NoSQL, and now there’s NewSQL.

    I don’t think Stonebraker actually thinks any large sites will migrate from SQL at this point, though, but rather that new applications should be built on new versions of SQL designed with the web in mind. The problem, of course, is that few if any are FOSS right now.

    Re: Facebook, I don’t think anyone doubts its MySQL prowess, but it’s possible the company has outgrown MySQL and it takes way too much brainpower and complexity to help MySQL keep up with traffic growth. Look at Google, which eventually had to retire MapReduce and GFS. Facebook has been using HBase for a lot of new projects.

    1. Aigars Mahinovs Friday, July 8, 2011

      There are no ‘new versions of SQL’. There is either ACID-compliant storage (usually with SQL access) or eventually-convergent key-value storage (usually called NoSQL). The ‘NewSQL’ touted here is vaporware that claims abilities that are basically impossible according to information theory basics.

    2. This post makes claims about the efficiency of Facebook’s infrastructure but I see no response from them quoted. Did you attempt to get their view on these claims?

    3. What is your take on this?
      https://www.facebook.com/note.php?note_id=16121578919
      This source seems to conflict with your article.

    4. Yet Google is still using MySQL…

      1. Actually, Google uses BigTable for its most significant services: http://en.wikipedia.org/wiki/BigTable. BigTable is a proprietary database similar to some of the NoSQL offerings.

        Look up the CAP theorem. The SQL language favors Consistency and Partition tolerance, basically throwing Availability out the window. This is unacceptable for applications the size of Facebook, but Zuckerberg didn’t really consider this when he started writing the code. To make it scale, Facebook eventually had to implement their own sharding layer, which trades Consistency for Availability, but to do this you have to stop using joins (which basically makes it NOT a relational database anymore).

        I agree with the author in that I would hate to be the developer that has to keep all this stuff organized. Facebook is doing at the application level what COULD have been handled at the database level if they were built on a backend that was designed from the start for availability. The selection of “alternatives” was pretty bogus, but the essence of the article is mostly correct.

  17. Sounds like the author has also been dropping ACID, along with NoSQL;
    no wonder it’s considered a “movement”.

  18. Yes, let’s keep everything in memory all the time, because we all know that memory costs nothing and is in infinite supply. I guess facebook just needs to copy everything to hash tables.

  19. PostgreSQL actually scales much better than MySQL, since it was designed to do so from the beginning. Also, since it was designed to be a full relational database through and through rather than emulate some aspects, it is generally more efficient as a result.

    1. I’m really surprised it took this long in the comments to see someone mention postgresql.

  20. The way FB uses MySQL, they could replace it with any RDBMS and be just fine. Memcached is the shining star behind FB and it’s nothing more than a distributed memory-resident hashtable so I would say all the magic is in the app layer and that’s fine!

    Scalable SQL? We have that already; it’s called PostgreSQL in the hands of anyone who knows what they’re doing. Flame on, Mr. Harris.

  21. Funniest thing I’ve read in sometime – keep up the epic satire!

  22. I have to agree with Dennis. This was definitely an eye-catching article, but I spent the entire time looking for an explanation of how Facebook’s fate is worse than death. This article said little in that regard. I agree with pretty much all the users’ comments before me. I’m open to a better solution, but frankly, I’m upset by this article since it’s minutes of my life I’ll never get back.

  23. Interesting

    1. Conveniently this Stonebraker guy is selling a non-relational database.

      http://voltdb.com/

      This quote is retarded.

      In Stonebraker’s opinion, “old SQL (as he calls it) is good for nothing” and needs to be “sent to the home for retired software.”

      I think Stonebraker needs to put down the Kool-Aid and realize that you use the right tool for the job at hand.

      It’s time for FB to reconsider how they do things completely but it surely is not the case for all but a handful of companies like Google, M$, etc.

    2. I concur, I realize that the world runs on ‘oldSQL’ and will continue to do so, I just thought it interesting because it seems that so many cling to the tool they know rather than the right one for the job regardless of who makes it.

    3. True. I’m really surprised how far relational databases can scale considering they were never designed to scale to the level that they are today. Granted, they have been hacked in many ways to scale, especially in the case of FB, but the basic principles are still intact.

    4. Looks like VoltDB is free and open-source.

      1. VoltDB is only free for development. If you want to use it in production, get ready to spend BIG bucks.

      2. VoltDB is available under two licenses. VoltDB Community is distributed under the GPL3 license; VoltDB Enterprise is distributed under a commercial license. You can use either version of the product for development, test and production deployments. If you choose to deploy your application using VoltDB Community, you can also optionally purchase a VoltDB support subscription (support is included with VoltDB Enterprise), although you can certainly run VoltDB Community based applications in production without a support subscription.

        The primary difference between the two products is management/monitoring consoles and related APIs – VoltDB Enterprise has them, VoltDB Community doesn’t. If you’d prefer to use your own runtime management tools (some users do prefer this approach), then you can run your production applications on VoltDB Community.

        I hope this helps to clarify what’s available in the different versions of VoltDB.

  24. Yeah, what Dennis wrote.

    Oleg wrote “I believe FB has 4 DBAs, I know 1000 shops that have more Oracle DBAs.”
    Oleg, you are as bad as Stonebraker. Any basis for your comment? Facebook had more than 4 DBAs giving presentations at the MySQL Conference.
    And you really know 1,000 shops with more than 4 Oracle DBAs? List 10% of that number…

    Chris: Craigslist uses MySQL for their front-end stuff but they migrated their archiving system to a NoSQL solution.

  25. OMG, that is funny. Is this _seriously_ an article about how MySQL doesn’t scale, and Facebook… *Facebook*, the 750-million-user FB, is used as the evidence for that? Really??

    I’d love to see the piss-ant, 500-user systems built by this clown, where you charge a corporation several million dollars for “scalability”. I’ll just bet you he has done that.

  26. Interesting… Mark, if you require my help, do let me know.

  27. If they need to seriously look at the best alternatives, check out the latest release of R:BASE eXtreme 9.1 (64): http://www.rbase.com
    The best-kept secret of a true relational database for over 28 years!

  28. Interesting take on a Facebook MySQL deployment that few people are familiar with.

    Replication is one of the top features of MySQL for scale-out, redundancy and disaster recovery. Single-instance or single-image database performance is not enough. Reliability and high availability are important for running business-critical applications, but are seldom discussed.

  29. Terance Cambel Thursday, July 7, 2011

    FB should consider moving to IBM Informix v 11.7
    http://www.informix.com

  30. Facebook (or anyone else) need not necessarily rewrite their entire application to get the supposed benefits of VoltDB. MySQL (NDB) Cluster has been a scalable, partitioned, redundant, ACID, in-memory, hybrid SQL/NoSQL database for about a decade now.

    1. MySQL Cluster (AKA NDB) has some challenges though: joins, range scans, aggregates, and it is not supported in virtualized environments. But yes, it avoids a rewrite to VoltDB.

        1. Actually, the JOIN performance issues are largely fixed in ndb-7.2, with 20-40x performance improvements over previous versions in many cases. In some cases a small NDB cluster even beats a single-server InnoDB instance. It does range scans, in fact multi-range reads (http://www.clusterdb.com/mysql-cluster/mysql-cluster-multi-range-read-using-ndb-api/). Aggregates I will give you… but support in virtualized environments is now possible, with better resilience against disk I/O latency in ndb-7.1.10. Most of the problems you foresee have either been completely quashed or are in the process of being fixed.

    2. Recent changes to InnoDB/MySQL made by Oracle, MariaDB and Percona are pretty scalable too. Of course, NDB continues to improve by leaps and bounds.
      Unfortunately, that article is flawed in many ways and appears to be badly researched.

    4. Isn’t Stonebraker just saying that instead of making MySQL do what it was not originally intended to (and spend a lot of dev hours/$$$ doing so), a newer and use-case specific solution might be better?

      1. How dare you fail to misinterpret what he was saying!

  31. I was under the impression that Facebook uses Cassandra as its datastore: a NoSQL database developed by Facebook itself (inspired by Amazon’s Dynamo infrastructure and Google’s BigTable data model) that is now an Apache project. Quite surprised to learn that a web application as massive as Facebook still uses MySQL!!!

  32. use ms access. nuf said…

  33. Manuel Cantu Thursday, July 7, 2011

    Why not switch to Oracle Database? With GoldenGate they could migrate to one or several Oracle Database instances running on one or two Exadata machines, depending on their needs. Then you add TimesTen as an in-memory database cache.

    1. FB would need another round of VC financing to be able to afford Oracle’s licensing fees!

  34. Looks like a lot of FB employees have plenty of free time to comment on this blog. The negativity is staggering. Get to work, guys: what’s the next ‘awesome’ Facebook announcement?

    1. Do you have a job? How about you get back to work?

  35. They need to seriously look at the best alternatives. Check out the latest release of R:BASE eXtreme 9.1 (64) http://www.rbase.com.

    1. R:BASE? Seriously? Are you crazy???

  36. Doomsday prophecy for Facebook because they use MySQL. NoSQL is the buzz and NewSQL is the solution to save the world? How easy to say!
    I thought Facebook was doing quite well, with an infrastructure supporting 500M+ users built on an Apache Hadoop cluster, Apache Hive,
    RDBMS redistribution technology and …. MySQL!

  37. “Stonebraker says the social network’s predicament is all too common among web startups that start small and grow to epic proportions.”

    All too common for him? Maybe he reads too much TechCrunch. Let’s see the evidence of all the startups that really need more than one database server, in other words, that are rocketing to hundreds of millions of visits per month.

  38. Terry Lambert Friday, July 8, 2011

    I would have to say that for the front end, for things like status updates, where update propagation isn’t time-sensitive so long as it is (eventually) time-ordered, you could do worse than an OLTP system.

    I would probably use something like IBM’s MQSeries, or one of the other heavyweights, rather than something new written in an interpreted language and backed only by fragile memory; instead, I’d probably do exactly what they are doing and use a sharded SQL database of some kind for the persistent storage.

    Memory is a hard thing to trust, as Virginia Tech learned the hard way when they built their Xserve-based supercomputer with non-ECC memory and had to split up the calculations they were running, run them multiple times and “vote”. This was simply because that amount of memory put them in the range where cosmic rays start to matter for data (non)integrity.

    Still, it’s interesting that by front-ending Facebook with OLTP, you could probably resolve most of the data coherency issues fairly trivially by accepting a somewhat longer propagation pipeline delay, without having to resort to SQL transaction replays, so you could get a cleaner solution than you might get otherwise. This would probably work well for any other RSS-style application as well.
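
The propagation-delay trade-off in the comment above can be illustrated with a small reorder buffer. This is a minimal Python sketch under stated assumptions, not anything Facebook or IBM MQSeries actually implements: `DelayedOrderingBuffer`, `Update` and their methods are hypothetical names, and the sketch assumes no update ever arrives more than `delay` seconds late.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Update:
    # Updates are ordered by timestamp, with an arrival sequence number
    # as a tie-breaker; user and text do not take part in comparisons.
    timestamp: float
    seq: int
    user: str = field(compare=False)
    text: str = field(compare=False)


class DelayedOrderingBuffer:
    """Collects updates that may arrive out of order (e.g. from several
    shards) and releases them in timestamp order once they are older than
    `delay` seconds. Assumes no update arrives more than `delay` late."""

    def __init__(self, delay: float) -> None:
        self.delay = delay
        self._heap: list[Update] = []
        self._seq = 0

    def push(self, timestamp: float, user: str, text: str) -> None:
        heapq.heappush(self._heap, Update(timestamp, self._seq, user, text))
        self._seq += 1

    def release(self, now: float) -> list[Update]:
        # Everything at or before the watermark is treated as complete,
        # so it can be emitted in order without waiting any longer.
        watermark = now - self.delay
        ready = []
        while self._heap and self._heap[0].timestamp <= watermark:
            ready.append(heapq.heappop(self._heap))
        return ready
```

With a 5-second delay, pushing updates stamped 10, 8 and 14 and calling `release(now=15.0)` returns the 8 and 10 updates in that order, while the 14 update is held until the watermark passes it. A longer delay tolerates more out-of-order arrival at the cost of staleness, which is the trade-off the comment describes.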

  39. Boris Juric Friday, July 8, 2011

    You might want to look into the rumor that Facebook is sacrificing smaller markets in order to save resources for important ones. Here’s the case: it’s been a month since the Facebook bot stopped crawling websites on the entire Croatian .hr TLD. It doesn’t read OG tags anymore, and the only thing visible on shared/liked links is the URL. You can test it by trying to share anything from a .hr domain, or by using the Lint tool. This could be happening on other domains too, without anyone important noticing.

  40. OK… with a 750M user base and an unknown number (at least to me) of highly valued staff, FB, its users (did we hear any complaints about scalability?) and MySQL (which is owned by the top RDBMS company, Oracle) couldn’t figure this out… but this guy did! Looks like a 2012 end of the world to me! Give FB, MySQL and the ardent FB users a break, dude, and find a different way of selling whatever you are selling. Why not write a post about how it’s not that easy (given its open source nature) to scale MySQL (not that you can’t scale it, but it is hard to find the brains that can), rather than trying to ridicule a successful ecosystem that figured out how to scale?

  41. I’m a great fan of Michael Stonebraker. He’s done a huge amount for the science of databases.

    When he invented Postgres he got many things exactly right. One of those things was making the source code open, allowing it to be developed into an increasingly high-quality product over the following 20 years. So when he explains how MySQL is bad, he is in some ways also dissing his own previous ideas.

    The problem is that by making Postgres open Mr. Stonebraker no longer makes any money from the project.

    Taking those points together, I’m more inclined to believe that he knew what he was talking about the first time, but now wishes to gloss over that in order to make even more money.

    We don’t need to use new products to take advantage of new ideas. PostgreSQL is just as innovative now as it was 20 years ago, and we are adding new features at an incredible rate. Innovation and maturity makes the best solution.

    Open source doesn’t mean it’s good, but the lack of a venture-capital-funded marketing budget doesn’t mean it’s no longer valid. If anything, it shows we’ve entered a phase of efficiency where less hype is needed to sustain a growing user base.

    1. Every product has its pros & cons. I like Postgres in general, but Postgres can barely do a count(*): it takes forever on large tables because it always has to do a full table scan. This is because they implemented MVCC in a brain-dead way. Try counting a table with 100 million+ rows and you will scratch your head. It wasn’t built for massive databases. Postgres doesn’t do much in parallel either, like creating indexes or running parallel queries. Try indexing a 300-million-row table sometime. Also, it doesn’t really use multiple cores: it is one connection per core, and a single connection can’t use more than one core. I’ve got 24 cores and Postgres can use only one for a big operation. The newer enterprise features like hot standby and streaming replication are nice, but Oracle has had these since 9i/10g. Postgres is definitely getting there, and EnterpriseDB’s version with InfiniteCache is good, but like I said, every product has pros and cons. Nothing is perfect. All of today’s databases are great at shoving data in, but none of them has any real archiving features to get data out and archive it in a nice way. Dumping a table is not archiving. Also, once you get into the terabyte range, most of the tools and utilities of all modern databases just outright break down.
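
Why a multi-version (MVCC) engine tends to need a scan for count(*) can be shown with a toy visibility check. The sketch below is a deliberately simplified model, not PostgreSQL's actual code: `RowVersion`, `visible` and `count_star` are hypothetical names, transaction ids are plain integers, and the set of committed transactions is assumed fixed. Because different snapshots see different rows, no single stored counter can answer count(*) for every transaction at once.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RowVersion:
    xmin: int            # id of the transaction that created this version
    xmax: Optional[int]  # id of the transaction that deleted it, if any


def visible(row: RowVersion, snapshot_xid: int, committed: set[int]) -> bool:
    """Simplified rule: the creating transaction must be committed and older
    than the snapshot, and no committed, older transaction may have deleted it."""
    created = row.xmin in committed and row.xmin < snapshot_xid
    deleted = (
        row.xmax is not None
        and row.xmax in committed
        and row.xmax < snapshot_xid
    )
    return created and not deleted


def count_star(table: list[RowVersion], snapshot_xid: int, committed: set[int]) -> int:
    # Every stored version has to be checked against the caller's snapshot:
    # this per-row check is the full scan the comment complains about.
    return sum(visible(row, snapshot_xid, committed) for row in table)
```

With versions created by transactions 1, 2 and 3, where the third is deleted by transaction 5, a snapshot taken before transaction 5 counts three rows while a later one counts two.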

  42. If you must change, use the best database product, which is Oracle. Seems to be the correct path.

  43. The problem with RDBMSs is not performance, it’s availability. When I design internet services that should always be available, I want to use a persistence technology that supports active-active clustering with no SPOF (like Cassandra).
    I don’t know of an RDBMS with that functionality. It can be simulated with big $, but why throw away the money?

    / Jonas

    1. MySQL Cluster (NDB) is one such database. Seriously, all nodes active with 0 SPOFs.

      1. No SPOFs isn’t good enough. We had no single points of failure in one of our applications’ architecture, but recently two of our core switches failed simultaneously. Both switches were from the same vendor, ran the same firmware and had the same bug. All of our virtual machines lost connectivity to the SAN. We had dual NICs, dual routes, dual everything including core switches, but sometimes, shit happens.

  44. You cannot even imagine how wrong you are.

  45. NASDAQ runs on MS SQL Server. I hesitate to guess, but I suspect the NASDAQ deployment and its processing demands far exceed most deployments out there today. SQL is not the issue; planning and costs are. Startups typically deploy open source, then get caught at the acquisition stage or when the technology maxes out… this is when they find themselves having to move to the big boys. Also, consider that the R&D money put into MS SQL Server, Oracle etc. far exceeds that put into MySQL and the others mentioned in this article, so would you bet your company on something other than these giants? Use your favourite search engine to learn more about industrial-strength databases.

  46. If we’re talking startups, what really counts in all this is that for every FB there are thousands that won’t make it. Stonebraker’s big fallacy is the assumption that those thousands can afford to pay big bucks for their software. The fact is, they can’t, so they must use whatever open source is available. What Stonebraker and his ilk need to do is change their pricing model. Maybe Datameer has the right idea.

  47. Simply because Facebook has the best minds.

  48. Nice advertising piece for Stoneblahblahblah.

    This issue probably has Facebook knocking down his door…

    This piece is like watching FAUX News, fair and balanced! Are you guys owned by Murdoch?

  49. If MySQL is good enough for the current state of Facebook… it is good enough for 99% of the people who use databases out there…

  50. This article reminds me of a conversation I had with Stonebraker soon after Informix acquired Illustra. He said that RDBMSs were history and object-relational was the way to go: there would be nothing else in 2-3 years’ time.

    It was 1996.

  51. He sounds like a snake oil salesman.

  52. In the year it would take Facebook to rewrite everything (and one year is optimistic), hardware will double in power. That means they can just buy more servers with more memory and more multicore processors, and just crunch MySQL.

    Even with today’s technology, if Facebook keeps up its steady growth to cover EVERYONE on the planet in about 5-10 years, they could just buy more of TODAY’S hardware and keep the site running.

  53. I can’t believe someone took down my comment (oh well, it was pointing to a big response on my blog, otoh, someone is as good at looking up people as they are doing research for articles) – http://dom.as/2011/07/08/stonebraker-trapped/

    I would hope I’m allowed to have that answer persist in my space ;-)

  54. Well honestly, having this problem is a non-problem to begin with. I mean: when you have it, you’re still better off than without it.

  55. michael nittmann Friday, July 8, 2011

    I have been recommending clients to go to postgres for years….

  56. Martin Wondergem Friday, July 8, 2011

    In other news, Stonebraker recommends getting rid of hammers saying: “old hammers (as he calls it) are good for nothing” and need to be “sent to the home for retired tools.”

    When building your next picnic table, Stonebraker recommends starting with concrete reinforced gauge 3 steel, just in case you end up with 750 million users at your party.

    1. Rafi Jacoby Friday, July 8, 2011

      You haven’t told us what kind of hammer to use for building that table. Unobtainium?

  57. dave watson Friday, July 8, 2011

    A comment here says Craigslist is on MySQL, but actually they migrated to MongoDB.

    The SQL vs NoSQL classification is nonsense. Instead talk about different classes of DBMS: relational, document, graph, key value pair etc.

    Also, Mr. Stonebraker’s statement about FB is misinformed. Just listen to their own people give talks about their infrastructure and their use of technologies like Hive, HBase, HDFS etc.

    1. Craigslist is still a huge MySQL shop: they only migrated their archives to Mongo. The remainder is on MySQL.

      1. Yes, but others have done more complete migrations. Take a look at case studies such as Examiner.com and the Guardian newspaper. There is certainly a trend. Our own tests for a federal government agency, comparing MongoDB with two leading commercial RDBMSs (which I can’t name for legal reasons), showed considerable advantages in flexibility of data access, development productivity and performance.

  58. Facebook should ask SAP to use the SAP New InMemDb and their real-time analytical appliance, HANA… 10,000 times the performance of the average database……

  59. Faizan Javed Friday, July 8, 2011

    Stonebraker’s latest NewSQL product, VoltDB, is also open source. It is a speedy in-memory database that is ACID-compliant, but relies on stored procedures to avoid round-trips and reduce network traffic. It is not as partition-tolerant as Cassandra, but I believe it can scale out at least as well as MySQL, with far better performance and ACID guarantees. The main sticking point with VoltDB is that it stores all data in RAM; this may not be the best approach for a site as huge as Facebook, considering that RAM is cheap but not that cheap.

    But to me there is potential – maybe a “next-gen” VoltDB which can get around this in-memory limitation might be the perfect product. I would prefer to have a database product which is a blazing fast scalable version of MySQL instead of spending thousands of developer hours making MySQL do what it does not out-of-the-box.

    1. I think you’re missing the fact that MySQL has had a solution that is very similar to VoltDB which has been used by high scale, high performance and high availability Telco environments for a decade. It is ACID, in-memory by default, but also has the ability to have some or all non-indexed columns stored on disk. This, of course, allows for a much larger data set than VoltDB. It also supports a number of direct NoSQL access connectors to avoid the overhead of mysqld. Also, in 7.2 it has incredibly (20-40x) improved support for JOINs which have traditionally been problematic in distributed or NoSQL databases.

      See: http://dev.mysql.com/doc/refman/5.1/en/mysql-cluster-overview.html

  60. Great post on FB database shortcomings, which is a main reason FB doesn’t offer ability to edit Status and Comments. Instead, you need to use archaic methods such as delete and repost.

    1. You have no idea what you’re talking about.

  61. So wait…should I throw out all my betamax tapes? Shit.

  62. Terry Greenlaw Friday, July 8, 2011

    Stonebraker’s trolling for business has become quite irritating. While the rest of the world has come to realize that transactional technology only works when it spans all aspects of your application, not just the database tier, he’s still clinging to the 1970s. Of course, he still thinks SQL makes sense too, because the relational database is the center of the universe and deserves its own semi-functional, buffer-loving query language designed for ad-hoc use. Michael, step into the light, step into this century, and stop trying to pawn off the snake oil!

  63. To be clear, I work for another company that builds a NewSQL solution. Its name and product are irrelevant here, as I don’t do this kind of product pitch.
    This article is a joke, and it’s a shame that such a product pitch is given space here. Only the last paragraph mentions the interests Dr. Stonebraker has, and you can’t present yourself as an academic figure when you just want to push your product. It’s misleading and unethical, and it’s a shame GigaOM gave Dr. Stonebraker this publicity.
    The article is filled with inaccuracies, and the competition he presents is the competition that suits him best.
    Shameful.

  64. Systems guy here. No, not a database expert, but I’ve built my own web presence. You can talk all you want about software issues, strengths and weaknesses. At the end of the day, you cannot deny that Facebook is trying to create its own “decentralized network” or its own “little internet”. This is all part of the evolution process. The internet was built to be decentralized, not centralized. Simply put, Facebook will eventually implode under its own weight. But methinks all the other social networking clients, like Diaspora, Appleseed, Friendika, etc., will take Facebook down to a sizable web service. Again: natural evolution, decentralized….

  65. IBM has built systems with SQL built into the core OS that are perfectly suited to millions of transactions a second. Banks and large corporations use these systems every day. They are midrange computers. Why do people always think inside the box? Talk to the only tech company that has been around for a hundred years.

  66. Is this the only way to solve this problem? In the end, even the RDBMS has to change the way it has been serving for three to four decades.

  67. “Until, that is, it, too, becomes a relic of the Web 3.0 era.”

    The author’s use, or rather overuse, as it were, of the comma, is, I feel, commendable.

  68. Too many companies get sucked in by the sales guy promoting fear. A big smile with shiny glasses is a technology evangelist, and once you get sucked in, you’ll pay anything to feel safer.

    Facebook can use any technology out there and make it work, that is the difference – who cares what technology they use, it will work. It’s about walking the walk. Looks like Stonebraker needs to build something similar to a Facebook… then talk the talk.

  69. Nonsense. MySQL has an enterprise cluster product specifically built for massive throughput and redundancy. The problem is probably that they are using something very storage-engine-specific, like InnoDB or MyISAM with full-text search keys, that the ndbcluster engine can’t handle. MySQL Cluster is built to handle telco-level real-time transactions and does so for some really big sites… not easy to implement, but extremely powerful. They could have 100 self-replicating clusters built with the sole purpose of expanding outward. Just ’cause you’re big doesn’t mean you are smart.

  70. Interesting to know this.

  71. Mr Harris: When you attribute statements like “Stonebraker said the problem with MySQL and other SQL databases is that they consume too many resources for overhead tasks (e.g., maintaining ACID compliance and handling multithreading) and relatively few on actually finding and serving data,” include references to the hard proof. Pushing logic from the repository to the application does not make anything faster or slower; the cumulative work is just shuffled around. Moving logic from server to client can improve the perceived speed, since you have effectively added massively to CPU and RAM capacity. So, if you have a relational data model, then use SQL. If your data model is different, then use a custom repository. (The different NoSQL repositories fill this custom need, just as, for example, Focus fitted the custom need for hierarchical data in the 80s!) As for web-scale issues, the solutions are all the same no matter how your repository models data.

  72. Derrick Harris Saturday, July 9, 2011

    I just want to reiterate that Stonebraker wasn’t insulting Facebook. Rather, he was pointing out the incredible amount of work and skill required to make MySQL fit its purposes.

    Is VoltDB the answer? Maybe, maybe not. FB’s transactional DB might be too big to run completely in-memory. Also, FB has the engineering talent and is so far along that it might be easier maintaining the status quo rather than rearchitecting completely.

    But it’s not too crazy to suggest that life would be easier if it had been built initially on something designed for its purposes.

    So, if there is a class of DBs available — VoltDB, Clustrix, ScaleBase, or whatever — that’s designed to handle the transaction and data volumes of today’s web, isn’t it prudent to at least give them a look? Especially if you’re building something from scratch?

    1. Derrick,

      I don’t think an assessment like ‘fate worse than death’ is a fair one for an analytical article if no insider knowledge is presented. There are various presentations available online about the state of FB’s database deployment, what kinds of problems exist there and how they’re solved.

      It is difficult to define today’s web, but there aren’t enough public numbers about the big boys and their datasets nowadays, and to earn your place on the web you have to do something smarter and more efficiently than others.

      You list a ‘class of DBs available’ without actually looking more closely at them. VoltDB is in-memory, so ‘data volumes of today’s web’ is already a somewhat odd qualifier for it.

      ScaleBase is “transparent sharding” with regular MySQL servers in the back. I’d bet that implementing sharding in the application gives way more visibility of data access costs to developers (so that cross-shard workloads are well understood and optimized). Same applies to VoltDB, I guess ;-)
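
Application-level sharding of the kind this comment favours can be sketched as a small router. This is an illustrative Python sketch, not Facebook's or ScaleBase's scheme: the class name, the shard names and the choice of MD5-modulo hashing are all assumptions. The `plan` method groups a multi-user lookup by shard, which is the visibility argument made above: the developer sees exactly how many servers a single request fans out to.

```python
import hashlib


class ShardRouter:
    """Maps a user id to one of N shards. In a real deployment each shard
    would be a separate MySQL server; here a shard is just a name."""

    def __init__(self, shard_names: list[str]) -> None:
        self.shards = shard_names

    def shard_for(self, user_id: int) -> str:
        # A stable hash, so the same user always lands on the same shard.
        digest = hashlib.md5(str(user_id).encode()).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.shards)
        return self.shards[index]

    def plan(self, user_ids: list[int]) -> dict[str, list[int]]:
        # Group a multi-user lookup by shard, making the cross-shard
        # fan-out (how many servers one request touches) explicit.
        groups: dict[str, list[int]] = {}
        for uid in user_ids:
            groups.setdefault(self.shard_for(uid), []).append(uid)
        return groups
```

For example, `ShardRouter(["db0", "db1", "db2", "db3"]).plan([1, 2, 3, 42])` returns one list of ids per shard that has to be queried. Plain modulo hashing keeps the sketch short but remaps most keys when the shard count changes; directory-based mapping or consistent hashing are common alternatives.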

      Does the NimbusDB product exist yet? I missed that part on their website.

      I’ve yet to understand how GenieDB provides ‘wide geography multi master’; for now it seems to describe an in-memory consistency layer with only one coordinator for a range of data.

      Xeround is up to 50GB, not much of a web scale.

      etc

      I can do lots of hand waving and pick my random list of products that are future of the web, and make headlines, but that may not make responsible reporting or analysis.

      Anyway, whatever solutions people propose, all those solutions will still need to be partitioned at large enough scales. If not at a terabyte scale, then at a petabyte. If not at a petabyte, then at exabyte. If not at exabyte, then at zettabyte, … ;-)

      1. BarryVMorris Tuesday, July 12, 2011

        NimbusDB is in Beta release.

        - SQL, ACID
        - elastically scalable (add/delete nodes dynamically)
        - multi-tenant (DB’s can share machines arbitrarily)
        - no single point of failure
        - resilient to node failure
        - active/active geo-distributed
        - no sharding or partitioning
        - very fast on single transaction node, scaling linearly in tests to date
        - redundant storage nodes (as many live copies of the DB as you want)
        - very low DBA requirement
        - no masters, slaves or supervisors (100% peer to peer)
        - free for small systems

        We are always keen to talk to people that are facing SQL-in-the-cloud challenges.

        Barry Morris, NimbusDB Inc.

    2. If your point was to suggest that it’s appropriate in some instances to use a specific tool for a specific job (“built initially on something designed for its purpose”), then it’s irresponsible to include quotes like “old SQL (as he calls it) is good for nothing” (Stonebraker). Nothing in your original article implied that it was a decision between SQL and NewSQL based on project requirements. It was all framed as “New” SQL being a replacement for “Old” SQL.

    3. FB doesn’t need transactions. It has a window of recent posts; anything older than that is “too old,” and the amount that is too old doesn’t matter; sometimes you see nothing at all. As a database guy, to me the data is everything. FB data is transient; it is not important enough to worry about. So it doesn’t, in fact, actually need ACID, and it doesn’t need a database for anything besides user info. Any CS student could write a pointer list into a Unix-style /etc/passwd that scales to a billion users, even dumbass Java programmers. All the stuff users use is cached.

      What it needs is a better front end, but that’s another story.

  73. When databases were first invented, hardware was incredibly expensive, so they were designed to grow dynamically with the users’ needs to minimise hardware cost.
    The bigger the database is the more difficult it is for the database to be able to grow.
    Hardware is now cheap and going to become virtually free.
    There is no longer any need for databases to be able to grow. Database structures can be fixed.

  74. Note: this is from someone who a) loathes Facebook and b) loves PostgreSQL.

    Derrick, shame on you for writing this piece of trash. There are enough people in the world spewing unsubstantiated self-serving garbage without your help. What’s worse is to read your pathetic rejoinders in the comment thread.

    If there was a killfile for the web, I’d surely add you to it.

  75. Comments here really explain why this is an unfortunate statement.

    http://news.slashdot.org/story/11/07/09/150211/Sony-Announces-End-For-MiniDisc-Walkman?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+Slashdot%2Fslashdot+%28Slashdot%29&utm_content=Google+Reader

    While relational DBs are really not the future,
    1) Facebook has already achieved scalability! The facts show that it works very, very well. Their biggest concern was actually PHP speed, not a DB bottleneck.
    2) In any application with an architecture, the DB is abstracted away as a repository service, and you mostly talk to the cache.
    3) Architecture enables you to replace the implementation behind an interface easily, i.e. you can switch to any other repository/DB without “rewriting everything”.
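
Points 2) and 3) above describe a repository interface with a cache in front of it. A minimal Python sketch of that shape follows, with hypothetical names throughout (`UserRepository`, `InMemoryStore`, `CachedRepository`); the dict-based cache merely stands in for something like memcached, and `InMemoryStore` stands in for whatever database sits behind the interface.

```python
from typing import Optional, Protocol


class UserRepository(Protocol):
    """The interface application code is written against; the backing
    store is an implementation detail."""

    def get_name(self, user_id: int) -> Optional[str]: ...
    def set_name(self, user_id: int, name: str) -> None: ...


class InMemoryStore:
    """Stand-in for a real database (MySQL or anything else)."""

    def __init__(self) -> None:
        self.rows: dict[int, str] = {}
        self.reads = 0  # counts how often the "database" is actually hit

    def get_name(self, user_id: int) -> Optional[str]:
        self.reads += 1
        return self.rows.get(user_id)

    def set_name(self, user_id: int, name: str) -> None:
        self.rows[user_id] = name


class CachedRepository:
    """Cache-aside wrapper: reads try the cache first and fall back to the
    backing repository; writes go to the store and invalidate the cache."""

    def __init__(self, backend: UserRepository) -> None:
        self.backend = backend
        self.cache: dict[int, str] = {}

    def get_name(self, user_id: int) -> Optional[str]:
        if user_id in self.cache:
            return self.cache[user_id]
        name = self.backend.get_name(user_id)
        if name is not None:
            self.cache[user_id] = name
        return name

    def set_name(self, user_id: int, name: str) -> None:
        self.backend.set_name(user_id, name)
        self.cache.pop(user_id, None)
```

Swapping the database means passing a different object with the same two methods to `CachedRepository`; the calling code does not change, which is the point made in 3).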

  76. The man who created MySQL is working on MariaDB at the moment with a good group. I saw this on Finnish news; not sure what it really is.

  77. I was wondering why Facebook is slow when fetching data.
    Now I have the answer: it’s MySQL. It’s unimaginable that such a big and successful application runs on MySQL. Everybody knows the limitations of MySQL. After all, it’s free.
    I would also add that we should be careful when using Facebook and always keep a backup of the data in our Facebook accounts.

  78. You are clueless and haven’t pinpointed a specific problem with MySQL at Facebook. Have you read how Facebook actually uses MySQL? Facebook has a massive memcached cluster that caches over 25 TB of data. The whole site is practically cached. MySQL is not a bottleneck and never has been. As for outdated technologies, what does VoltDB run on? Linux? Based on Unix? 30 years old? By your mentality, we would have to rewrite Linux, TCP/IP, the database and everything else in between.

  79. Yeah, obviously this guy has experience with way heavier and more advanced setups than Facebook’s. VoltDB has really been at the forefront of database scalability development for a long time. It was also available at the time Facebook made its choices. Furthermore, it has really proven that it can handle the web’s biggest applications under real load and real scenarios… Or maybe it’s just ordinary internet trolling…

  80. Typical academic. Yeah it’s not perfect. But Facebook operates in the real world and the availability, price, etc may be what they do want…

  81. Nathar Leichoz Saturday, July 9, 2011

    The weakest link in Facebook’s architecture is not MySQL, that’s for sure.

  82. ThereIsNoPlanB Sunday, July 10, 2011

    OMG, does this mean Larry Ellison is going to buy Facebook?

  83. The problem with MySQL is phpMyAdmin, the most commonly used backup program. It stops backing up data at around 12 or 13 megabytes, which is a real headache if you want to have a very large database.

    1. WTF? What a weird comment!
      phpMyAdmin is not a backup program, although it can be used as one. It’s useful for sites that don’t have an SSH shell or (tunneled) port access.
      I suspect the script timeout in php.ini is set way too low.

      If you have a website of any importance, then you must also have SSH access to all the servers.

      mysqldump is THE backup program: it takes a minute or two to back up 5 GB of social network data from a slave every day. The dump is then compressed and compared with the previous day’s backups, and yesterday’s backup is linked to today’s for any tables that have not changed.
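
The "link unchanged tables" step described above might look like the following Python sketch. It assumes one dump file per table in a dated directory (for example from running mysqldump once per table), and the function names and directory layout are hypothetical. A byte-for-byte comparison only works if the dumps carry no timestamps, e.g. mysqldump run with --skip-dump-date, and if files are compared uncompressed or compressed with gzip -n so the gzip header does not embed a time.

```python
import hashlib
import os
from pathlib import Path


def file_digest(path: Path) -> str:
    # Hash the file in 1 MiB chunks so large dumps never sit fully in memory.
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def link_unchanged_tables(today: Path, yesterday: Path) -> list[str]:
    """For each per-table dump in `today`, if yesterday's dump of the same
    table is byte-identical, replace today's copy with a hard link to
    yesterday's file so the unchanged data is stored only once."""
    linked = []
    # sorted() materializes the listing before files are swapped out.
    for dump in sorted(today.iterdir()):
        previous = yesterday / dump.name
        if (dump.is_file() and previous.is_file()
                and file_digest(dump) == file_digest(previous)):
            dump.unlink()
            os.link(previous, dump)  # hard link: same data, new name
            linked.append(dump.name)
    return linked
```

Hard links only work within one filesystem, so both dated directories would need to live on the same volume.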

  84. MySQL is open source. OPEN. OPEN. OPEN!!! Wouldn’t it just be more straightforward to alter MySQL to meet their needs? Of course it would. Whatever is wrong with their MySQL can be fixed by fiddling with the MySQL source. That said, doing DB processing for 600M users on <13K machines should be considered efficient and relatively trouble-free. That’s over 46K users per DB server. While there is room for improvement, that’s not too shabby.

  85. The best database I've ever worked with is MS Access. It is super scalable and super fast. Also, I could run the entire FB site on a laptop and it would still outperform any of your systems or other databases out there. No need for multiple database servers or memcached servers.

    My system can also run wireless connections from here to the moon, and I’ve just signed a contract with NASA to automate 98.67849% of the international space station using my new integrated crystal microchip processors. I can store more data in one of my crystal chips than all of fb, combined!

    1. Forrest Gump Monday, July 11, 2011

      I used MySQL to organize wav files of my various farts. It worked well for me. I encourage Facebook to allow users to upload wav files of their farts too!

  86. I haven't read a more astute observation of the situation China is currently facing with its inevitable financial meltdown. Stonebraker is actually a brilliant Sino economist and doesn't even know it! The same ACID trip principles certainly apply.

  87. Err, there are valid “NoSQL” solutions that have been around forever. PICK (marketed under names like Reality), for instance, is extremely fast and very reliable. It's also very old and has very little overhead.

  88. From the article: “Facebook is operating a huge, complex MySQL implementation equivalent to ‘a fate worse than death,’ and the only way out is ‘bite the bullet and rewrite everything.’”

    This is *so* disconnected from reality.

    I work for a $20 billion company that is growing 20% year over year and runs entirely on MySQL.

    Is this ideal? No. Do we have problems? Yes. But would we consider rewriting everything? Never! *That* would be a fate worse than death.

    ” There’s no guarantee they’re the solution for Facebook’s MySQL woes — the complexity of Facebook’s architecture and the company’s penchant for open source being among the reasons”

    The article talks about open source as if it were something derogatory.

    I can tell you that my company wouldn’t be possible without several open source projects (OS, database, programming language, etc).

    Heck, great companies like Facebook, Twitter, and Google wouldn't exist without open source software.

    Now they are huge companies, and they could afford proprietary licenses; but they still choose free — not because of the cost, but because of the freedom.

  89. This article explains why so many times you cannot get on FB or you get thrown off…thx for the article, it helps to enlighten us non techies.

    1. lol….ur password is stolen girl!!!

  90. Hyun Jung Soh Monday, July 11, 2011

    How sad. :(

  91. Etienne Ayanes Monday, July 11, 2011

    There are lots of snippy little genii (at least in your own minds) here and too few bottles available with which to shut them up.

  92. Of course, Stonebraker is a businessman who will claim whatever makes him money. But I hate this kind of dumb article, whose title and content amount to a single person's opinion.

  93. One difference is that Walmart, the federal government, stock exchanges, etc., which have large databases with high transaction volumes, have adopted a three-tier architecture with middleware, i.e., TUXEDO. But that's so ‘old school’, and neither free nor open source, that it's seldom considered. Too bad; it works well. There is a reason all the highest TPC benchmark results are still achieved using TUXEDO. I wonder what the future holds now that both TUXEDO and MySQL are controlled by Oracle.

  94. Are transactions that important for Facebook?

  95. So basically….

    A database pioneer and computer scientist… who developed the Postgres and Ingres relational databases… and has been actively involved in the development of other types of databases….

    Against

    Others who build and work on database systems…. correcting him….

    Seems fair to me….

  96. From the start of this article it looked like Stonebraker had no idea what he was talking about. His claim that SQL “is good for nothing” is ludicrous on the face of it. He then further destroys his credibility by claiming that SQL skills “aren't hard to come by”. Sure, you can find plenty of .NET or Java developers who can write syntactically correct but horrible SQL code, or who can design a poorly performing database (there are also some .NET and Java coders who are talented at SQL, so this isn't a dig at them). The fact is, though, even in this job market I get constant calls from recruiters because there is a shortage of *good* SQL developers.

    At the end of the article it all becomes clear – he’s just trying to sell his own product. A product which, despite him referring to it as “NewSQL” includes nothing that isn’t already being done out there with existing SQL engines.

    What a waste of an article (or should I say sales pitch).

  97. They should call it “SQLSequel”..

  98. 1. Engineers don’t understand data
    2. If engineers could scale, scalability as a problem wouldn't exist.
    3. VPs in charge of data operations have glaring holes in their understanding of the data I/O patterns Facebook needs.
    4. The rest is then a misalignment of skills and toolsets, combined with chaos and egos.

    “We know the problem. We can’t do anything until the … ” screams “we don’t know the problem and don’t know how to fix it.”

  99. Apparently the so-called NewSQL/MySQL expert who provided the information for this article is all wrong about database issues at Facebook. They widely use the Cassandra database, which is not relational, for their massive content. The system that Facebook innovated is far better than Google's and Amazon's technology.
    The author didn't convince me with his assessment that NoSQL will offer only a small performance gain over an RDBMS. He should go to Mount Everest and take a break before starting out his professional life.
    For webscale apps, sacrificing ACID properties provides huge performance benefits that cannot be achieved by an RDBMS.

    1. Facebook does use Cassandra, but not as widely as you seem to think. The original use case for Cassandra was mail but, as http://www.facebook.com/note.php?note_id=454991608919 explains, they migrated that functionality to HBase several months ago. There’s a good description of their main storage architecture at http://www.prodromus.com/2011/01/27/what-database-does-facebook-use – mostly MySQL in a key/value style plus memcached. Both systems use Haystack (another Facebook invention) for images and other large objects.

      Facebook’s infrastructure, far from being better than that at Google or Amazon, is very similar and developed in parallel with those others. Cassandra, for example, combines the Dynamo distribution model (from Amazon) with the BigTable data model (from Google). HBase, part of Yahoo-derived Hadoop, also has roots at both companies, while Haystack shares many ideas with both Amazon S3 and OpenStack Storage. The primary argument for sacrificing ACID is not performance but tolerance of partitions – an argument most famously advanced by Eric Brewer, formerly of Yahoo and now of Amazon.

      While I find Stonebraker's comments as misguided and contemptible as you do, I don't think factual errors and ad hominem attacks make that point very well. Please study the systems you're talking about a little before you make general pronouncements about their relative merits.

      1. Jeff,
        To be honest with you, I have read the papers on the BigTable and Dynamo projects. Inside our company, we are also switching some of our apps to Cassandra on an experimental basis. I didn't want to make a very long post explaining all the details in the comments.
        Every system has its limitations, but the technology that Facebook is currently using has carried them to hundreds of millions of users. Obviously this must be the best technology so far.

  100. It's nice to bring NewSQL to the forefront as the database technology for handling the limitations of social networks, but Facebook has been functioning just fine. There's a #1 rule for all computer systems that still works today, and that is K.I.S.S. (keep it simple, stupid). If the system works, leave it be; don't mess with it.

    1. No, that's only what stupid administrators think… this is why we have tons of bots running around now.

  101. For business-savvy but non-technical readers, you may want to clarify your unintentional, somewhat blanket statement regarding open source: “…the company's penchant for open source being among the reasons…”. This type of statement makes anyone look biased toward closed source software and makes it look as if open source software is part of the problem, which I'm sure you would agree is not true.

    1. > “…the company’s penchant for open source being among the reasons…”.
      I read this sentence as a reason why Facebook selected MySQL and not as a derision against open source software.

  102. So let's say they have 10K low-end machines, each costing $200 a month. That means each machine manages 75,000 users, and that it costs about $0.0026 per user per month. That's $24 million total per year for a company valued at more than $80 billion.

    How is the solution unsuited?
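The commenter's back-of-the-envelope figures can be checked with a few lines of Python. The inputs (10K machines, $200 per machine per month, 75,000 users per machine) are the commenter's hypotheticals, not Facebook's actual numbers. With them, the fleet serves 750 million users for $24 million a year, which works out to roughly $0.0027 per user per month, or about $0.032 per user per year.

```python
# Commenter's hypothetical inputs, not Facebook's real figures.
MACHINES = 10_000
COST_PER_MACHINE_PER_MONTH = 200.0  # USD
USERS_PER_MACHINE = 75_000


def hosting_costs(machines: int, monthly_cost: float, users_per_machine: int) -> dict:
    """Return user count, yearly total, and per-user cost per month and per year."""
    total_users = machines * users_per_machine
    yearly_total = machines * monthly_cost * 12
    return {
        "total_users": total_users,
        "yearly_total_usd": yearly_total,
        "per_user_per_month_usd": monthly_cost / users_per_machine,
        "per_user_per_year_usd": yearly_total / total_users,
    }


costs = hosting_costs(MACHINES, COST_PER_MACHINE_PER_MONTH, USERS_PER_MACHINE)
for name, value in costs.items():
    print(f"{name}: {value:,.4f}")
```

The per-month figure is simply $200 / 75,000; multiplying it by 12 gives the per-year figure.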

  103. This is a very shoddy piece of writing. Far too many paragraphs are spent presenting MySQL / “OldSQL” as the problem with little to no substance as to why. Then, finally, after 90% of readers have probably already grown bored and gone away, the reader is introduced to “NoSQL” and “NewSQL”. Then the article ends, leaving the reader with no idea what “NoSQL” and “NewSQL” are except a vague notion that “NoSQL” was an attempt to get away from “OldSQL” and “NewSQL” solves everything.

  104. As soon as someone trots out the term “Webscale”, they lose all credibility in my eyes. It’s such a pointless term. It’s even worse than “NoSQL”, which as troll-marketing terms go is pretty bad.

  105. The bottom line is that relational databases have a tremendous amount of overhead consumed by ‘keys’. They are fine for data that can be arranged in specific fields, such as a phone book or accounting data. In other words, the relations between objects (fields) are predetermined and the code takes advantage of this.

    Further, the maintenance costs for ‘tuning’ are huge.

    However, they fall apart when associations are not predetermined and the database is ‘navigated’: the size of the database ‘explodes’ as the amount of data increases and users create ‘links’ at random. This is the paradigm that Facebook uses. Examples of links in Facebook are ‘friend of’, ‘likes’, etc. A robust object-oriented database is far more suitable for social networking and transaction-processing speed. An additional advantage of OODB-based systems is that code doesn't have to be rewritten to add object classes and attributes.

    One example of overcoming this conundrum for IBM DB2 is to use an OODB as the front-end transaction processor and update DB2 in the background. See the IBM paper http://www.redbooks.ibm.com/redbooks/pdfs/sg246561.pdf describing how DB2 uses the Versant OODB to improve performance. Also see http://www.versant.com/index.aspx. There are competing OODBs around, but I personally know that this one has been around for at least 20 years; I have used it and customized user applications for a sophisticated systems engineering tool.

  106. Yeah, I just had to answer Stonebraker's claims about Facebook. So I'm joining the fiction club too (disclosure: I work for a competing NewSQL company) and saying what Facebook's DBAs could have answered in this interview; see here

  107. BarryVMorris Tuesday, July 12, 2011

    Caveat: I represent NimbusDB, a NewSQL vendor.

    That OldSQL has failed for web-facing applications is self-evident. Every substantial web-facing application has had to supplement or replace their MySQL, SQL Server, ORACLE etc systems, in addition to caching, sharding, denormalizing and in some cases re-engineering parts of the database system. At NimbusDB we talk to people every day that describe this OldSQL pain.

    The case Mike Stonebraker makes is that the problem is not inherent to SQL but to the 30 year old internal architecture that all of these database systems use. There is no theoretical reason for SQL/ACID not to scale, and there are NewSQL products that provide existence proof of the point.

    Would Facebook and others be in a better place had they started with a SQL database that goes faster when you add nodes to a live database, and is resilient to node or datacenter failure? Obviously yes.

    SQL and ACID do scale out on commodity machines; historical implementations do not.

  108. These seem like nice problems to have, regardless of what system(s) they are using.

  109. a humble engineer Friday, July 15, 2011

    I have a lot of respect for Michael Stonebraker as a computer scientist. To give him credit where credit is due, he has made indisputable contributions to the field. Having met him in person, I have to say that he is not just a great computer scientist, but a great guy overall.
    Whether Facebook, or any other technology company for that matter, should use MySQL, NoSQL, a z/OS mainframe, or paper and pencil for keeping their data is an engineering decision and a business decision. And a team qualified to make that call should consist of businesspeople and engineers by trade, not computer scientists. There are many considerations that Stonebraker is not even aware of. Highly intelligent and knowledgeable as he is, he is nevertheless not an engineer, nor is he privy to internal business information about Facebook. He simply does not have enough information to make that call.
    Facebook is the new Google. Out of 10 engineers in Silicon Valley, at least 9 would drop whatever they are doing and go work for Facebook, if given the chance. Consequently, some of the best engineering talent is already employed by Facebook, and they likely have a good reason for doing things the way they do. Engineering is ultimately a practical discipline.
    Most relational databases, save for a few column stores, store data much the same way IBM System R did back in the early 1970s. This approach to storing data works, and it's practical. The computer science part here is ancient history at best. It's great from the engineering standpoint, because it's tried and true and it works. Done is better than perfect.

  110. There is no way out but to continue with MySQL; it is too painful at this stage to bite the bullet and replace it. Adding more servers may be the way out for now.

  111. Vadym Kurylovych Friday, July 22, 2011

    Too many trolls in the comments. Every PHP coder thinks he is a MySQL guru. Writing sites for 10 hits/day is a different process from writing sites for 10k hits/second. Go back to mommy, lamers.

  112. Walker Hamilton Thursday, July 28, 2011

    Bullshit. He's got too much at stake for anyone to listen to him on this front.

  113. Right now I’m using NoSQL (Redis in this case) to help MySQL. When there are data spikes we save the data in memory with Redis so we don’t lose it because of slow MySQL.
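A minimal, single-process sketch of that buffering pattern in Python. A `collections.deque` stands in for the Redis list (in Redis terms, roughly RPUSH on arrival and LPOP on drain), and `write_batch` is a hypothetical callable standing in for a bulk MySQL INSERT; this illustrates the idea and is not the commenter's setup. Unlike Redis, a deque lives only inside one process, so anything still queued is lost if that process dies.

```python
from collections import deque
from typing import Callable, List


class SpikeBuffer:
    """Absorb write spikes in memory and drain them to a slower store in batches.

    The deque stands in for a Redis list; `write_batch` is a hypothetical
    callable standing in for a bulk MySQL INSERT.
    """

    def __init__(self, write_batch: Callable[[List[dict]], None], batch_size: int = 100):
        if batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        self._queue = deque()
        self._write_batch = write_batch
        self._batch_size = batch_size

    def push(self, record: dict) -> None:
        # Fast path during a spike: append in memory, no database round trip.
        self._queue.append(record)

    def drain(self) -> int:
        """Flush everything queued so far, oldest first; return records written."""
        written = 0
        while self._queue:
            size = min(self._batch_size, len(self._queue))
            batch = [self._queue.popleft() for _ in range(size)]
            try:
                self._write_batch(batch)
            except Exception:
                # Put the failed batch back at the front, in order, then re-raise.
                self._queue.extendleft(reversed(batch))
                raise
            written += len(batch)
        return written

    def __len__(self) -> int:
        return len(self._queue)
```

For example, with `stored = []`, `SpikeBuffer(stored.extend, batch_size=3)` and seven pushes, `drain()` calls `stored.extend` three times (batches of 3, 3 and 1) and returns 7. If a write fails, the batch goes back to the front of the queue, so a later `drain()` can retry it.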

  114. They might very well be having problems internally but from a user’s point of view, Facebook is running smoothly. I personally have not noticed any significant delays. I’m not aware of anyone leaving Facebook over performance issues, so I don’t think they need a complete redesign.
    I think they are nearing their peak data usage. Proliferation among people under 30 is very high around the world. I’ve noticed that now people in their 40s and 50s are joining in, but they are not as active and thus do not generate a lot of demand. If they keep adding servers they’ll be OK for the next few years.

  115. You have noticed no problems with FB? This morning I logged on. In my 10 minutes there I had 7 “can't write to database” errors.

  116. “If /dev/null > is fast and web scale, I will use it. Is it webscale? MongoDB is webscale. ” -from http://bit.ly/qMZnFl
    This sums up 80% of what I read here, between the people who have little or no idea of the technology barriers demolished by Stonebraker and the disagreements about which company is using which technology. Most if not all of these top 0.001% companies use multiple technologies depending on the problem space they occupy. NoSQL is fine when it fits. So are Hadoop, Oracle, Mongo, etc. When you have ONE hammer, everything looks like a nail. When you have a tack hammer, sledgehammer, screwdriver, claw hammer, and rubber hammer, you pick the one that fits tacks, spikes, screws, nails, etc. I expect that in this interview, as in most, quotes were taken out of context by interviewers who don't understand the technology very well. Take it all with a grain of salt. -sign me an Oracle DBA with time on Ingres, Oracle, dBase 3, MS Access, Informix/Illustra, Postgres, PostgreSQL, DB2, Sybase, UDB, StreamBase, NoSQL, MS SQLServer, and some homegrown stuff.

  117. So… let me get this right? Facebook is in “MySQL Hell”.

    -They have one of the most visited sites on the net, if not the most visited.
    -Their site has next to no lag.
    -And almost every page on… the social network… requires how many db calls?

    I'm trying to figure out what people actually expected to happen here, because it's kind of confusing. Google runs search queries… not impressive. Amazon runs prices and publishers… okay. Facebook runs status updates, games, likes, groups, fan pages, and applications, and hosts its own API to manage anyone's account and pages remotely.

    4,000 Shards…. even if it was a baseless accusation the OP can’t back up… suddenly, doesn’t feel that impressive or bothersome.

  118. Facebook trapped in MySQL ‘fate worse than death’ http://t.co/6sPxwnyX

  119. Facebook trapped in MySQL ‘fate worse than death’ http://t.co/xM1KgBjK

  120. Facebook trapped in MySQL ‘fate worse than death’ http://t.co/2RU7QjUa

  121. @perbone Chew on that, then: http://t.co/RTPBQA3y =P

  122. Outdated “@edsonmarquezani: @perbone Chew on that, then: http://t.co/XQ8EsQCp =P”


Comments have been disabled for this post