Summary:

At last week’s MongoSV conference in Santa Clara, Calif., a number of users shared their experiences with the MongoDB NoSQL database. One common theme: NoSQL is necessary for a lot of use cases, but it’s not for companies afraid of hard work.

MongoDB might be a popular choice among NoSQL databases, but it’s not perfect, at least out of the box. At last week’s MongoSV conference in Santa Clara, Calif., a number of users, including speakers from Disney, Foursquare and Wordnik, shared their experiences with the product. The common theme: NoSQL is necessary for a lot of use cases, but it’s not for companies afraid of hard work.

If you’re in the cloud, avoid the disk

According to Wordnik technical co-founder and vice president of engineering Tony Tam, unless you’re willing to spend beaucoup dollars on buying and operating physical infrastructure, cloud computing is probably necessary to match the scalability of NoSQL databases.

Wordnik actually launched on Amazon Web Services with MySQL, but the database hit a wall at around a billion records, he said. So Wordnik switched to MongoDB, which solved the scaling problem but introduced disk I/O problems that caused a major performance slowdown. Wordnik then ported everything back onto some big physical servers, which drastically improved performance.

And then came the scalability problem again, only this time it was in terms of infrastructure. So, it was back to the cloud. But this time, Wordnik got smart and tuned the application to account for the strengths and weaknesses of MongoDB (“Your app should be smarter than your database,” he said), and tuned MongoDB to account for the strengths and weaknesses of the cloud.

Among his observations was that in the cloud, virtual disks have virtual performance, “meaning it’s not really there.” Luckily, he said, you can design to take advantage of virtual RAM. It will fill up fast if you let it, though, and there’s trouble brewing if requests start hitting the disk. “If you hit indexes on disk,” he warned, “mute your pager.”
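Tam’s warning about indexes spilling to disk comes down to simple arithmetic. The following is a minimal sketch, not anything Wordnik described: the function name `indexes_fit_in_ram`, the 20 percent headroom figure and the sample sizes are all assumptions for illustration. In a real deployment, the index sizes could be read from MongoDB’s `collStats` command, which reports a `totalIndexSize` for each collection.

```python
def indexes_fit_in_ram(index_sizes_bytes, ram_bytes, headroom=0.2):
    """Return True if the combined indexes fit in RAM with spare headroom.

    headroom is the fraction of RAM reserved for everything else
    (hot documents, the OS, connections); 0.2 is an assumed value.
    """
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    usable = ram_bytes * (1 - headroom)
    return sum(index_sizes_bytes) <= usable


GIB = 1024 ** 3

# Hypothetical per-collection index sizes: 11 GiB in total.
sizes = [6 * GIB, 3 * GIB, 2 * GIB]

print(indexes_fit_in_ram(sizes, ram_bytes=16 * GIB))  # 11 GiB vs 12.8 GiB usable
print(indexes_fit_in_ram(sizes, ram_bytes=12 * GIB))  # 11 GiB vs 9.6 GiB usable
```

With these made-up numbers, the first instance has room for its indexes and the second does not, which is the point at which requests would start hitting the virtual disk.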

Foursquare’s Cooper Bethea echoed much of Tam’s sentiment, noting that “for us, paging the disk is really bad.” Because Foursquare works its servers so hard, he said, high latency and error counts start occurring as soon as the disk is invoked. Foursquare does use disk in the form of Amazon Elastic Block Store (EBS), but it’s only for backup.

EBS also brings along issues of its own. At least once a day, Bethea said, queued reads and writes to EBS start backing up excessively, and the only solution is to “kill it with fire.” What that means changes depending on the problem, but it generally means stopping the MongoDB process and rebuilding the affected replica set from scratch.

Monitor everything

Curt Stevens of the Disney Interactive Media Group explained how his team monitors the large MongoDB deployment that underpins Disney’s online games. MongoDB actually has its own tool, the MongoDB Monitoring Service (MMS), that Stevens said he swears by, but it isn’t always enough. It shows traffic and performance patterns over time, which is helpful, but that is only the starting point.

Once a problem is discovered, “it’s like CSI on your data” to figure out what the underlying problem is. Sometimes, an instance just needs to be sharded, he explained. Other times, the code could be buggy. One time, Stevens added, they found out a poor-performing app didn’t have database issues at all, but was actually split across two data centers that were experiencing WAN issues.

Oh, and just monitoring everything isn’t enough when you’re talking about a large-scale system, Stevens said. You have to have alerts in place to tell you when something’s wrong, and you have to monitor the monitors. If MMS or any other monitoring tools go down, you might think everything is just fine while the kids trying to have a magical Disney experience online are paying the price.
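One common way to “monitor the monitors” is a heartbeat check: every monitoring tool records when it last reported in, and a separate watcher raises an alert when a monitor goes quiet. The sketch below illustrates that idea only and is not Disney’s setup; the function name `stale_monitors`, the monitor names and the 120-second threshold are all hypothetical.

```python
import time


def stale_monitors(last_heartbeat, now=None, max_age_seconds=120):
    """Return the names of monitors whose last heartbeat is too old.

    last_heartbeat maps a monitor name to the Unix time of its most
    recent check-in. max_age_seconds is an assumed threshold.
    """
    if now is None:
        now = time.time()
    return sorted(
        name for name, seen in last_heartbeat.items()
        if now - seen > max_age_seconds
    )


# Hypothetical heartbeats, expressed relative to a fixed "now".
now = 1_000_000.0
heartbeats = {
    "mms-agent": now - 30,       # checked in 30 seconds ago
    "latency-probe": now - 600,  # silent for 10 minutes
}

for name in stale_monitors(heartbeats, now=now):
    print(f"ALERT: monitor {name!r} has stopped reporting")
```

The watcher itself has to run somewhere independent of the tools it checks, otherwise one outage can silence both.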

By the numbers

If you’re wondering what kind of performance and scalability requirements forced these companies to MongoDB, and then to customize it so heavily, here are some statistics:

  • Foursquare: 15 million users; 8 production MongoDB clusters; 8 shards of user data; 12 shards of check-in data; ~250 updates per second on user database, with maximum output of 46 MBps; ~80 check-ins per second on check-in database, with maximum output of 45 MBps; up to 2,500 HTTP queries per second.
  • Wordnik: Tens of billions of documents with more always being added; more than 20 million REST API calls per day; mapping layer supports 35,000 records per second.
  • Disney: More than 1,400 MongoDB instances (although “your eyes start watering after 30,” Stevens said); adding new instances every day, via a custom-built self-service portal, to test, stage and host new games.
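To put the per-day and per-second figures above on the same footing, here is a quick back-of-the-envelope conversion. It assumes the quoted rates are sustained evenly across a full day, which the speakers did not claim, so treat the results as rough averages rather than measured loads. The helper names are made up for this sketch.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_second(daily_total):
    """Average events per second for a given daily total."""
    return daily_total / SECONDS_PER_DAY


def per_day(rate_per_second):
    """Daily total if a per-second rate held for all 24 hours."""
    return rate_per_second * SECONDS_PER_DAY


# Wordnik: more than 20 million REST API calls per day.
print(round(per_second(20_000_000)))  # about 231 calls per second on average

# Foursquare: ~250 user updates and ~80 check-ins per second.
print(per_day(250))  # 21,600,000 user updates per day
print(per_day(80))   # 6,912,000 check-ins per day
```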

For more-technical details about their trials and tribulations with MongoDB, all three presentations are available online, along with the rest of the conference’s talks.

Feature image courtesy of Tony Tam, Wordnik.

  1. These are excellent arguments for why you should pay someone else to host and manage your database :)

    1. Is Mongo the only NoSQL database? I didn’t see you mention any others…

      1. Did you mean to reply to me? I didn’t write this article… but you’re absolutely right. The article title is way off…

  2. This article is about MongoDB, not NoSQL. Pretty poor reporting all around, in my opinion. Why not take the next step and associate MongoDB with all software?

  3. Yeah, I didn’t mention any others, but I think the same (or similar) lessons apply equally elsewhere — Cassandra, CouchDB, etc. I just focused on Mongo because of the event, but perhaps should have pointed to coverage of other NoSQL projects (which I do below). Users tend to be really happy with their choices, but the non-commercial versions, in particular, do take some work to optimize for any given workload. Just like any other open source product.

    http://blog.mudynamics.com/2011/09/01/blitz-io-path-finding-with-couchdb/

    http://gigaom.com/2010/09/08/digg-not-likely-to-give-up-on-cassandra/

    1. @Derrick Harris: Nice article about real-world experiences, thanks. I had no difficulty distinguishing between NoSQL and MongoDB references in the article.

  4. Vladimir Rodionov Sunday, December 18, 2011

    MongoDB is a NoSQL DB because it does not support SQL, maybe?

    1. NoSQL = Not Only SQL

  5. Actually I think it’s pretty damn amazing we’re running these apps practically all from RAM. Makes me wonder what sort of innovation’s gonna be needed to keep up with the growing demand.

  6. James Pettyjohn Monday, December 19, 2011

    Good article, I like the real cases. I’d argue though that if you have more than a few million records or incur lots of traffic, take any data storage of any kind and you will need someone who knows what they’re doing.

    None of Oracle, DB2, MSSQL, MySQL, Cassandra, HBase or BerkeleyDB will run well without someone who knows the system and does the work to make it run well and innovate for the use case at hand. Find one website about Oracle that doesn’t sell you Oracle professional support. That is not because it’s a bad product, but because no product is a golden hammer.

    Finding the right tool for your use case can make a huge difference, but it will only increase or decrease the amount you have to build yourself to make the system work for you.

    1. Are any of these problems really MongoDB- or NoSQL-specific? All types of databases suffer from these kinds of limits.

  7. Good article. I come from a background where I had to deal with this sort of scale well before the modern NoSQL offerings existed. Any mission-critical system at massive scale, like the ones discussed here, needs rock-solid monitoring and alerting, and the smallest design decisions can have huge echoes when things scale up, regardless of your tech stack. Sure, there is work to be done when using MongoDB, but it’s a heck of a lot better than the alternative!
