Several weeks ago, we saw a burst of news around memcached, an increasingly popular open-source caching software framework gaining attention from web companies and investors. Gear6 announced details of a new memcached-based product, and Schooner Information Technologies launched a set of memory-dense appliances, one targeted to MySQL, one to memcached. These announcements coincided with the MySQL Conference, as some see MySQL as the killer application for memcached, or perhaps vice versa. Other companies coming out of the woodwork around memcached include NorthScale, which has released no news as of yet except a shingle-sized web site introducing its capabilities.

In its most basic form, memcached keeps application requests from reaching the database by storing the results of previous requests in memory, or cache. But if more memory and, in many cases, memcached are such a benefit to MySQL, what does that say about MySQL to begin with? Is the necessary addition of a caching tier to scale performance positive or negative for the database?

Memcached is a tool to reduce the database load, extending the life of a single database server, and relieving pressure to scale the database across many machines. As the free and popular relational database of choice for web companies, MySQL is synonymous with relational database in Internet infrastructure.
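As a rough illustration of how a caching tier keeps repeated reads away from the database, here is a minimal sketch of the read path. It assumes nothing about a real deployment: a plain dictionary stands in for memcached, another dictionary stands in for a MySQL table, and the class and key names are hypothetical.

```python
# Minimal read-path sketch. The two dicts are illustrative stand-ins for
# memcached and a MySQL table; neither is a real client library.

class CachedReader:
    def __init__(self, database):
        self.database = database  # hypothetical stand-in for MySQL
        self.cache = {}           # hypothetical stand-in for memcached
        self.db_reads = 0         # counts how often the database is hit

    def _query_database(self, key):
        # Every call here represents load on the database server.
        self.db_reads += 1
        return self.database.get(key)

    def get(self, key):
        # Serve from memory when the result is already cached.
        if key in self.cache:
            return self.cache[key]
        # Cache miss: fall through to the database, then remember the result.
        value = self._query_database(key)
        if value is not None:
            self.cache[key] = value
        return value


store = CachedReader({"user:1": "Alice", "user:2": "Bob"})
for _ in range(1000):
    store.get("user:1")
print(store.db_reads)  # prints 1
```

In this toy setup, a thousand identical reads cost the database a single query. A real cache would also need expiry and invalidation on writes, which this sketch leaves out.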

Without a doubt, MySQL has been a cornerstone of web infrastructure for years. But the scalability challenges are well-known, and as the Internet has morphed from millions to billions of records, many believe the heavyweight nature of an RDBMS approach, and the constraints it imposes, might well be replaced with a number of more lightweight alternatives. One indication of this is Drizzle, a fork of MySQL intended to be “a lean, mean query-running machine,” according to the wiki. Karen Tegan Padir, VP at MySQL, recently touted “Drizzle Day” as a conference highlight in an interview with OStatic.

Other methods taking this lean mindset even further include a range of distributed key-value stores. The list is long and includes names like CouchDB, Hypertable, HBase, Tokyo Cabinet, LightCloud and Cassandra, which we discussed here. These projects are concisely summarized here and here. In addition, LinkedIn has made most of the information on Project Voldemort, the company’s take on a key-value store, available online.

This trend away from the overhead of a relational database, which stressed completeness, to the emerging distributed key-value stores that stress scale, represents a shift in designing web architectures. Many applications relying on MySQL or other relational database implementations in the web world do not need the full set of relational capabilities inherent to those packages. Instead, a lighter-weight, more streamlined approach may prove to be a more relevant data format.

MySQL and relational databases are not going away, nor can their functionality be replaced one for one with the key-value stores mentioned earlier. But the rush of memcached news surrounding the MySQL Conference, with memcached positioned as a way to improve MySQL performance, raises the question of the root cause to scale and performance, and how long that can continue.


Gary Orenstein is the author of “IP Storage Networking: Straight to the Core”, host of TheCloudComputingShow.com and was a co-founder of Nishan Systems (now part of Brocade).

  1. eBay uses Oracle… has 5PB of data across various instances… they cache heavily. What does that say about Oracle?

    I don’t understand why MySQL performance is being implicated when discussing best-practices for developing planet-scale systems.

    On the positive side, you have the trends right. “Flatter” data models with BASE mentalities over ACID approaches are more viable today for successful web companies. These approaches work great for “mostly-read, some-write” use-cases which a lot of social networks etc. have. They also work when you can afford to have eventual consistency.

    But it is not so great for write-intensive sensor-based platforms, for example. In such scenarios, other platforms like Scalaris are coming into the picture. These platforms place near-immediate consistency at a higher priority.

    Cheers,

    Zubin
    http://zwadia.com/

  2. MySQL performance is just fine, when the database (and server speed) is consistent and solid.

  3. Gary Orenstein Sunday, May 17, 2009

    Zubin, Thanks for your comments. Any time you are willing to break things up into various instances, that will temper the difficulties of scale. The trade-off then is more management oversight.

    MySQL is implicated only because it is the de facto RDBMS in web infrastructure. The same could be said about many others. My intent was to raise the discussion about RDBMS compared to flatter models, using general web workloads, which tend to be overwhelmingly read-centric, as an example. This is far different than transactional systems that might need to rely on Oracle, MySQL or similar. That market is not going away.

  4. If your intention was to “raise the discussion about RDBMS compared to flatter models” then perhaps you should have actually addressed that in your piece. As it stands this seems like an extremely uneducated hit piece on MySQL.

    Stating, as you do, that the use of memcached as a caching tactic in MySQL implementations is somehow an inherent weakness of MySQL makes about as much sense as saying that the widespread use of adjustable wrenches is an inherent weakness of nuts and bolts.

    1. The relationship between technologies like memcached and relational databases did come across as a little confused, since they serve different purposes and are frequently used together. MySQL or relational models have little to do with it. No database engine will scale well if you insist on (pointlessly) running every transient bit of state through the transaction logs.

    2. Gary Orenstein Sunday, May 17, 2009

      Jeffrey, This was never intended as a hit piece. It appears that there has been an unusual amount of excitement around the memcached/mysql combo. The discussion is about the caching/relational db mix, and where and why that is used in web applications. The best examples are the ones used. Andrew’s longer comment a few down thoughtfully extends that theme.

  5. Ummmm… “the root cause to [MySQL's] scale and performance” is having to write data to disk. MySQL and memcached are orthogonal.

  6. There are a couple different issues here.

    First, the ubiquity of MySQL is orthogonal to its ability to scale as a relational platform. For many workloads, just about any other relational platform will run circles around it and it therefore sits in the uncomfortable position where the kinds of workloads that do scale well on MySQL also tend to be easily implemented on non-relational data stores. MySQL in web apps has traditionally often been little more than a glorified data store. If you really need relational features you go with something like PostgreSQL or Oracle, and if you do not need serious relational features then you could implement your application using extremely scalable non-relational data stores.

    Second, the kinds of applications you can build are dependent on the practical capabilities of the database engine being used and so the applications being built may be a reflection of the limitations of existing technology. Flatter key-value data stores are used because that is a technology that can scale for web apps, but using that technology significantly constrains the applications that can be built on top of it. To give an example at the opposite end of the spectrum, Semantic Web technologies are based on hyper-relational data models that scale badly on conventional relational platforms because relational databases are not relational *enough* below the interface level. Applications that depend heavily on fast analytics have this problem generally.

    There is nothing wrong with the relational model but the implementations are badly positioned. At one extreme you have flat key-value models scaling web apps in the simple case and at the other extreme of complex relational analytics the existing relational implementations are horribly inadequate. The latter extreme is more technologically interesting if only because there is no one seriously addressing that market.

    1. Gary Orenstein Sunday, May 17, 2009

      Andrew, An excellent addition to this discussion. Thanks.

    2. As has been stated by previous commenters above, which I’ll reiterate here, the workloads of social networking sites fall mostly into the ‘read lots, write once’ class (most of the web exists within that paradigm.) Regardless of the database company that’s responsible for the software, the main idea in scaling this read heavy workload is to remove the burden from the database and move it to distributed memory stores.

      As an engineer, you want applications to pull from the same cache pool to reduce I/O pressure. To ensure that every machine isn’t replicating data in individual caches, you have to go distributed. That’s the win with memcached.

      Putting a distributed cache between the application and the database increases performance and shares data across your application servers, something that the database cannot do on its own. The database has on-disk and in-memory caching, but eventually you’ll run out of memory on a single host if your working set exceeds the host’s memory.

      Memcached also covers up replication lag (MySQL is terrible at replication, Oracle not so much) in large environments by putting data into the distributed cache (write-through caching) before the slave database has finished its writing. Data is available immediately to clients, before the replication has completed.

      It will also provide a large amount of savings when you’re constantly executing that O(n x m) query to find out who is friends with whom on your social networking site.

      This comes with a cost, though. Relational database functions, like joining across large data sets, and atomic operations, become very difficult to execute. Memcached becomes the central server, and there is always a fear that an important key will drop out of cache because of a random eviction.

      It’s not without risk, either. Dependence on the cache can hurt you severely if lots of memcached servers fail (and they do fail), leaving you in a ‘cold cache’ situation where it can take hours to repopulate your working set back into the cache pool.

      Don’t question MySQL’s performance. Relational databases are great, but they are not the only solution to storage problems. The two problems that are being solved here are highly orthogonal.

      I’d also like to state that the majority of alternate key-value store databases listed in Richard Jones’ article are really not ready for high production loads (with maybe the exception of Tokyo Cabinet, HDFS, and Cassandra). There is still a ton of ‘secret sauce’ the large sites are keeping quiet about in order to make these into effective data stores. Tread lightly.

      1. Gary Orenstein Sunday, May 17, 2009

        Netik, Thanks for accurately outlining some of the technical details. While I agree that memcached/mysql (or caching/relational db) problems and solutions might be orthogonal from a technical standpoint, the commonality of the deployments leads me to see a closer relationship.

  7. I really need to figure out how to use memcached with my wordpress/vbulletin stuff on my server. I think it’ll help the server load during peak time

  8. [...] posted this in response to a post on GigaOM, but it was such a long comment, I felt that it was worthy as a post on its [...]

  9. I am afraid that the technical naivete of this article is startling given that it appears on GigaOm.

    Memcached is to MySQL what a website might be to a blog authoring tool. You edit in one and publish from the other. Each is optimized for a different problem, and yet they are used together.

  10. I agree MySQL is steady as a rock when database speed and server work just fine.
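
One of the comments above describes write-through caching as a way to hide replication lag: the value goes into the cache at write time, so readers see it before the replica has caught up. A minimal sketch of that idea follows. It is purely illustrative; the dictionaries stand in for the primary database, a lagging replica, and memcached, and every class and key name is hypothetical.

```python
# Write-through sketch under stated assumptions: dicts stand in for the
# primary database and for memcached, and LaggingReplica is a hypothetical
# replica that applies writes only when apply_pending() is called.

class LaggingReplica:
    def __init__(self):
        self.rows = {}
        self.pending = []

    def replicate(self, key, value):
        # The write is queued, not yet visible to readers of the replica.
        self.pending.append((key, value))

    def apply_pending(self):
        # Simulates the replica catching up with the primary.
        for key, value in self.pending:
            self.rows[key] = value
        self.pending.clear()


class WriteThroughStore:
    def __init__(self):
        self.primary = {}                # stand-in for the master database
        self.replica = LaggingReplica()  # reads normally go here
        self.cache = {}                  # stand-in for memcached

    def write(self, key, value):
        self.primary[key] = value
        self.replica.replicate(key, value)  # arrives later
        self.cache[key] = value             # write-through: cached immediately

    def read(self, key):
        # Prefer the cache; fall back to the (possibly stale) replica.
        if key in self.cache:
            return self.cache[key]
        return self.replica.rows.get(key)


store = WriteThroughStore()
store.write("status:42", "hello")
fresh = store.read("status:42")             # served from the cache
stale = store.replica.rows.get("status:42")  # replica has not caught up yet
```

Here `fresh` holds the new value while `stale` is still `None`. The sketch also hints at the risk raised in the same comment: if the cached key were evicted before the replica caught up, a read would fall back to stale data.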
