Digg, the San Francisco-based social media company, is dropping MySQL and instead betting its future on Cassandra, an open-source data store. It's just the latest sign of the growing popularity of the software, which Facebook developed (and open sourced) to power search across its users' inboxes. While Facebook has since backed off Cassandra, Digg plans to open source all its work on Cassandra and champion the software's development and adoption.

In a post on the Digg blog, John Quinn, Digg's VP of engineering, writes:

Perhaps our most significant infrastructure change is abandoning MySQL in favor of a NoSQL alternative. To someone like me who’s been building systems almost exclusively on relational databases for almost 20 years, this feels like a bold move.

What’s Wrong with MySQL?

Our primary motivation for moving away from MySQL is the increasing difficulty of building a high performance, write intensive, application on a data set that is growing quickly, with no end in sight. This growth has forced us into horizontal and vertical partitioning strategies that have eliminated most of the value of a relational database, while still incurring all the overhead.
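Quinn doesn't spell out Digg's partitioning scheme, but a minimal sketch of generic horizontal partitioning (sharding) shows where the relational value goes. The `ShardedStore` class, the shard count and the CRC32-based routing below are hypothetical choices for illustration; vertical partitioning (moving whole tables onto separate servers) isn't shown.

```python
import zlib
from typing import Callable, Dict, List, Optional

NUM_SHARDS = 4  # hypothetical shard count, for illustration only


def shard_for(user_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Route a row to a shard by hashing its partition key (horizontal partitioning)."""
    return zlib.crc32(user_id.encode("utf-8")) % num_shards


class ShardedStore:
    """Toy stand-in for several independent MySQL servers, each holding a slice of the rows."""

    def __init__(self, num_shards: int = NUM_SHARDS) -> None:
        self.shards: List[Dict[str, dict]] = [{} for _ in range(num_shards)]

    def put(self, user_id: str, row: dict) -> None:
        self.shards[shard_for(user_id, len(self.shards))][user_id] = row

    def get(self, user_id: str) -> Optional[dict]:
        # A lookup by the partition key touches exactly one shard.
        return self.shards[shard_for(user_id, len(self.shards))].get(user_id)

    def scan(self, predicate: Callable[[dict], bool]) -> List[dict]:
        # Anything not keyed by user_id (a join, a global sort, a "top stories"
        # query) must fan out to every shard and be merged in application code.
        results: List[dict] = []
        for shard in self.shards:
            results.extend(row for row in shard.values() if predicate(row))
        return results


if __name__ == "__main__":
    store = ShardedStore()
    for i in range(10):
        store.put(f"user{i}", {"id": f"user{i}", "diggs": i})
    print(store.get("user3"))
    print(sorted(r["id"] for r in store.scan(lambda r: r["diggs"] >= 8)))
```

Lookups by user ID hit a single shard, but any query that cuts across users has to visit every shard and merge the results by hand, which is the work a single relational database would otherwise do for you.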

Digg is just the latest high-profile convert to the NoSQL world. Instead of using databases such as MySQL, many of the companies that deal in near-real-time information are opting for a new breed of data store, most of them open source, such as Cassandra and CouchDB.

Cassandra is roughly the open-source equivalent of Google's Bigtable. Facebook built it to solve inbox search: the company needed something fast and reliable that could handle read and write requests at the same time. Messaging on a site as heavily used as Facebook requires a system that can not only store data but also return search results at blazing speed.
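To make the Bigtable comparison concrete, here is a toy model of the wide-row layout: each row key maps to columns kept sorted by name, so a contiguous range of columns can be read back in one slice. The class, its method names and the inbox layout (one row per user, one column per term and message ID) are assumptions for illustration, not Cassandra's client API or Facebook's actual schema.

```python
import bisect
from typing import Dict, List, Optional, Tuple


class WideRowStore:
    """Toy Bigtable-style layout: row key -> columns kept sorted by name.

    Hypothetical illustration only; this is not Cassandra's client API.
    """

    def __init__(self) -> None:
        self._names: Dict[str, List[str]] = {}
        self._values: Dict[str, Dict[str, str]] = {}

    def insert(self, row_key: str, column: str, value: str) -> None:
        names = self._names.setdefault(row_key, [])
        values = self._values.setdefault(row_key, {})
        if column not in values:
            bisect.insort(names, column)  # keep column names in sorted order
        values[column] = value  # writing an existing column overwrites it

    def get_slice(self, row_key: str, start: str = "", finish: Optional[str] = None,
                  count: int = 100) -> List[Tuple[str, str]]:
        """Return up to `count` columns with start <= name <= finish, in sorted order."""
        names = self._names.get(row_key, [])
        values = self._values.get(row_key, {})
        lo = bisect.bisect_left(names, start)
        hi = len(names) if finish is None else bisect.bisect_right(names, finish)
        return [(name, values[name]) for name in names[lo:hi][:count]]


if __name__ == "__main__":
    inbox = WideRowStore()
    # Hypothetical inbox-search layout: column name = "<term>:<message id>".
    inbox.insert("user42", "digg:msg007", "Digg moves to Cassandra")
    inbox.insert("user42", "cassandra:msg003", "Re: NoSQL meetup")
    inbox.insert("user42", "cassandra:msg001", "Cassandra 0.5 released")
    inbox.insert("user42", "mysql:msg002", "Sharding again")
    # "~" sorts after letters and digits, so this slice covers every "cassandra:" column.
    print(inbox.get_slice("user42", "cassandra:", "cassandra:~"))
```

Because the columns are stored in order, a search for one term is a single range read within one row rather than a scan.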

Stu Hood, the technical lead for the search team in the Email & Apps division of Rackspace, recently said:

I think that distributed databases solve a problem that a lot of companies with large datasets have had to solve independently in the past…Cassandra has an approach that hybridizes the Bigtable and Dynamo models, where a lot of its competitors chose to take one path or the other. Over the Bigtable clones, Cassandra has huge high-availability advantages, and no single point of failure (possible because of the eventually consistent approach). When compared to the Dynamo adherents, Cassandra has the advantage of a more advanced datamodel, allowing for a single “row” to contain billions of column/value pairs: enough to fill a machine. You also get efficient range queries for the top level key, and even within your values.
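The Dynamo half of that hybrid is what removes the single point of failure: keys are spread around a consistent-hashing ring and copied to several nodes, any of which can serve a request. The sketch below assumes a replication factor of 3 and quorum (majority) reads and writes; `TokenRing`, the node names and the MD5 token function are illustrative choices, not Cassandra's code.

```python
import bisect
import hashlib
from typing import List, Set


def quorum(replication_factor: int) -> int:
    """Majority of replicas: 2 of 3, 3 of 5, and so on."""
    return replication_factor // 2 + 1


class TokenRing:
    """Toy Dynamo-style consistent-hashing ring with N-way replication.

    A key lives on the first node whose token follows the key's hash, plus
    the next (replication_factor - 1) nodes clockwise around the ring.
    """

    def __init__(self, nodes: List[str], replication_factor: int = 3) -> None:
        if replication_factor > len(nodes):
            raise ValueError("replication factor cannot exceed node count")
        self.rf = replication_factor
        self._ring = sorted((self._token(n), n) for n in nodes)
        self._tokens = [token for token, _ in self._ring]

    @staticmethod
    def _token(s: str) -> int:
        return int(hashlib.md5(s.encode("utf-8")).hexdigest(), 16)

    def replicas(self, key: str) -> List[str]:
        # Walk clockwise from the key's position, wrapping past the last token.
        start = bisect.bisect(self._tokens, self._token(key)) % len(self._ring)
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(self.rf)]

    def can_serve_quorum(self, key: str, down: Set[str]) -> bool:
        # No master node: the request succeeds if enough replicas are alive.
        live = [n for n in self.replicas(key) if n not in down]
        return len(live) >= quorum(self.rf)


if __name__ == "__main__":
    ring = TokenRing(["node-a", "node-b", "node-c", "node-d", "node-e"])
    owners = ring.replicas("user42")
    print(owners, ring.can_serve_quorum("user42", {owners[0]}))
```

With three replicas a quorum is two, so a key stays readable and writable with one of its nodes down. Picking read and write quorums that overlap (R + W > N) lets a read see the latest acknowledged write; weaker settings trade that guarantee for speed, which is where "eventually consistent" comes in.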

In a post last year, contributing writer Gary Orenstein pointed out that thanks to these attributes, Cassandra has potential applications beyond inbox search that include “recommendation engines, targeted advertising, and content search, particularly when you combine many concurrent inputs and output requests to the same data set.”

Digg is a prototypical use case. The company tells me that it gets:

  • 40 million visitors a month, who in turn account for roughly 500 million page views a month.
  • 20,000 daily submissions

It also generates:

  • 170,000 daily Diggs
  • 19,000 comments
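Back of the envelope, those figures translate into per-second averages. The sketch assumes a 30-day month and leaves out the 19,000 comments, since the article doesn't say over what period they're counted.

```python
# Back-of-the-envelope rates from the figures Digg gave GigaOM.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400
DAYS_PER_MONTH = 30  # assumption: a 30-day month

MONTHLY_VISITORS = 40_000_000
MONTHLY_PAGE_VIEWS = 500_000_000
DAILY_SUBMISSIONS = 20_000
DAILY_DIGGS = 170_000
# The 19,000 comments are omitted: the article doesn't give their time period.

views_per_visitor = MONTHLY_PAGE_VIEWS / MONTHLY_VISITORS
page_views_per_second = MONTHLY_PAGE_VIEWS / (DAYS_PER_MONTH * SECONDS_PER_DAY)
diggs_per_second = DAILY_DIGGS / SECONDS_PER_DAY
new_items_per_second = (DAILY_SUBMISSIONS + DAILY_DIGGS) / SECONDS_PER_DAY

if __name__ == "__main__":
    print(f"{views_per_visitor:.1f} page views per visitor per month")
    print(f"{page_views_per_second:.0f} page views per second (average)")
    print(f"{new_items_per_second:.1f} submissions + Diggs per second (average)")
```

That works out to 12.5 page views per visitor, and on average roughly 193 page views and 2.2 new submissions and Diggs every second; peaks will run higher than these averages.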

As these numbers suggest, there is a high amount of interaction between the system and its users. No wonder Digg digs Cassandra!

By Om Malik

Comments

  1. "Cassandra has the advantage of a more advanced datamodel, allowing for a single "row" to contain billions of column/value pairs: enough to fill a machine." – I don't agree with this. The last time I tested it (version 0.5), it couldn't even handle gets of 500 columns properly. I think it is all hype.

  2. [...] More at GigaOM [...]

  3. Facebook backed out? What do you mean? The project is under incubation with Apache: http://incubator.apache.org/cassandra/

  4. [...] State of Google Apps See All Articles » Why Digg [...]

  5. [...] Why Digg Digs Cassandra See All Articles » NorthScale, a Memcached-focused Startup Launches [...]

  6. [...] Malik quoted extensively from the Digg announcement and from Rackspace engineer Stu Hood, who explained Cassandra’s [...]

  7. [...] component of building out a web-based business, much like Facebook’s Cassandra project has swept through the ranks of webscale startups and even big [...]

  8. [...] day. So now I'm back to wondering if FlockDB and Gizzard will join the ranks of Hadoop or Cassandra as open-source solutions for managing data at [...]

  9. [...] startups and whole branches of code designed to help sites scale their data, from Hadoop to Cassandra to Twitter’s Gizzard. Mikesell said the product could replace the need for caching appliances [...]

  10. [...] or Digg deciding to use Cassandra, or LinkedIn using Voldemort as a key-value store, are the equivalent of pioneers traveling along [...]

  11. [...] Pfeil pointed out that enterprise users are struggling with an overload of data much like web-scale startups like Digg and Twitter are, and that some enterprise users are considering Cassandra in addition to their [...]

  12. [...] startups and whole branches of code designed to help sites scale their data, from Hadoop to Cassandra to Twitter’s Gizzard. Mikesell said the product could replace the need for caching appliances [...]

  13. [...] project, a next-generation database of the NoSQL variety and the engine powering the massive data needs of Twitter and Digg. These database technologies are the future of the webscale business — the next generation of the [...]

  14. [...] at Digg, which led to the yet-unconfirmed departure of Digg VP of Engineering John Quinn. He was a big champion of Cassandra at [...]

  15. I must admit it has been pure heaven working with Cassandra on our project. It takes like 10 minutes to get up and running and clustered!
