Real World NoSQL: Amazon SimpleDB at Netflix

Editor’s note: This is the fourth in a multi-part series of posts exploring the use cases for NoSQL deployments in the real world. So far, the series has covered case studies on MongoDB, Cassandra and HBase.

With all the excitement surrounding the relatively recent wave of non-relational – otherwise known as “NoSQL” – databases, it can be hard to separate the hype from the reality. There’s a lot of talk, but how much NoSQL action is there in the real world? In this series, we’ll take a look at some real-world NoSQL deployments.

Netflix provides rent-by-mail and streaming movies in the United States. The shift from mail-order to streaming video had fairly significant implications for Netflix’s application infrastructure. Netflix realized that it would need multiple geographically dispersed data centers and far more processing capacity. Rather than build these new data centers, Netflix decided to migrate its applications to Amazon’s AWS cloud. This allowed the company to concentrate its intellectual efforts on building customer value rather than nationwide data centers.

As a part of this bold move, Netflix migrated core parts of its database from Oracle to Amazon’s SimpleDB data store. This migration is one of the biggest migrations to the cloud yet undertaken, with the Netflix system serving the needs of more than 16 million subscribers and hosting over 100,000 DVD titles.

SimpleDB is a key-value store that runs within the Amazon Web Services (AWS) cloud. It promises reliable and transparently scalable storage, a flexible schema, and a choice of either immediate or eventual consistency. SimpleDB is a virtually zero-administration service; there is no database administration involved in scaling the system. Storage and compute power are assigned dynamically and automatically by Amazon as the database grows.

Netflix needed to make significant compromises in exchange for the scalability provided by AWS and SimpleDB. Complex SQL operations, such as joins between tables or aggregate “group by” operations, which would normally be executed within the database, were moved to the application layer. In some cases, this required that the data model be de-normalized; data that would be stored in multiple tables in Oracle was flattened into a single SimpleDB structure so that joins could be avoided.
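To make the trade-off concrete, here is a minimal sketch in Python of what moving a join and a “group by” into the application layer can look like. The record layouts and names (titles, rentals, title_id, genre) are hypothetical and are not taken from Netflix’s data model; plain in-memory lists stand in for the results of two separate SimpleDB queries.

```python
from collections import Counter

# Hypothetical records, standing in for the results of two separate queries.
titles = [
    {"title_id": "t1", "name": "Heat", "genre": "Crime"},
    {"title_id": "t2", "name": "Alien", "genre": "Sci-Fi"},
    {"title_id": "t3", "name": "Ronin", "genre": "Crime"},
]
rentals = [
    {"rental_id": "r1", "title_id": "t1"},
    {"rental_id": "r2", "title_id": "t2"},
    {"rental_id": "r3", "title_id": "t1"},
    {"rental_id": "r4", "title_id": "t3"},
]


def join_rentals_to_titles(rentals, titles):
    """Application-side equivalent of an inner join on title_id."""
    by_id = {t["title_id"]: t for t in titles}
    return [
        {**rental, **by_id[rental["title_id"]]}
        for rental in rentals
        if rental["title_id"] in by_id
    ]


def rentals_per_genre(joined_rows):
    """Application-side equivalent of GROUP BY genre with COUNT(*)."""
    return Counter(row["genre"] for row in joined_rows)


joined = join_rentals_to_titles(rentals, titles)
print(rentals_per_genre(joined))
```

Storing the joined rows directly, with the title’s name and genre copied onto each rental item, would be the de-normalized alternative: the join disappears at read time, at the cost of duplicated data that the application has to keep in sync.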

Relational database transactions were abandoned in favour of SimpleDB’s optimistic concurrency mechanism, which allows a modification to proceed only if the item is unchanged since it was last read. For instance, an attempt to increment a counter (such as the number of rentals for a video) would be rejected if the counter had been modified by another writer in the meantime. Even so, application developers needed to be aware that certain operations (reading a value immediately after modifying it, for instance) might return incorrect or at least unexpected results.
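The counter example can be sketched as a read, modify, conditional-write loop. This is only an illustration of the optimistic concurrency pattern, not Netflix’s code: the class and function names are hypothetical, and an in-memory dictionary stands in for SimpleDB so the sketch runs without AWS credentials.

```python
class ConflictError(Exception):
    """Raised when the stored value no longer matches the expected value."""


class InMemoryStore:
    """Toy stand-in for a key-value store that offers a conditional put."""

    def __init__(self):
        self._items = {}

    def get(self, key):
        # Returns None when the key has never been written.
        return self._items.get(key)

    def put_if(self, key, new_value, expected):
        """Write new_value only if the current value still equals expected."""
        if self._items.get(key) != expected:
            raise ConflictError(key)
        self._items[key] = new_value


def increment(store, key, max_attempts=5):
    """Increment a counter, retrying if another writer got there first."""
    for _ in range(max_attempts):
        current = store.get(key)              # 1. read
        new_value = (current or 0) + 1        # 2. modify locally
        try:
            store.put_if(key, new_value, expected=current)  # 3. conditional write
            return new_value
        except ConflictError:
            continue                          # value changed underneath us; re-read
    raise ConflictError(f"gave up on {key!r} after {max_attempts} attempts")


store = InMemoryStore()
increment(store, "rentals:t1")
stale = store.get("rentals:t1")   # a second writer reads the counter...
increment(store, "rentals:t1")    # ...but the first writer updates it again
try:
    store.put_if("rentals:t1", stale + 1, expected=stale)
except ConflictError:
    print("stale write rejected")
```

The write based on the stale read is rejected rather than silently overwriting the newer value, which is the behaviour described above. The toy store keeps integers for brevity; SimpleDB stores attribute values as strings, so a real counter would need to be converted on the way in and out.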

Netflix doesn’t use SimpleDB for all storage; Oracle, MySQL and the Amazon S3 service all form significant parts of the Netflix architecture. Nevertheless, with more than 16 million customers, Netflix has made a significant commitment to a non-relational alternative and one which, it says, allows the company to better meet customer and shareholder needs. Netflix has been generous in sharing its experiences in articles such as this one.

To learn more about the factors driving big data and optimal strategies for solving it, including from Hadoop, NoSQL and MPP database leaders, come to our Big Data conference held on March 23 in NYC.

Guy Harrison is a director of research and development at Quest Software, and has over 20 years of experience in database design, development, administration, and optimization. He is @guyharrison on Twitter.
