Weekly Update

Updated: AWS DynamoDB and the eventual consistency issue

Amazon’s new NoSQL database service DynamoDB launched this week, not with a bang but a whimper. Minutes into the live stream announcing the service, the video link went down. That was a bummer for the hundreds of people who had tuned in, and it kicked off lots of jokes on Twitter about cloud database services and the notion of “eventual consistency” being a synonym for “inconsistency.” DynamoDB is a NoSQL database service that will run in the AWS cloud, but in the world of cloud computing and the Internet at large, predictability and consistency are often a stretch. What does this mean for developers using the service?

Behind the jokes is an interesting and complex issue that companies using cloud services will need to grapple with. We have come to expect that modern distributed systems supporting large web applications must provide low read and write latency. Think of entering your info when purchasing something on the web: One or two seconds too long and you’re out and onto the next website. To achieve this low latency, cloud systems often eschew protocols that guarantee consistency and instead opt for eventual consistency protocols. This means there is no guarantee about how recent the version of the data you are seeing is, except that the system will “eventually” return the most recent version in the absence of new writes. But how “eventual” is eventual consistency?

Updated: In terms of big data infrastructure, which is the use case for Amazon DynamoDB, eventual consistency is probably fine, as any one piece of data in a store of terabytes or petabytes is not useful by itself. One log file will not change the outcome of your analysis. For real-time analytics applications running atop this infrastructure, however, eventual consistency doesn’t work. Think about banking transactions, for example, or any application where the data must be accurate at all times. It will be interesting to see how developers get their heads around eventual consistency and the use cases for a service like DynamoDB. Eventual consistency is the default setting on the service, but Amazon does let users switch to a strongly consistent read setting if their applications require it.
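To make the difference between the two read settings concrete, here is a toy sketch in Python. It is not how DynamoDB works internally, and every name in it (ToyReplicatedStore, sync, the consistent flag) is made up for illustration. It only models the idea: writes land on one copy, other copies catch up later, and an eventually consistent read may hit a copy that has not caught up yet.

```python
class ToyReplicatedStore:
    """Toy model (hypothetical): one primary copy plus replicas that lag
    behind until sync() is called. Not DynamoDB's actual design."""

    def __init__(self, replica_count=2):
        self.primary = {}
        self.replicas = [{} for _ in range(replica_count)]
        self._next = 0  # round-robin index for eventually consistent reads

    def write(self, key, value):
        # Writes land on the primary only; replicas catch up later.
        self.primary[key] = value

    def sync(self):
        # Stand-in for background replication between copies.
        for replica in self.replicas:
            replica.update(self.primary)

    def read(self, key, consistent=False):
        if consistent:
            # Strongly consistent read: always sees the latest write.
            return self.primary.get(key)
        # Eventually consistent read: served by whichever replica is next,
        # which may not have received the latest write yet.
        replica = self.replicas[self._next % len(self.replicas)]
        self._next += 1
        return replica.get(key)


store = ToyReplicatedStore()
store.write("balance", 100)
stale = store.read("balance")                     # replica has not caught up
strong = store.read("balance", consistent=True)   # reads the latest write
store.sync()                                      # replication completes
eventual = store.read("balance")                  # replicas have converged
```

In this sketch the first plain read comes back empty even though the write succeeded, which is exactly the situation a banking-style application cannot tolerate and a log-crunching job can shrug off.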

Another thing developers might want to pay close attention to is the pricing structure for DynamoDB. Storage is $1 per GB per month, which makes you wonder whether it would be cheaper to build in-house as opposed to using Amazon and whether DynamoDB is really suited to big data applications. Requests are priced based on how much capacity is reserved: $0.01 per hour for every 10 units of write capacity and $0.01 per hour for every 50 units of read capacity. A unit of read (or write) capacity equals one read (or write) per second of capacity for items up to 1 KB in size. If you use eventually consistent reads, you can achieve twice as many reads per second for a given amount of read capacity. Larger items will require additional throughput capacity. That’s supposed to be simple! Maybe it is for math nerds, but I had to read it a couple of times just to get it straight. There is also no query language, so users can’t do anything interesting with the data other than store and retrieve it.
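For those of us who had to read the pricing twice, a rough back-of-the-envelope calculator may help. This is a sketch built only on the rates quoted above. Three things in it are my assumptions, not Amazon's published rules: a 30-day month, items larger than 1 KB rounding up to the next whole KB, and capacity being billed in whole blocks of 10 write units or 50 read units. The function names are made up.

```python
import math

# Rates quoted above, expressed in cents to avoid floating-point drift.
STORAGE_CENTS_PER_GB_MONTH = 100   # $1 per GB per month
CENTS_PER_BLOCK_HOUR = 1           # $0.01 per hour per block of capacity
WRITE_UNITS_PER_BLOCK = 10
READ_UNITS_PER_BLOCK = 50
HOURS_PER_MONTH = 24 * 30          # assumption: 30-day month


def capacity_units(ops_per_second, item_kb=1, eventually_consistent=False):
    """Units needed for a given request rate and item size."""
    # Assumption: an item larger than 1 KB is rounded up to whole KBs.
    units = ops_per_second * math.ceil(item_kb)
    if eventually_consistent:
        # Eventually consistent reads get twice the reads per unit.
        units = units / 2
    return math.ceil(units)


def monthly_cost_usd(storage_gb, reads_per_sec, writes_per_sec,
                     item_kb=1, eventually_consistent=False):
    read_units = capacity_units(reads_per_sec, item_kb, eventually_consistent)
    write_units = capacity_units(writes_per_sec, item_kb)
    # Assumption: capacity is billed in whole blocks, rounded up.
    read_blocks = math.ceil(read_units / READ_UNITS_PER_BLOCK)
    write_blocks = math.ceil(write_units / WRITE_UNITS_PER_BLOCK)
    throughput_cents = ((read_blocks + write_blocks)
                        * CENTS_PER_BLOCK_HOUR * HOURS_PER_MONTH)
    storage_cents = storage_gb * STORAGE_CENTS_PER_GB_MONTH
    return (storage_cents + throughput_cents) / 100


# 100 GB stored, 500 reads/s and 100 writes/s on 1 KB items.
strong_cost = monthly_cost_usd(100, 500, 100)
eventual_cost = monthly_cost_usd(100, 500, 100, eventually_consistent=True)
```

Under those assumptions the example works out to $244 a month with strongly consistent reads and $208 with eventually consistent reads: $100 of storage either way, plus $144 or $108 of reserved throughput.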

The upside of a service like DynamoDB is ease of use. Once NoSQL clusters start growing, managing them is just as horrible and cumbersome as managing large RDBMS installations. And there is also a shortage of technical people skilled at doing this. Operational complexity is a huge barrier to adoption of NoSQL databases, overshadowing performance, reliability and scale, according to Amazon. As a fully managed NoSQL database service, DynamoDB takes all the operational headaches away from developers. That’s definitely a plus.

There will be plenty of developers who steam in and use Amazon DynamoDB without reading the manual. Find out what applications it is good for before you start building. Remember what happened when the Amazon EBS outage knocked out a bunch of high-profile sites and everyone started yelling and screaming about how the cloud didn’t work and was doomed? Amazon turned around and said, “We told you so,” and pointed to a white paper about how to ensure redundancy on AWS. Let’s not make the same mistakes with DynamoDB. Follow the instructions, people.

Question of the week

What are the pros and cons of Amazon DynamoDB?