RealWorld NoSQL: Cassandra at Openwave

Editor’s Note: This is the third in a multi-part series of posts exploring real-world use cases for NoSQL deployments. So far, the series has covered case studies on MongoDB and HBase.

With all the excitement surrounding the relatively recent wave of non-relational – otherwise known as “NoSQL” – databases, it can be hard to separate the hype from the reality. There’s a lot of talk, but how much NoSQL action is there in the real world? In this series, we’ll take a look at some real-world NoSQL deployments.

Openwave delivers mediation and messaging products for ISPs and telecom service providers. It has adopted Cassandra as the basis for its next-generation messaging platform, which will go live in the second half of 2011. Like HBase, Cassandra is heavily influenced by Google’s BigTable model, but it also borrows concepts from Amazon’s Dynamo distributed key-value store. Cassandra was first developed at Facebook and has since seen action at Cloudkick (recently acquired by Rackspace), Digg and Twitter.

Openwave’s next-generation platform must support geographic redundancy, massive scalability and high availability. It needs to distribute databases redundantly across multiple data centers and handle large customer datasets, ranging from hundreds of terabytes to petabytes, while supporting thousands of transactions per second for each customer.

Openwave settled on Cassandra after evaluating other non-relational databases, including CouchDB and Voldemort. Cassandra was selected because of its scalability and reliability across multiple data centers.
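To make the multi-data-center requirement concrete, here is a minimal Python sketch of per-data-center replica placement, loosely modeled on the idea behind Cassandra’s NetworkTopologyStrategy. The node names, the two data centers (DC1, DC2), the MD5-derived tokens and the replicas() helper are all hypothetical, and real Cassandra also accounts for racks, token assignment and failure detection; nothing here reflects Openwave’s actual configuration.

```python
import bisect
import hashlib

# Hypothetical cluster: (node name, data center). Real Cassandra nodes own
# configured tokens; here each node's token is derived from an MD5 hash.
NODES = [
    ("node-a1", "DC1"), ("node-a2", "DC1"), ("node-a3", "DC1"),
    ("node-b1", "DC2"), ("node-b2", "DC2"), ("node-b3", "DC2"),
]


def token(value: str) -> int:
    """Map a string to a position on a 2**128 ring using MD5."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


# The ring: nodes sorted by token, plus a parallel list of tokens for bisect.
RING = sorted((token(name), name, dc) for name, dc in NODES)
TOKENS = [t for t, _, _ in RING]


def replicas(row_key: str, per_dc: dict[str, int]) -> list[tuple[str, str]]:
    """Walk the ring clockwise from the row key's token, picking nodes until
    each data center has its requested number of replicas (simplified: no
    rack awareness)."""
    start = bisect.bisect_right(TOKENS, token(row_key))
    needed = dict(per_dc)  # copy so the caller's dict is left untouched
    chosen = []
    for i in range(len(RING)):
        _, name, dc = RING[(start + i) % len(RING)]
        if needed.get(dc, 0) > 0:
            chosen.append((name, dc))
            needed[dc] -= 1
    return chosen
```

For example, replicas("customer:42", {"DC1": 3, "DC2": 3}) returns three nodes from each data center, so an entire site can be lost while complete copies of every row survive in the other.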

Moving from the relational world to Cassandra required unlearning a lot of traditional techniques, particularly with respect to data modeling. In Cassandra, the data model is determined by the nature of the application’s queries rather than by the nature of the data. Cassandra “Super Columns” essentially determine which child records will be accessible from a master record, so selecting the correct Super Column structure determines which queries can be supported. Furthermore, whereas relational data modeling starts by eliminating redundant data, in Cassandra one would normally create multiple “Column Families” – roughly equivalent to an RDBMS table – to support the various questions the application might ask. For instance, one Column Family might store sales grouped by customer, while another might store sales grouped by product.
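The query-first approach can be illustrated with a minimal Python sketch that uses nested dictionaries as stand-ins for two Super Column Families, one keyed by customer and one keyed by product. The names sales_by_customer, sales_by_product, record_sale() and the sample data are hypothetical, and this is not a Cassandra client API; the point is only that each sale is deliberately written twice so that each question is answered by reading a single row instead of performing a join.

```python
from collections import defaultdict


def super_column_family():
    """Hypothetical stand-in: row key -> super column -> column -> value."""
    return defaultdict(lambda: defaultdict(dict))


# One "Column Family" per question the application asks.
sales_by_customer = super_column_family()  # row: customer, super column: product
sales_by_product = super_column_family()   # row: product, super column: customer


def record_sale(customer: str, product: str, sale_id: str, amount: float) -> None:
    """Denormalize on write: store the same sale under both groupings."""
    sales_by_customer[customer][product][sale_id] = amount
    sales_by_product[product][customer][sale_id] = amount


def sales_for_customer(customer: str) -> dict[str, dict[str, float]]:
    """'What did this customer buy?' is a single-row read."""
    row = sales_by_customer.get(customer, {})
    return {product: dict(cols) for product, cols in row.items()}


def total_for_product(product: str) -> float:
    """'How much of this product was sold?' is also a single-row read."""
    row = sales_by_product.get(product, {})
    return sum(amount for cols in row.values() for amount in cols.values())


record_sale("alice", "phone", "s1", 199.0)
record_sale("alice", "sim", "s2", 10.0)
record_sale("bob", "phone", "s3", 249.0)
print(sales_for_customer("alice"))  # {'phone': {'s1': 199.0}, 'sim': {'s2': 10.0}}
print(total_for_product("phone"))   # 448.0
```

The cost of this design is extra storage and a second write per sale; the benefit is that neither query has to scan or join, which is the trade Cassandra’s data model encourages.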

“It was tempting to apply relational techniques that we know and love to Cassandra,” said Utpal Thakrar, product manager at Openwave. “It simply does not work. Performance gets impacted dramatically if the data model is ill-designed.”

Like most non-relational database alternatives, Cassandra does not support the strong multi-object transactions found in the relational world. It does, however, support “tunable consistency,” which allows the application to trade off speed against consistency. For instance, an operation might choose the strictest consistency level, ensuring that all users of the database see the changes immediately. Another operation may choose a lower consistency level, achieving higher throughput but permitting temporary inconsistencies while the changes propagate through the system. “Applications that require transactional support have to be redesigned to play within the limitations of tunable consistency that Cassandra offers,” said Thakrar. “‘When in Rome, do as Cassandra does’ was the motto we had to preach throughout the organization.”
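The trade-off behind tunable consistency is often explained with the Dynamo-style overlap rule: with N replicas, a read that waits for R replicas is guaranteed to overlap a write acknowledged by W replicas when R + W > N. The minimal Python sketch below applies that rule to three of Cassandra’s consistency levels (ONE, QUORUM and ALL); the helper names are hypothetical, and the sketch deliberately ignores node failures and concurrent writes.

```python
def replicas_for(level: str, n: int) -> int:
    """Replicas that must respond at a consistency level, for replication
    factor n (only three of Cassandra's levels are modeled here)."""
    counts = {"ONE": 1, "QUORUM": n // 2 + 1, "ALL": n}
    return counts[level]


def is_strongly_consistent(read_level: str, write_level: str, n: int) -> bool:
    """True when every possible read replica set overlaps every possible
    write replica set (R + W > N), so reads see the latest acknowledged write."""
    return replicas_for(read_level, n) + replicas_for(write_level, n) > n


for read, write in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ONE", "ALL")]:
    print(read, write, is_strongly_consistent(read, write, 3))
```

With a replication factor of 3, QUORUM reads and writes (2 + 2 > 3) guarantee that reads see the latest acknowledged write, while ONE for both (1 + 1 = 2) gives up that guarantee in exchange for lower latency and higher throughput.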

“Moving from RDBMS to Cassandra has not been trivial, and commercial support from Cassandra vendor Riptano (now DataStax) has been critical,” said Thakrar. “We have certainly run into issues, but are resolving them quickly with Riptano’s help.”


Guy Harrison is a director of research and development at Quest Software and has over 20 years of experience in database design, development, administration and optimization. He is @guyharrison on Twitter.
