Weekly Update

With Scalable Data Stores Around, Is NoSQL a Non-Starter?

Let’s talk NoSQL. Specifically, let’s talk about NoSQL and its actual prospects for long-term success. Perhaps I’m wrong, but the discussion seems to have evolved from one about abolishing SQL databases to one about coexisting with them, and then to one where SQL is actually regaining momentum. Is SQL regaining favor, even among webscale types? Was it ever out of favor?

We saw evidence of this momentum shift back to SQL-based databases this week, when Facebook’s Jonathan Heiliger joined the advisory board of clustered SQL startup Clustrix. Facebook famously created the NoSQL Cassandra database but still relies on the venerable MySQL-plus-memcached combination for the bulk of its critical operations. Xeround (don’t get me started on why it abandoned its data virtualization strategy) now offers a scalable MySQL database on Amazon EC2. Database guru Michael Stonebraker recently launched his latest startup, VoltDB, which describes its namesake product as a “next-generation SQL RDBMS with ACID for scaling OLTP applications.”

For mission-critical applications, where querying is at least desirable, will a scalable SQL option always win out against a NoSQL option? Even for unstructured data?

Perhaps not coincidentally, the NoSQL database that seems to be experiencing the most success (by a traditional definition, at least) is Membase Server. Its eponymous vendor (née NorthScale) claims several high-profile customers including, as of this week, AOL and ShareThis. It’s even available as a database option on the RightScale, Heroku and Windows Azure cloud platforms.

There are multiple possible reasons for this (including that Membase operates as a classic startup vendor rather than as an open source project). The biggest, however, might be that the key-value Membase Server is based upon the key-value memcached caching tool, which, as noted above, is a popular add-on to MySQL for scaling relational databases. Membase isn’t SQL, but it does share a connection.

And once we’re no longer talking about serving data, but rather just about storing large volumes of it, NoSQL can seem nearly obsolete. For organizations willing to pay for data warehousing and analysis tools, the options are plentiful, including massively parallel products like Aster Data’s nCluster or the new EMC Greenplum Data Computing Appliance. Have lots of unstructured data and want a free option? Try Hadoop. Just like NoSQL databases, it’s massively scalable (thanks to the Hadoop Distributed File System), but it also has a MapReduce-based analysis function. Further, most popular data warehousing products and BI tools have Hadoop integration to some degree, and Quest has developed a connector for Oracle Database.

Speaking of distributed file systems, there are plenty of those to go around, such as the newly updated, open source Gluster Storage Platform.

None of this is to say that NoSQL databases aren’t quality options. They vary greatly in their ideal uses, and several are gaining popularity: aside from Membase, projects like Cassandra, CouchDB, MongoDB and Riak are maturing fast. But they’ve also been the cause of some noteworthy outages of late. Perhaps these are just growing pains, but try telling that to most CIOs.

It’s too early to count out NoSQL, but the various projects seem to be under pressure from all angles. It’s a case of familiar versus unfamiliar, and the voices backing a better version of the status quo are getting louder.

Related Research: NoSQL Databases – Providing Extreme Scale and Flexibility

Question of the week

Do you think NoSQL will ever catch on in mainstream businesses?