Weekly Update

NoSQL and SQL: learning to play nice

Last week, Microsoft announced, and released to preview, its very own document store NoSQL database for its Azure cloud platform. The company also announced a new search service (also in preview) and, at long last, the general availability of HBase, the Hadoop-based column family NoSQL database, as part of its HDInsight cloud Hadoop offering.

Redmond does documents
The release of Azure DocumentDB (as the document store offering is branded) is probably the most notable of the three announcements, and for a few reasons. First, I had been hearing rumors for at least two years that this product would be forthcoming, and was glad to hear that it had made it out the door. Second, it’s interesting to see that the company that brings you the product called “SQL Server” is seeing benefit in NoSQL stores. In fact, if you include Azure Storage tables (a key-value NoSQL store) and the aforementioned HBase on HDInsight, Microsoft now has three separate NoSQL offerings.

But the really notable thing about DocumentDB is how it combines features associated with conventional, relational databases and those typical of document store NoSQL products.

DocumentDB is premised on storing multi-structured data, formatted as JavaScript Object Notation (JSON) objects. JSON format allows for data that is hierarchical, and whose structure may differ between documents (akin to rows) in a collection (akin to a table). DocumentDB also allows for server-side stored procedures, written in JavaScript, to be stored in the database and executed over data within it.
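To make "multi-structured" concrete, here is a minimal sketch in Python. It does not use DocumentDB's API; the collection is just a list, and the field names (`address`, `phones`) and the `cities` helper are made up for illustration. The point is only that two documents in the same collection can carry different fields and different nesting.

```python
import json

# Hypothetical collection: a plain list standing in for a document collection.
# The two documents deliberately have different shapes.
customers = [
    json.loads('{"id": "1", "name": "Ada", "address": {"city": "London"}}'),
    json.loads('{"id": "2", "name": "Grace", "phones": ["555-0100", "555-0101"]}'),
]


def cities(collection):
    """Return the city of every document that has a nested address."""
    return [doc["address"]["city"] for doc in collection if "address" in doc]


print(cities(customers))
```

A relational table would force both records into one fixed set of columns; here the second document simply has no address, and the query skips it.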

The features I’ve just discussed are also to be found in other document stores, including MongoDB, Apache CouchDB and Couchbase. They are standard…even assumed. But what makes DocumentDB different is that it takes that feature set and adds features typically associated with conventional relational databases: a SQL dialect for querying and adjustable levels of database “consistency.”

Flexible consistency
While seemingly a dry topic, the latter is quite significant, and indicative of an emerging trend in the market. Without going into a crazily technical description of database consistency, consider the two typical extremes. Virtually all relational, transactional databases enforce rigid consistency of the data, including in clustered scenarios: no one sees a change to the data until everyone, on every node, does. So updates aren’t complete until they are fully propagated. That’s safe, but can be slow.

Many NoSQL databases, on the other hand, use a model called “eventual consistency,” whereby the operation of writing or updating data may be considered complete even before the change has fully propagated across all nodes. “Eventually” all other nodes will catch up though. A great example of eventual consistency is the Domain Name System (DNS) on the Internet: when a domain is updated to a new numeric IP address (like 205.232.3.7) not all Internet users see that change right away, even while others do. But eventually all DNS servers will reflect the change.
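The two extremes can be sketched with a toy in-memory cluster. This is an illustration only, not how any real database replicates: the `Cluster` and `Replica` classes and their method names are invented here, and "propagation" is just a queue that gets drained on demand.

```python
class Replica:
    """One node's local copy of the data."""

    def __init__(self):
        self.data = {}


class Cluster:
    """Toy cluster contrasting strong and eventual consistency."""

    def __init__(self, n):
        self.replicas = [Replica() for _ in range(n)]
        self.pending = []  # queued (replica_index, key, value) updates

    def write_strong(self, key, value):
        # The write completes only after every replica has applied it.
        for replica in self.replicas:
            replica.data[key] = value

    def write_eventual(self, key, value):
        # The write completes after one replica applies it;
        # the remaining replicas are updated later.
        self.replicas[0].data[key] = value
        for i in range(1, len(self.replicas)):
            self.pending.append((i, key, value))

    def propagate(self):
        # Drain the queue so every replica catches up.
        for i, key, value in self.pending:
            self.replicas[i].data[key] = value
        self.pending.clear()

    def read(self, i, key):
        return self.replicas[i].data.get(key)


cluster = Cluster(3)
cluster.write_eventual("example.com", "205.232.3.7")
before = cluster.read(2, "example.com")  # replica 2 has not caught up yet
cluster.propagate()
after = cluster.read(2, "example.com")   # now it reflects the change
print(before, after)
```

As in the DNS example, a reader hitting replica 2 sees the old state until propagation finishes, while `write_strong` would have made every replica agree before returning, at the cost of waiting on all of them.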

So what if you want to mix and match? What if you want a database that handles multi-structured data but also offers full database consistency? What if you want full consistency on some data collections but not on others? DocumentDB takes the bold step of allowing developers to choose among not just those two, but a total of four consistency models. The developer is thus given the power to trade off high consistency against low latency.

So, even though the company that makes SQL Server has seen fit to offer a kind of “NoSQL Server,” it’s also blurring the lines between the two approaches to database management, which may one day lead the software behemoth to accommodate both models in the same product. This would break down newly emerging data silos and bring conventional databases like SQL Server into the modern era.

SQL for machine data
Another recent release that bears mention in this discussion is a new machine data platform, called X15. While X15 provides functionality similar to that of other machine data-specific platforms, like Splunk, it does so with a novel architecture.

First off, X15 implements its own indexing engine and massively parallel processing (MPP) database engine over the Hadoop Distributed File System (HDFS); second, it provides a SQL-based query interface to the (multi-structured) log data that it manages; and third, it has extended the SQL grammar to allow for search-based query techniques to be commingled with standard SQL constructs. This contrasts with Microsoft’s preview offering of a standalone search service, which is aimed more at site search than at providing a search interface for databases.

Like Microsoft with Azure DocumentDB, the X15 team decided to standardize on the SQL query language and adapt it for NoSQL querying. It also adapted SQL to take on full-text search expressions. In so doing it has decided to mesh old standards with new technology, to be used in new workloads.

In the end, that’s exactly what the market needs if those new technologies are to be widely adopted in the enterprise.