NoSQL startup DataStax officially entered the pantheon of Hadoop providers today, introducing its own distribution called Brisk. Brisk uses the open source NoSQL database Cassandra as a replacement for Apache's Hadoop Distributed File System (HDFS), and relies on Cassandra's built-in MapReduce engine for the computing component. It also uses Hive, a tool that lets users run SQL-like queries on Hadoop clusters. Brisk is one of many big changes to the Hadoop market, some from the past few weeks and others still to come, that will push Hadoop's evolution as a technology.
I spoke briefly about Brisk with DataStax Co-Founder and CEO Matthew Pfeil at our Structure: Big Data conference today. He described it as ideal for organizations that want a dual-purpose product able to handle both large volumes of real-time application data and batch analytic data, while letting the two interact at very low latency. As DataStax explains in its official announcement of Brisk:
A key benefit of DataStax’ Brisk is the tight feedback loop it allows between real-time application and the analytics that follow. Traditionally, users would be forced to move data between systems via complex ETL processes, or perform both functions on the same system with the risk of one impacting the other.
DataStax is positioning Brisk for high-volume web sites, financial services, retail and high-volume event processing. In these sectors, the ability to update an application's logic quickly, whether to better serve users or to make smart investment decisions, presents a huge opportunity for businesses, and thus for companies like DataStax.
As I noted last week, the original Apache Hadoop project, on which Cloudera's distribution is based, is likely feeling pressure to evolve faster than has been customary for the project. This pressure comes from internal sources such as Yahoo, as well as from software vendors like DataStax, Appistry and Pervasive, all of which have developed alternatives to HDFS and/or Hadoop MapReduce with the goal of improving performance. IBM also has its own suite of Hadoop products, and the word at Structure: Big Data is that other big names (EMC, for one) and a few well-funded startups will be entering the space very soon.