Boston-based startup Hadapt is practicing its own form of polygamy, bringing as many pieces of the data ecosystem as possible under its roof, and the resulting union looks pretty nice. In version 2.0 of its Adaptive Analytical Platform, the company is expanding on its original premise of a unified architecture for Hadoop and SQL by adding advanced analytic functions and a tight integration with business-intelligence favorite Tableau Software.
The new capabilities are a big step for Hadapt, which launched less than two years ago at our inaugural Structure: Data conference, and they foretell a coming trend in the Hadoop marketplace. In a nutshell, the new version ships with some standard analytical functions such as sentiment analysis and funnel analysis, as well as a development kit to help users write their own. And users can write the functions in standard SQL rather than trying to compose them in MapReduce or any other framework designed specifically to work with Hadoop. (For a more technical explanation of Hadapt 2.0 and how it stacks up against traditional analytic databases, you might want to check out database industry analyst Curt Monash’s blog post on it.)
The real beauty, however, might be the integration with Tableau, which has become a darling of business analysts everywhere who appreciate its point-and-click analytic functions and beautiful visualizations. Hadapt CTO Philip Wickline told me during a recent call that Tableau is “overwhelmingly the choice” for BI within Hadapt customer accounts. I received a demo of Hadapt functions exposed via Tableau’s interface, and it was impressive to watch someone run and visualize an Apache Mahout-based sentiment analysis literally by making a few mouse clicks.
Why Hadoop and BI need each other
Hadapt isn’t perfect, and Tableau isn’t the be-all, end-all of analytics and visualization software, but this new feature set is a sign of how we should expect to see Hadoop evolve in the near future. Historically speaking, Hadoop is slow compared with the interactive query capabilities of relational analytic databases. It’s also not easy to use if you’re relegated to writing MapReduce jobs, and there’s no native capability for visualizing the results of those jobs.
On the other hand, specialized analytic tools such as Tableau and even machine-data master Splunk have been known to crumple under the weight of the massive data sets that Hadoop was designed to handle. Splunk has built Hadoop into its Splunk Enterprise product in order to enable easier processing of huge data sets. Furthermore, Hadoop is almost a necessity for adding structure to unstructured data and making it analyzable by BI tools and relational databases at all.
All this has some industry watchers wondering whether Hadoop is really the answer as big data users increasingly require capabilities such as real-time processing and rapid interaction with data. Startups such as Precog are trying to answer these concerns by building their own analytic tools from scratch that don’t rely on Hadoop at all to handle even rather large data sets. Google has evolved from its MapReduce processing roots and built tools such as Dremel (productized as BigQuery), Percolator and Pregel.
Of course, no amount of work wholly outside the world of Hadoop is going to change the amount being done with Hadoop as the foundation of a BI and SQL revolution. A startup called Platfora is promising a product that turns Hadoop into an engine for an entirely new type of BI experience. Hadoop-distribution vendor MapR is driving an open source project called Apache Drill that replicates Google’s Dremel on top of Hadoop. Beyond the Hadoop-based startups, large companies such as EMC Greenplum, Microsoft and Teradata have at least decided to make Hadoop a first-class citizen in their big data work, even where they have other legacy products to push as well.
Daniel Abadi, a Yale professor and Hadapt’s co-founder and chief scientist, says that just because you might circumvent MapReduce to increase the speed of processing different types of jobs, “that doesn’t mean you have to circumvent Hadoop itself.” Hadapt, he added, actually emerged from a research project called HadoopDB that had been trying to make Hadoop more interactive since its inception in 2009. Three years later, it’s clear Abadi and company were onto something.
Feature image courtesy of Shutterstock user FWStudio.