Mark October 2012 on your big data calendar, because this might be the month we redefine what Hadoop is. Is it a MapReduce framework for heavy-duty batch processing? Yes. But can it also be the engine of high-speed, interactive analytics products that look to do for unstructured data what massively parallel analytic databases do for structured data? As it turns out, the answer might be “yes” again.
A number of companies are trying to prove this contention this week alone. On Tuesday, it was Hadapt improving on its native Hadoop-plus-SQL architecture and adding advanced analytic functions and tight Tableau integration. On Wednesday, it’s Teradata, Birst and startup Splice Machine getting into the act.
Birst: Birst has been making a concerted effort to establish itself as a legitimate BI company by building a suite of on-premise offerings, but it’s moving back to the cloud with a new big data service built atop Hadoop. Essentially, Birst Big Data Services lets companies store unstructured and semi-structured data and then analyze it on the fly using packaged functions that don’t require knowledge of MapReduce or other complex methods. Because it’s Birst, the new service connects to structured relational data stored within Birst’s flagship service and brings visualization tools to new data types.
Splice Machine: Splice Machine, a San Francisco startup that built a SQL database atop the Hadoop Distributed File System, on Wednesday announced $4 million in first-round funding from Mohr Davidow Ventures. Like fellow startup Drawn to Scale, Splice Machine promises the best of both worlds: SQL functions and transactions on top of a distributed foundation of HDFS and HBase. That’s a lovely story if it works, as companies could expect flexible schemas for unstructured data, massive scalability and a continued bond with their favorite SQL BI products.
Teradata: Teradata has finally done something people have long expected it to do: build an appliance, the aptly named Big Analytics Appliance, that packages Hadoop with the company’s Aster Data database. Teradata bought Aster Data a couple of years ago to capitalize on the unstructured data that’s at the core of the big data movement but that doesn’t comport with Teradata’s core data warehouse and analytics business. Aster Data’s claim to fame is its SQL-MapReduce software, which lets users run MapReduce jobs using standard SQL.
The whole Aster-Hadoop shebang ties into data stored in Teradata’s flagship database via a connector called SQL-H, the secret sauce that lets users access Hadoop data, join it with data in Aster and then analyze it.
As impressive as these new offerings sound, we probably haven’t seen anything yet. The rest of the Hadoop ecosystem isn’t blind to what’s happening. At the O’Reilly Strata conference and Hadoop World next week, we should see how some of the bigger players plan to answer the very important question of how closely we can align Hadoop with business users’ skills.