Much has been written about the “big data” phenomenon: the petabytes of machine data from computers, sensors and other equipment, along with social networking data and scientific data, form a rich but unwieldy trove that is available for the taking. The big data problem is that the sheer amount and diversity of this data outmatch the ability of traditional relational databases like Oracle, SQL Server and DB2 to handle it effectively. With the Hadoop Distributed File System and MapReduce processing power, that data can be aggregated. The next step is finding tools to analyze it further.
It’s that analytics problem that has Andy Palmer excited. Palmer is a serial database entrepreneur who co-founded Vertica Systems (now part of Hewlett-Packard) and VoltDB and was a founding board member of Bluefin Labs, CloudSwitch (now part of Verizon), and Recorded Future.
“The real purpose of big data is to enable big analytics. The most compelling companies out there, I think, are those that attack that problem,” Palmer told me this week. “I really do believe that big data is, in and of itself, a tool. The real story is more about big analytics. Once you aggregate the data you then have to ask really hard questions.”
The surging interest in data analytics and visualization tools supports his take. Splunk last month filed for its IPO, and Tableau is well on its way. Another analytics player, QlikView, went public last summer, and its stock has doubled since launch, as Derrick Harris reported in GigaOM. All of these companies aim to help users make sense of all that data.
Palmer, who often works with database pioneer Michael Stonebraker, shares Stonebraker’s view that the sheer variety of data formats and the types of operations to be performed on them call for a variety of specialized databases.
There is a real need for database technology that can handle multi-dimensional data arrays — data sets that often come out of astronomy and other scientific research, Palmer said. “When you represent data in traditional relational databases, you can compromise the inherent nature of the data. And if you integrate a lot of data together, ultimately that data looks like a large array. Representing an array in a traditional database is really an unnatural act,” he said.
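To make Palmer's point concrete, here is a minimal sketch of what happens when a small two-dimensional array is pushed into a relational table. It is an illustration only, not how Paradigm4 or any particular product works. The grid values, the `cells` table and the two helper functions are all hypothetical, and SQLite stands in for a traditional relational database.

```python
import sqlite3

# A tiny 3x3 grid of made-up readings (hypothetical data for illustration).
grid = [
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
    [7.0, 8.0, 9.0],
]

# Array form: a cell's position is implicit in where it sits.
def window_sum_array(g, r0, r1, c0, c1):
    """Sum the sub-grid of rows r0..r1 and columns c0..c1 (inclusive)."""
    return sum(g[r][c] for r in range(r0, r1 + 1) for c in range(c0, c1 + 1))

# Relational form: every cell becomes a row carrying explicit coordinates.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cells (r INTEGER, c INTEGER, value REAL, PRIMARY KEY (r, c))"
)
conn.executemany(
    "INSERT INTO cells VALUES (?, ?, ?)",
    [(r, c, v) for r, row in enumerate(grid) for c, v in enumerate(row)],
)

def window_sum_sql(db, r0, r1, c0, c1):
    """Same sub-grid sum, expressed as a range filter over coordinate columns."""
    (total,) = db.execute(
        "SELECT SUM(value) FROM cells WHERE r BETWEEN ? AND ? AND c BETWEEN ? AND ?",
        (r0, r1, c0, c1),
    ).fetchone()
    return total

array_total = window_sum_array(grid, 0, 1, 1, 2)
sql_total = window_sum_sql(conn, 0, 1, 1, 2)
row_count = conn.execute("SELECT COUNT(*) FROM cells").fetchone()[0]
```

Both versions give the same answer, but the relational one stores a coordinate pair alongside every value and has to rediscover the array's shape through range filters. With more dimensions and many more cells, that bookkeeping grows accordingly.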
He is backing yet another Stonebraker company, Paradigm4, that is attacking that problem. In the past, the big database powers were able to shoehorn new types of workloads into their relational model. For example, a decade or so ago, there was a raft of small, innovative object database companies, including Object Design and Ontos, that built their businesses on the premise that relational databases could not handle objects, which did not fit well into the rows-and-columns world of relational databases. Over time, however, the big database players pushed and shoved at least some object capabilities into their databases, and those smaller companies disappeared.
Palmer and others in the big data world said this won’t happen again. Big data cannot be co-opted the same way, they argue, because it would be far too expensive and resource-intensive for traditional databases to try to churn through all this data. That’s why Oracle et al. are coming out with specialized big data products.
And when it comes to big data, the data itself will be meaningless unless the right analytic tools are available to sift through it and there are people who know what questions to ask. Big data, and the big analytics used to make sense of it, will be hot topics at GigaOM’s Structure: Data conference next month in New York City.