It’s not the big data, it’s the right data

Big data’s fine; the right data, however, is a game changer.

Much has been written about the “big data” phenomenon: the petabytes of machine data from computers, sensors and other equipment, along with social networking data and scientific data, form a rich but unwieldy trove that is available for the taking. The big data problem is that the sheer amount and diversity of this data outmatch the abilities of traditional relational databases like Oracle, SQL Server and DB2 to handle it effectively. With the Hadoop Distributed File System and MapReduce processing power, that data can be aggregated. The next step is finding tools to analyze it further.

It’s that analytics problem that has Andy Palmer excited. Palmer is a serial database entrepreneur who co-founded Vertica Systems (now part of Hewlett-Packard) and VoltDB and was a founding board member of Bluefin Labs, CloudSwitch (now part of Verizon) and Recorded Future.

“The real purpose of big data is to enable big analytics. The most compelling companies out there, I think, are those that attack that problem,” Palmer told me this week. “I really do believe that big data is, in and of itself, a tool. The real story is more about big analytics. Once you aggregate the data you then have to ask really hard questions.”

The surging interest in data analytics and visualization tools supports his take. Splunk last month filed for its IPO, and Tableau is well on its way. Another analytics player, QlikView, went public last summer, and its stock has doubled since launch, as Derrick Harris reported in GigaOM. All of these companies aim to help users make sense of all that data.

Palmer, who often works with database pioneer Michael Stonebraker, shares Stonebraker’s view that the sheer variety of data formats and the types of operations to be performed on them call for a variety of specialized databases.

There is a real need for database technology that can handle multi-dimensional data arrays, the kind of data sets that often come out of astronomy and other scientific research, Palmer said. “When you represent data in traditional relational databases, you can compromise the inherent nature of the data. And if you integrate a lot of data together, ultimately that data looks like a large array. Representing an array in a traditional database is really an unnatural act,” he said.
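
To make the array point concrete, here is a rough sketch of my own, not anything from Palmer or Paradigm4. It takes a tiny grid of made-up readings and averages the neighbors of one cell twice: once with the grid held as an array, and once after flattening it into (row, column, value) records in a relational table. Every name in it (`grid`, `cells`, `neighbor_mean_array`, `neighbor_mean_relational`) is hypothetical, and SQLite is used only because it ships with Python.

```python
import sqlite3

# Hypothetical 3x3 grid of readings (say, pixel intensities). Values are made up.
grid = [
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
    [7.0, 8.0, 9.0],
]


def neighbor_mean_array(grid, r, c):
    """Mean of the cells adjacent to (r, c), using plain index arithmetic."""
    cells = [
        grid[i][j]
        for i in range(max(r - 1, 0), min(r + 2, len(grid)))
        for j in range(max(c - 1, 0), min(c + 2, len(grid[0])))
        if (i, j) != (r, c)
    ]
    return sum(cells) / len(cells)


def neighbor_mean_relational(grid, r, c):
    """Same computation after flattening the grid into (r, c, value) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE cells (r INTEGER, c INTEGER, value REAL, PRIMARY KEY (r, c))"
    )
    # Position is no longer implicit: each cell must carry its own coordinates.
    conn.executemany(
        "INSERT INTO cells VALUES (?, ?, ?)",
        [(i, j, v) for i, row in enumerate(grid) for j, v in enumerate(row)],
    )
    (mean,) = conn.execute(
        "SELECT AVG(value) FROM cells "
        "WHERE r BETWEEN ? AND ? AND c BETWEEN ? AND ? "
        "AND NOT (r = ? AND c = ?)",
        (r - 1, r + 1, c - 1, c + 1, r, c),
    ).fetchone()
    conn.close()
    return mean


array_result = neighbor_mean_array(grid, 1, 1)
relational_result = neighbor_mean_relational(grid, 1, 1)
print(array_result, relational_result)
```

Both versions give the same answer. The difference is that the array keeps each cell's position implicit in where it sits, while the table has to store the coordinates alongside every value and rediscover the neighbors with range filters. On a toy grid that overhead is trivial; the argument in the article is that it stops being trivial at scientific-data scale.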

He is backing yet another Stonebraker company, Paradigm4, that is attacking that problem. In the past, the big database powers were able to shoehorn new types of workloads into their relational model. For example, a decade or so ago, there was a raft of small, innovative object database companies (Object Design, Ontos and others) that built their businesses on the premise that relational databases could not handle objects, which did not fit well into the rows-and-columns world. Over time, however, the big database players pushed and shoved at least some object capabilities into their databases, and those smaller companies disappeared.

Palmer and others in the big data world said this won’t happen again. Big data cannot be co-opted the same way, they argue, because it would be far too expensive and resource-intensive for traditional databases to try to churn through all this data. That’s why Oracle et al. are coming out with specialized big data products.

And when it comes to big data, the data itself will be meaningless unless the right analytic tools are available to sift through it and there are people who know what questions to ask. Big data, and the big analytics used to make sense of it, will be hot topics at GigaOM’s Structure: Data conference next month in New York City.

Photo courtesy of Flickr user Il conte di Luna.

17 Comments

Jim Rembach

I completely agree that the first step to capitalize on big data is in figuring out which data points are the most relevant. But the issue goes well beyond finding the right analytics tools. Even the best tools aren’t the answer without skilled analysts who can break down reports and dashboards and provide actionable business advice and direction that the C-suite can understand.

Eli Israel

The bigger the data, the more it draws a contrast with our limited human brains, which are sadly no bigger than they ever were.

The frontier isn’t (just) big data, it’s relevance and meaning.

Without a focus on results, big data is just piling up a bigger haystack on the theory that it somehow multiplies the chance of finding a needle.

Chris Taylor

So true. It is the right data, in the right hands, with the right context, at the right time. Any one of those pieces missing and Big Data is Bloated Data rather than useful.

Eric Fairfield

I have been analyzing what is now called ‘big data’ for more than 20 years, mostly in the Human Genome Project, microarrays, and now neuroscience. My first screen in the pile of data is for data quality. Not just general quality but quality in answering the problem that is being asked. Routinely, 90% of the ‘data’ has to be discarded at this stage because it is numbers but not really data. The biggest reason to discard ‘data’ is that the data is not known to be reproducible or the conditions under which the data were collected are unknown or can’t be reconciled with other data.

Leon Guzenda

The author missed two of the object database companies – Versant and Objectivity. The latter’s Objectivity/DB was the first database to store (not just index) over a petabyte of data, back in 2000. They are active in the Big Data community and have a new graph database product called InfiniteGraph for supporting relationship analytics.

Barb Darrow

@leon you’re right… i left a few out. Not intentional…it’s interesting that they are still with us (as opposed to others i mentioned)

Jouko Ahvenainen

Some good points here, but I’m even more skeptical about ‘big data’ or ‘big analytics’; maybe I have seen too much of that biz (I founded a couple of data analytics companies). Many people who talk about the opportunity of data stay on too general a level and talk about better customer understanding, better sales, etc. In reality you must be very specific about what you utilize and how. Amazon is a success story, but it is based on their own data in one format, the target is up-sell, and the sales are done with their own services and tools. It is very different from a generic database or analytics tool.

steve

I contend that most of these companies claiming to do “analytics” on Big Data are just doing Excel on steroids with drill-downs and visualizations. Show me the company doing econometric modeling, forecasting, and prediction using state-of-the-art modern statistical tools on Big Data, and you will see the Big Data Analytics company.

Alex James

Steve, check out MarketShare (www.marketshare.com) – that is precisely what they have built…

Alex James

Steve, my company is using MarketShare (www.marketshare.com) – working with big companies and doing exactly what you are saying…

Steve Ardire

> It’s not the big data, it’s the right data

Agreed, and the right data includes Linked Data as a Service, which enables automated discovery, composition and use of relevant data sources to generate the most value.