
Summary:

As organizations strive to analyze more data than ever and to do it faster than ever, the results they’re getting might actually be worse than those in the pre-big-data and real-time world — at least temporarily.

Knome, Metamarkets, ITA Software, OmniTI, Karmasphere at Structure Big Data 2011

As organizations strive to analyze more data than ever and to do it faster than ever, the results they’re getting might actually be worse than those in the pre-big-data and real-time world — at least temporarily. During a panel discussion of “master data wranglers,” a major topic of conversation was the trade-off between analyzing lots of data fast and taking adequate time to drive real, meaningful results.

According to Abe Taha of Karmasphere, this problem exists in large part because storing and analyzing data has become so much cheaper, meaning organizations can actually take advantage of what they’re producing. He points to Hadoop, especially, as having democratized the capability of doing big data. Or, as Metamarkets Co-Founder Michael Driscoll puts it, organizations are suffering from the “attack of the exponentials”: storage and bandwidth have gotten exponentially cheaper, making it feasible to tackle all this new analysis. Ideally, he said, the benefits of analyzing the data outweigh the cost of doing so. With all that data now on hand and analyzable, said Glenn McDonald of ITA Software, “the expectation is that you’ll use it.”

But herein lies the problem, says Theo Schlossnagle of OmniTI. We’re listening to more things, he explained, but we’re not listening any smarter. In fact, he thinks the noise-to-signal ratio is very high, often resulting in worse decisions. However, he added, this might just be a matter of growing pains as organizations learn how to do big data optimally. Until that point, he said, the question is whether timeliness outweighs correctness.

Driscoll calls this situation “analysis paralysis,” citing the example of the CIA, which suffered through a decade of weakened analytics efforts before finally figuring it out. Very likely, he said, it could get worse before it gets better.

McDonald sees value in the push to real-time analysis even now, though: it makes it much easier to figure out the right questions to ask. If it takes 14 hours to rerun an analysis because some factor was weighted incorrectly or something else went wrong, it’s very difficult to learn from your mistakes. If you can “get in” the data, explore it and figure out what’s going on, it’s a lot easier to refine your algorithms, he said. Users must be able to interact with the data.

One thing seemingly everyone agreed on, however, is that we will figure it out, thanks in large part to the same trend that enabled big data: cheap infrastructure. Through options like cloud computing (Driscoll’s company stores about half a petabyte in S3), organizations can afford to take chances they might not otherwise take if it meant spending large amounts on server and storage infrastructure.

  1. Let me be the first to ask…

    Shouldn’t that be “Are big data making us dumber?”

    1. Derrick Harris Friday, March 25, 2011

      I’ll punt on the stylistic question of technical accuracy vs. common usage, but “big data” is a singular phenomenon.

  2. This was one of the best sessions of the day. One edit here.

    “In fact, he thinks the signal-to-noise ratio is very high, often resulting in worse decisions”

      What I think you mean is that the SNR is getting lower. A higher SNR is a good thing.

    1. Derrick Harris Friday, March 25, 2011

      Good catch. That’s definitely true.

  3. David Colbourn Monday, March 28, 2011

    I like “attack of the exponentials”; it is disk-price driven and telling. I am concerned about the reuse of “analysis paralysis,” which used to relate to as-is logical and physical modeling delays, not the data timeliness issue of extended run times. We need a new quote for that.
    The signal-to-noise ratio reference is deep! It is a data quality reference, but the allusion is to another type of computing, like what the human brain does. AI processing with low signal strength and high noise may be the future, but it doesn’t have to make us dumber (see Watson).

