Summary:

Companies that are concerned with privacy and bandwidth issues, but that want to take advantage of the processing power of Hadoop, are actively pursuing a “pump to Hadoop and pull from Hadoop” structure, according to Hortonworks’ Ari Zilka, speaking at Structure:Data on Thursday.

Ari Zilka of Hortonworks, James Markarian of Informatica, Mark Cusack of RainStor, Justin Borgman of Hadapt, and Jo Maitland of GigaOM at Structure:Data 2012

(c) 2012 Pinar Ozger. pinar@pinarozger.com

Companies that are concerned with privacy and bandwidth issues, but that want to take advantage of the processing power of Hadoop, are actively pursuing a “pump to Hadoop and pull from Hadoop” structure, according to Hortonworks Chief Product Officer Ari Zilka, speaking on a Future of Hadoop panel at Structure:Data on Thursday.

James Markarian of Informatica noted that most Hadoop applications tend to be data-intensive rather than resource-intensive. “The challenge is that the big elephant doesn’t move through the little pipes all that well,” he said. When the data and the processing are colocated in the cloud, that’s not a problem. But most companies store their data behind firewalls.

In describing the “pump in, pull out” approach, Zilka added that “Hadoop is forcing the unlocking of data.” Financial companies can’t put all their data in a public cloud, but they can strip out credit IDs, passwords and other identifiers, send the scrubbed data out to the cloud for a massive processing job, then pull the results back in and remap them to the personal identifiers.
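For readers who want a concrete picture of that flow, here is a minimal, hypothetical Python sketch of the pattern: tokenize the sensitive fields behind the firewall, ship the scrubbed records out for processing, and remap the results when they come back. The field names and the token scheme are illustrative assumptions, not details given by the panel.

```python
import uuid

# Fields we treat as sensitive in this sketch (an assumption, not from the panel)
SENSITIVE_FIELDS = ("credit_id", "password")

def anonymize(records):
    """Replace sensitive fields with opaque tokens; the mapping stays on-premise."""
    mapping = {}
    scrubbed = []
    for record in records:
        token = uuid.uuid4().hex
        mapping[token] = {f: record[f] for f in SENSITIVE_FIELDS if f in record}
        clean = {k: v for k, v in record.items() if k not in SENSITIVE_FIELDS}
        clean["token"] = token
        scrubbed.append(clean)
    return scrubbed, mapping

def remap(results, mapping):
    """Re-attach personal identifiers to results pulled back from the cloud."""
    return [{**row, **mapping.get(row["token"], {})} for row in results]

# The scrubbed records go out to the Hadoop cluster; `mapping` never leaves the firewall.
records = [{"credit_id": "4111-xxxx", "password": "hunter2", "spend": 120.0}]
scrubbed, mapping = anonymize(records)
results = scrubbed  # placeholder for the cloud job's output, keyed by token
print(remap(results, mapping))
```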

Hadoop is currently deployed over thousands of commodity boxes, but as the architecture evolves and the size of the data sets increases, the system will have to move toward a more monolithic stack. Justin Borgman of Hadapt pointed out the tension there: much of Hadoop’s appeal is that you can run it on commodity hardware.

But as Mark Cusack of RainStor said, “It doesn’t make environmental or economic sense to throw more boxes” at the problem. Markarian wondered aloud if there are really exabyte problems that will need to be solved, or if there’s a limit to the size of data that we’ll be working with. Several members of the panel discussed the need for better compression. Right now, compression is a one-size-fits-all solution, Cusack said, but there’s a need for a “much more targeted, tailored compression.” He added, “Compression is a key driver.”

Watch the livestream of Structure:Data here.

Update: This post has been updated to fix a typo.

