
So much Hadoop in so many places

The big Hadoop news today is Hortonworks’ entrée into the product space with a new distribution, but it’s just one of several companies trying to sell big-data-hungry businesses on its Hadoop prowess with new products. Individually, none of today’s announcements is particularly earth-shaking, but they’re very meaningful when taken as a whole. They’re part of a larger trend in which seemingly every vendor with a data-driven business has a Hadoop story to tell customers: Informatica (s infa), MicroStrategy (s mstr), HP (s hpq), EMC (s emc), Oracle (s orcl), ParAccel, IBM (s ibm), Dell (s dell), Pentaho, Jaspersoft, you name it.

Karmasphere. Karmasphere and Amazon Web Services (s amzn) have teamed to make Karmasphere’s Analyst product available in a pay-as-you-go pricing model. This means AWS users can create Hadoop workflows using Karmasphere’s graphical interface and run them on Elastic MapReduce without having to purchase Karmasphere licenses through the traditional sales model. Of course, because the jobs run in Amazon’s cloud, users don’t have to purchase hardware either.

MarkLogic. Unstructured database provider MarkLogic is souping up version 5.0 of its product with a Hadoop connector that lets users run MapReduce jobs on MarkLogic data without that data having to leave the database. That’s potentially a powerful feature because it speeds up the MapReduce job, both by avoiding data transfer across the network and by taking advantage of the database’s native performance features.

Sybase. Sybase, the database vendor owned by German software giant SAP (s sap), released version 15.4 of its IQ analytic database, which includes a native MapReduce API within the database as well as integration with Hadoop environments. The former capability is designed for structured data stored within Sybase IQ, but the Hadoop integration will, according to the announcement, allow for “different techniques to integrate Hadoop data and analysis with Sybase IQ.”

Syncsort. Data integration specialist Syncsort released DMExpress 7.0, which includes enhanced Hadoop integration to make it easier and faster to extract data from all data environments and load it into Hadoop.

Alpine Data Labs. Alpine Data, a predictive analytics startup with close ties to EMC’s Greenplum division, said it will integrate its product with Hadoop in 2012. Alpine Data’s Miner product actually runs predictive analytic algorithms within the database itself, saving customers from having to employ a separate system for that job.

You’ll hear various estimates about how much data will be stored in Hadoop in the years to come — somewhere between half and all of the world’s data — but it’s a lot any way you slice it. Big data is driving everything now because there’s so much to learn and so many business opportunities for companies that truly understand what data says about consumers, systems, climate change or anything else they want to know. Hadoop is driving the big data ship because so much of that data, and more every day, is unstructured and not suitable for traditional relational database environments.

Thus all the chest-beating about Hadoop integrations, connectors and new products: Data-focused vendors without a Hadoop story don’t have much of a story at all.

2 Responses to “So much Hadoop in so many places”

  1. “You’ll hear various estimates about how much data will be stored in Hadoop in the years to come — somewhere between half and all of the world’s data…”

    I’ve heard this claim several times in the past, but haven’t actually found anything solid to back it up. (Admittedly, I haven’t looked too hard.) Can you elaborate on this point a little more?