How EMC Is Embracing the New Big Data Bundle

EMC signed a partnership deal with analytics pioneer SAS (s sas) Tuesday that shows how far the storage company is willing to go in order to cash in on the big data wave. Despite an industry-wide push for better and more-complete big data strategies, it’s beginning to look like EMC (s emc) and IBM (s ibm) will be the two technology vendors earning the most data-related dollars once the dust settles. To be fair to IBM, it’s still quite a distance ahead of EMC in terms of its big data breadth, but news like today’s partnership shows that EMC understands what it needs to do to compete and that it’s willing to do it. There definitely are other vendors with the means to compete, but until they embrace the future of analytics like IBM and EMC have done, they simply won’t have the tools to deliver what customers will demand.

Companies of all types are catching on to what bleeding-edge companies like Google (s goog) and Yahoo (s yhoo) have known for years, which is that the more information they can collect and analyze, the better they can serve their customers and optimize their own operations. The data comes from everywhere — server log files, surveys, social media, forms, transactions, site traffic, you name it — and the use cases range from building recommendation engines to more-targeted marketing to figuring out how to optimize IT infrastructures for maximum efficiency. At this point, storing all that data has become the easy part — the hard part for companies is figuring out how to use that data to their advantage. This is something IBM has realized for a few years now, and it’s why EMC is doing everything in its power to evolve from being a storage-only vendor to a storage and analytics vendor. Like storage, though, analytics is not a single-prong market for vendors that want to dominate it.

At the least, I think, the new core bundle of big data requirements will be:

  1. storage
  2. a MapReduce framework for processing unstructured data
  3. an analytic database
  4. predictive analytics
  5. business intelligence (BI)

As a result of billions in acquisitions over the past several years (including SPSS, Netezza and Cognos), as well as plenty of internal innovation, IBM has these areas covered, and more. IBM had storage and added an analytic database with Netezza, BI with Cognos, and predictive analytics with SPSS and an integration with Revolution Analytics. It also has a very busy Hadoop division working on an enterprise-grade Hadoop distribution that runs atop IBM’s GPFS storage product, as well as Hadoop-based applications such as InfoSphere BigInsights. Jeff Jonas, who keynoted at our recent Structure: Big Data event, appears to have his very own Entity Analytics division dedicated to software that helps users determine who’s who (sub req’d) in real time. IBM also spent years creating its Watson system that made headlines earlier by defeating two former Jeopardy! champions, but the technology headline was the incredible advances in machine learning and natural-language processing, arguably the two next frontiers of big data. When it all comes together, organizations are able to analyze and visualize their data from the moment it hits their networks, for any number of purposes.

EMC’s first move was buying analytic database vendor Greenplum in July, an acquisition it quickly productized with a souped-up appliance combining high-end storage and computing gear with Greenplum’s analytic software (it’s called the EMC Greenplum Data Computing Appliance). But now, appliances like this (and the databases that power them) are becoming a dime a dozen, as Oracle, IBM, Teradata and (soon) HP all do the same thing. This is why, today, EMC topped off a trio of new appliance offerings with a partnership with predictive analytics pioneer SAS, through which SAS will offer its new High-Performance Analytics software running on the EMC Greenplum Data Computing Appliance. It’s why EMC next month will announce its first product to incorporate the power of Hadoop to process unstructured data. It doesn’t have a big data portfolio that can compete with IBM just yet, but EMC’s relentless pace since buying Greenplum shows that it gets where it needs to go.

The watch is on now to see if anyone else will step up by buying into big data at the required level, or if vendors such as Oracle (s orcl), HP (s hpq), Teradata (s tdc) and even SAP (s sap) will be content being component-level players in best-of-breed environments. They all have some parts in place and HP and Teradata have been active in the M&A arena lately, but I just haven’t seen the progressive vision that IBM and EMC are demonstrating. The good news for everyone, though, is that there is no shortage of Hadoop startups to snatch up and fill that void, and there even are a few good database vendors around, including ParAccel, Objectivity and InfoBright. The same goes for BI, where companies like Pentaho and Jaspersoft are touting their big data relevance to anyone who’ll listen. Just one strategic move by a vendor on the cusp, and the picture changes significantly.

Image courtesy of Flickr user alancleaver_2000.