IBM Creates Big Data Frankenstein with Netezza-R Fusion

IBM and Revolution Analytics have brought together SQL queries and predictive analytics by integrating Revolution’s R Enterprise statistical analysis software with IBM’s Netezza TwinFin data warehouse appliance. It might sound like just a run-of-the-mill technology partnership, but it’s part of a significant evolution in analytics strategies as big data becomes a big problem, and a big opportunity, for organizations of all types. They’ll need to analyze the same data in a variety of ways for a variety of purposes, all with the goal of creating a holistic picture of their business. For convenience if nothing else, it makes a lot of sense to store all that data in a single location.

The rationale for integrated data analysis environments is pretty simple. For starters, organizations storing and analyzing large amounts of data are already investing heavily in the resources to store and process that data and the personnel to analyze it, but storing all the data in a single location saves on both capital expenditures and space. Furthermore, integration will likely help create data analysts versed in both software products, instead of separate teams that each know only their own products and run them on their own, probably siloed, set of resources. Siloed resources can lead to missed opportunities because data and personnel are physically separated, and to performance bottlenecks and lag times when users transfer data from one environment to another.

In the case of R and Netezza, Netezza provides the traditional relational data warehouse analysis and query capabilities (as well as the storage and computing gear), while R provides advanced statistical analysis tools, including those for tasks such as predictive analytics. Revolution’s R Enterprise is a commercial version of the standard open-source R software, which is widely used by private companies and research institutions alike. Other vendors are moving in the same direction. As I mentioned in a post last week, ParAccel is building its own Hadoop connector to let customers migrate, store and analyze unstructured Hadoop data within their ParAccel analytic databases. Database industry analyst Curt Monash noted that this type of integrated data environment was also probably a driving force behind Teradata’s decision to buy Aster Data Systems. In these types of integrations, assuming the presence of high-performance appliances or servers, R or Hadoop users might actually experience better performance than they would have on clusters of commodity servers.
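To make that division of labor concrete, here is a minimal sketch of the general pattern: a SQL query aggregates the data where it is stored, and a statistical step then fits a simple predictive model to the result. It is illustrative only. Python's built-in sqlite3 module stands in for the data warehouse and a hand-written least-squares fit stands in for R; neither reflects the actual Netezza or Revolution R Enterprise interfaces, and the sales table, its columns and its figures are hypothetical.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical sales table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (month INTEGER, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [(1, 60.0), (1, 40.0), (2, 120.0), (3, 90.0), (3, 50.0), (4, 160.0)],
    )
    return conn


def monthly_totals(conn):
    """SQL step: aggregate raw rows into one total per month."""
    query = "SELECT month, SUM(amount) FROM sales GROUP BY month ORDER BY month"
    return conn.execute(query).fetchall()


def fit_line(xs, ys):
    """Statistical step: ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def forecast_next_month(conn):
    """Run the query and the model against the same store, with no export step."""
    rows = monthly_totals(conn)
    months = [row[0] for row in rows]
    totals = [row[1] for row in rows]
    slope, intercept = fit_line(months, totals)
    next_month = months[-1] + 1
    return next_month, slope * next_month + intercept


if __name__ == "__main__":
    connection = build_demo_db()
    month, predicted = forecast_next_month(connection)
    print(f"Forecast for month {month}: {predicted:.1f}")
```

The point of the sketch is that both steps read from one store, so nothing has to be copied between a warehouse and a separate statistics environment before the model runs.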

In other instances, Cloudera has partnered with a variety of data warehouse (including Netezza), database and BI vendors to integrate those environments with customers’ Hadoop environments for processing and analyzing both structured and unstructured data using the right tools for the right job. The bottleneck problem might still arise in these situations as data moves between Hadoop clusters and databases, but a faster internal network could help alleviate that problem, and data moving from the Hadoop cluster post-processing will be far smaller in volume than the raw data from which those results were derived.

It seems only logical that we’ll see more and more technology integrations of this ilk (with some variance in the details, of course) as big data analytics becomes an even bigger issue and even better understood among end-user organizations. It’s not just about storing lots of data, but also about getting the best insights from it and doing so efficiently, and having large silos for each type of data and each group of business stakeholders doesn’t really advance either goal.

To learn more about R Enterprise, check out our recent video interview with Revolution Analytics Chairman and CEO Norman Nie, who also is partially responsible for the creation of SPSS, the predictive analytics software that IBM bought for more than a billion dollars in 2009. You can also attend our Structure Big Data conference next week, which will feature speakers from Revolution, Netezza, IBM, Aster Data, Cloudera and many other analytics thought leaders.