IBM has tapped Cloudera as the preferred commercial Hadoop distribution for its big data platform.
This represents something of an about-face by both companies. IBM executives used to complain (privately) that Cloudera was not ready for enterprise use. And their counterparts at Cloudera used to beef that IBM did not contribute to the Apache Hadoop project that underlies all the Hadoop distributions.
Until now, IBM had blessed open-source Apache Hadoop for its evolving big data effort. On Wednesday it said it was expanding that platform to encompass other Hadoop distributions “starting with Cloudera.” The news came as IBM announced its acquisition of Vivisimo, whose big data analytics toolset expands search and analysis beyond Hadoop to traditional legacy applications and their data repositories.
IBM database rival Oracle has already tapped Cloudera as the basis of its Big Data Appliance, a move that led many to speculate that acquisition-hungry Oracle would buy privately held Cloudera. That speculation, by the way, is ongoing.
According to IBM’s statement:
Cloudera is a top contributor to the Hadoop development community, and an early provider of Hadoop-based systems to clients across a broad range of industries including financial services, government, telecommunications, media, retail, energy and healthcare. As a result, Cloudera Hadoop clients can now take advantage of IBM’s big data platform to perform complex analytics and build a new generation of software applications.
Cloudera's was one of the first commercial distributions of Hadoop and remains one of the leading ones, but others are available from Hortonworks, MapR, and EMC Greenplum. The fact that Cloudera got the nod from two industry giants is noteworthy given the number of choices out there, and may in fact make Cloudera first among equals in the commercial Hadoop field.