Server makers of all stripes have been trying to capitalize on the big data buzz by pushing Hadoop-optimized systems, and now we can count Cisco among them thanks to its latest partnership with EMC. The companies have come up with a reference architecture featuring Cisco UCS server gear that’s designed to run EMC Greenplum MR, EMC’s “enterprise-class” Hadoop distribution, which features technology it OEMs from Hadoop startup MapR.
The reason behind this arrangement — as well as behind similar deals EMC competitor Cloudera has with Dell and SGI, and Oracle’s aptly named Big Data Appliance — is to try to turn a profit on software that was designed to run on commodity hardware that can scale out cheaply and be replaced easily when it dies. However, server vendors are wisely banking on a few key considerations, mostly stemming from the fact that large mainstream enterprises don’t have the systems engineering resources of early Hadoop adopters such as Google, Yahoo and Facebook.
As I explained recently in GigaOM Pro, big data appliances and reference architectures could be a lucrative business (subscription required) because many companies don’t want to go through the hassle of learning the ins and outs of Hadoop cluster management. It can be pretty tricky to design a system optimized for Hadoop, one in which users aren’t sacrificing performance at some level or wasting energy by running more servers than they actually need just to accommodate growing storage volumes. Having someone deliver a system that includes all the right gear and management software, and then actually deploy it and provide professional services, can be pretty appealing.
In the case of the Cisco-EMC architecture, we’re talking about single-rack or multi-rack fabrics packed with Intel processors, memory and storage, with the units connected via 10 GbE. Cisco has been able to move a relatively high number of its UCS servers thus far without the Hadoop connection, so it has to be confident that enterprises’ interest in pinning their big data strategies on Hadoop will only help it sell more, especially as the company tries to rebound after a recent restructuring.
Actually, the Cisco partnership wasn’t EMC’s only nod to Hadoop’s need for high performance on Thursday. The company also released version 4.2 of the Greenplum Database, which includes a new feature called gNet. According to the press release announcing the updated software, it “enables high-performance parallel import and export of all data (compressed and uncompressed) from Hadoop using gNet for Hadoop, a parallel communications transport. This achievement represents the industry’s first direct query interoperability between Greenplum Database and Hadoop.”
Not that it hasn’t been driving the bus of big data hype for a couple of years, but Hadoop has been particularly hot lately as more large vendors pin their big data hopes on the open source platform. This week alone, Microsoft and VMware enhanced their Hadoop efforts with new products aimed at making it more accessible to mainstream developers and business users. We’ll be talking a lot more about Hadoop and how it will evolve at our Structure: Data conference March 21-22 in New York.