Summary:

Intel’s getting into the open source software business with its own version of Hadoop. It joins a host of startups as well as EMC Greenplum in building a distribution for big data.

Intel on Tuesday said it was getting into the software business with its own Hadoop distribution. The move is a potential blow for startups such as Cloudera, Hortonworks and MapR that are offering their own distributions of Hadoop, but it’s also an admission by the chip vendor that the opportunity in big data isn’t only to be found in selling hardware.

At a conference held in San Francisco, Boyd Davis, VP and general manager of Intel’s Datacenter Software Division, explained Intel’s history with Hadoop, which stretches back to 2009, and stressed that Intel is going to share some aspects of its Hadoop distribution, but not all. Intel has already released a distribution of Hadoop in China, and today it’s bringing it to the United States. Intel’s distribution uses Hadoop 2.0 and YARN, a cutting-edge version of the platform compared with what most Hadoop users have deployed thus far.

Why Intel wants to push its own version of Hadoop

Davis introduced partners such as Cisco, which has tuned the Intel Hadoop distribution for its own servers. Intel also hosted a panel that included executives from SAP, Red Hat and Savvis to discuss the challenges of big data and the promise of Hadoop.

Davis was up front about Intel’s rationale for releasing its own distribution, namely that it was worried about the fragmentation and possible uncertainty associated with current Hadoop distributions. That could be read as a dig against the many startups already offering Hadoop distributions, all of which are slightly different (of course, Intel’s will be slightly different, too). Like existing players such as Cloudera and MapR, Intel will open source certain aspects of its distribution, but will also keep some software to itself.

Inside the data center, it’s no longer just web servers that matter

For example, Davis stressed that Intel will not share its management and monitoring software, which could be highly valuable for enterprise customers. The Intel software could coordinate with Intel’s data center management software and make managing a variety of workloads easier. And hidden in that coordination might be one of Intel’s aims in pushing its own version of Hadoop: countering the threat of ARM chips used in Hadoop clusters.

Dell, Calxeda and others are evaluating the use of lower-performance, lower-power chips in Hadoop clusters, a market Intel would hate to cede in the data center as data grows and analytics becomes more important. To that end, Intel has also optimized its Hadoop distribution for solid-state drives, something that other Hadoop companies haven’t done so far.

When asked about Atom and the use of lower-performance processors for Hadoop, Davis noted that while people are using lower-end processors for Hadoop, those deployments tend to have slower networking. Davis says that when you combine high-end processors with 10 gigabit Ethernet and Hadoop, customers get the performance that they want.

So while Intel may tout stability and consistency as the reason for its decision to become a major player in the software market for big data, it’s also driven by the changes in the data center that threaten the grip Intel has on the hardware inside it. The cloud and big data have changed the workloads and hardware requirements for the data center, and Intel is playing the long game in trying to release software that can be tuned to its chips.

The Hadoop drama isn’t over yet

Intel isn’t the only big vendor touting its own homegrown version of Hadoop. On Monday, EMC’s Greenplum division announced an entirely revamped version of its Hadoop distribution that’s merged with its flagship analytic SQL database. These big companies have big existing businesses to protect and lots of resources to put into doing it. As my colleague Derrick Harris wrote on the EMC news:

Looking past his competitive boasting, though, it’s easy to see [Greenplum's Scott] Yara’s greater point when you ask him what all this Hadoop talk means for the data warehouse business on which Greenplum was built. He points to the mainframe business that fell from its high perch decades ago but still drives billions a year in revenue. A single MPP database system is still faster on certain workloads than SQL on Hadoop, but that gap will close over time and “I do think the center of gravity will move toward HDFS,” he said.

Hadoop is a juggernaut when it comes to big data. Intel is a juggernaut when it comes to data center infrastructure. Its decision to enter into the open source software market is a big one for the chip company, for the Hadoop ecosystem and for the myriad startups playing in this space. It’s a topic we’ll explore more during our Structure Data conference in New York on March 20 and 21.

  1. Wow game on and good thing Cloudera raised all that VC money ;)

  2. A simple question : “Why is Intel a juggernaut when it comes to enterprise infrastructure”? What exactly are they selling to enterprises ? With the same logic can we also argue that Power supply vendors are Enterprise infrastructure Juggernauts and expect a Hadoop distribution from Tripp Lite ;o) ?

  3. Anatoli Fomenko Wednesday, March 6, 2013

    Stacey,

    Thank you for your insightful article. I would like to add a relevant technical detail that might help shed more light on the lately so popular topic of Hadoop distributions. First of all, the Hadoop Ecosystem is a set of software products created by the Apache Software Foundation. Various companies that have a vested interest in Hadoop’s success, such as Intel and Greenplum, contribute to Hadoop development. When it comes to building the Hadoop Ecosystem (i.e., creating a distribution) that includes a deliberately chosen subset of Hadoop Ecosystem modules, the tool most frequently used by distribution companies is the Apache Bigtop project [Link: http://bigtop.apache.org]. Here is a list of some projects powered by Bigtop [Link: https://cwiki.apache.org/BIGTOP/powered-by-bigtop.html]. While every Hadoop distribution company should pursue its own goals to facilitate its business interests, Hadoop Ecosystem stabilization and improvement de facto happens in Apache Bigtop. All the vendors might benefit more from collaborating around Apache Bigtop first, and then looking into the details of their own distributions.

    Anatoli Fomenko
    Apache Bigtop Committer
    Software Engineer at Karmasphere
