
If you thought the Hadoop war of words was over, think again

Different year, different CEOs, different arguments, same old story: Cloudera and Hortonworks do not see eye to eye about the best way to run a business based on Hadoop. Two years ago, Hadoop startups Cloudera and Hortonworks took great pleasure in publicly debating who was more Hadoop than who by counting Apache contributors, commits and bug fixes. Today, the debate is over business models — specifically, whether it’s better to be Red Hat or IBM.

This time around, Cloudera fired the opening shot by declaring in October that it doesn’t consider Hortonworks or fellow Hadoop vendor MapR to be its competitors anymore. Cloudera CEO Tom Reilly told me then that while those companies are focused on Hadoop as a technology and business focal point, Cloudera is now focused on becoming an “enterprise data hub” that offers a whole suite of data products a la IBM or, presumably, Oracle. The logic goes that although Hadoop is the foundation of that strategy, the real value comes from higher-level features and data-analysis products, many of which Cloudera is building itself and some of which are open source even if not via Apache.

On Tuesday, Hortonworks fired back via a blog post by VP of Corporate Strategy Shaun Connolly, who wrote, essentially, that Cloudera’s (er, “one company’s”) model is the wrong one. At least right now. The Hortonworks team believes it’s poised to take the Hadoop crown and Cloudera is poised to fail because the market is still too young to buy the kind of package Cloudera is selling. Modeling itself after Red Hat, Hortonworks is content to keep making Apache Hadoop better, keep helping customers who want basic Hadoop as the product, and keep letting partners like Microsoft, SAP and Teradata add the bells and whistles and drive adoption by their customers.

It is, as Connolly’s chart mapping Red Hat revenue shows, a long-term strategy.

Source: Hortonworks

As tiresome as the back-and-forth can get, though, it is kind of fascinating that because Hadoop is really a market unto itself — one based on open source technology, at that — there’s relatively little debate about whose technology is better. They’re all technically distinct, but only one pure-play Hadoop company — MapR — has ever really sought to distinguish itself based on pure technological edge. For the most part, they’re really asking customers to choose one business model (or maybe one support contract) over another.

The truth is that it’s probably too soon to declare either company a winner or a loser; there’s good reason to believe they’ll both win. On one hand, Hortonworks’ strategy is nothing if not prudent, and its big-time partners (if they stick around) do represent a foot in the door to some major corporations that already rely on those partners’ software. On the other hand, Cloudera’s strategy is more audacious (it’s essentially battling IBM et al. and Hortonworks et al., whether it wants to or not) and probably has more income upside if the company can execute it.

Anyone interested in hearing more about this should attend our Structure Data conference in March, where we’ll have executives from Cloudera and Hortonworks on stage answering questions about how they view the big data world. Otherwise, I guess everyone can just rest assured knowing that Hadoop probably really is the best thing to happen to data since sliced bread (or SQL?), so a step toward Hadoop is probably a step in the right direction. If it’s as revolutionary as everyone says it is, what’s a couple degrees more or less in either direction?

6 Responses to “If you thought the Hadoop war of words was over, think again”

  1. While it’s unfortunate the marketing sometimes seems to include smack talk among various vendors in the big data space, it’s a great success story for the Apache Software Foundation and our Apache Hadoop project and the many other Apache projects building Hadoop-related software.

    Part of the success of many Apache projects is precisely the ASF’s neutrality: by offering an independent home and governance model for our 140 Apache projects, it allows many companies which otherwise may be fierce competitors a place to collaborate and join together to provide some great software under the Apache brand that is available to all.

  2. George K. Mathew

    Hi Derrick,

    The Hadoop market continues to fracture in some expected ways. For example, we know customers who are now trying to deploy post-MapReduce engines like Impala (led by Cloudera) on a Hortonworks cluster. One related market force not to underestimate is the entrance of the heavyweights into the fray: both the IBM & Intel distros come to mind.

    Anyway, 2014 will still be the Wild West in the Big Data/Analytics space. Alteryx, Cloudera, & Tableau covered 14 of our collective predictions for ’14 here:


  3. I think about 12 months ago, MapR really had something with their product: at that time, core Hadoop (esp. HDFS) was *not* as performant, and there were several SPOFs. In the last 9-12 months, though, those arguments slowly stopped being valid; core Hadoop has caught up & bridged any gap that MapR once enjoyed.

    At this point, I think it’s coming down to “who has the biggest install base & best support” & in both cases, it’s really no contest — Cloudera stands alone.

  4. Amr Awadallah


    Another subtle difference in the Red Hat model is the fact that the Red Hat “certified” binaries for RHEL can’t be deployed in production without a subscription license. So while the source code behind RHEL is open, the RHEL binary itself can’t be used without paying Red Hat first. That is what allows them to make all the revenues they make; it is also what distinguishes them from other vendors who try to resell derivatives of RHEL compiled from scratch.

    That subtlety isn’t true for Hortonworks. I can deploy as many of their binaries as I like on my servers without having to pay a penny. It has a very weak monetization path.

    — amr

  5. michaelhausenblas

    Thanks for this thoughtful post, Derrick. I suppose we’ll all only know in some 5 to 10 years’ time what exactly worked and maybe even understand why. With my data engineering hat on I simply have to agree with John (well, uhm, yes, he’s also my boss, but … :) that we simply have too few data points, ATM, to predict anything. So, while I’m happy for Shaun and the HW folks that they have a nice story to tell and can even back it up with a chart (!), it’s as simple as this (for me): focus on the customer; deliver the best product/solution by owning the customer’s problem; work not only hard but smart; and the rest will unfold … one way or the other.


    FWIW, I wrote up my 2c worth of opinion around Hadoop business models here:

  6. Derrick Harris

    Someone sent me a blog post from MapR CEO John Schroeder (that I wish I had seen earlier) in which he adds fuel to the fire by comparing MapR to a software company like Splunk and poking holes in the argument that any company should try being “the Red Hat of” anything. Red Hat, he says, is the only company that has ever lasted and made serious money following that model; MapR embraces its role as the Hadoop vendor that doesn’t even have to try and play the open source angle.