What it means if Yahoo Hadoop spinoff doesn’t do distribution

It looks like all the speculation about how Yahoo’s Hadoop spinoff company, Hortonworks, will affect Cloudera and other companies providing Hadoop-based products might have been overblown. During a phone call earlier this week, Hortonworks CEO Eric Baldeschwieler told me the company is still figuring out its strategy around offering a Hadoop distribution, which could be good news for presumed competitors such as Cloudera.

The ambivalence appears tied to the company’s narrow focus on improving Apache Hadoop and making it the go-to distribution. Baldeschwieler said that Hortonworks’ core business model will be built around offering support and services, as well as helping push the Apache project to “bridge the gap between what [Hadoop] is and what it can be.” The latter goal, of course, means working hard to improve the core Apache Hadoop distribution to make it more scalable, more reliable and generally more flexible.

If Hortonworks doesn’t offer a distribution, it might be because it doesn’t want to waste resources. It would have to build its own distribution and then work within Apache to get its improvements into the Apache code, resulting in duplicated effort and a somewhat unnatural split of allegiances given Hortonworks’ professed support for Apache Hadoop. This is the same issue Yahoo was trying to avoid earlier this year when it discontinued its own distribution and recommitted all its efforts to Apache Hadoop. It now looks like that move was just setting the stage for the Hortonworks launch.

Already, Baldeschwieler said, a number of key features from Yahoo are slated for upcoming Apache Hadoop releases. These features include a new MapReduce engine, federated storage for HDFS and a major improvement in how HBase interacts with HDFS. The net effect of all this work, he explained, is that Apache Hadoop will be more stable, more scalable and more dynamic. In fact, he said, with the next scheduled release, developers will be able to use alternative processing frameworks alongside Hadoop MapReduce.

Good news for some, bad for others

A Hortonworks focused entirely on Apache could be good news for Cloudera. In that case, Cloudera remains very much in its current position of integrating and hardening the suite of Apache Hadoop products into its own open source distribution, then selling services and management software on top of it. The big difference is that Apache Hadoop will look a lot more appealing with Hortonworks providing expert service, but Cloudera doesn’t really have to change its story.

A service-focused Hortonworks might not be so good for companies such as MapR, which are pushing proprietary or semi-proprietary Hadoop distributions. The fewer distributions there are, and the more closely they track Apache Hadoop, the less appealing outliers might look to users concerned about vendor lock-in. Baldeschwieler said he thinks the market will be big enough for value-added distributions like MapR’s, but noted that Apache Hadoop has already proven itself in large enterprises and will continue to get better.

For example, he explained, Apache has been working hard to integrate some of the code that Facebook has contributed from its own Hadoop deployment. When it announced its Hadoop distributions in May, EMC said its Community edition is based on Facebook’s code, but Baldeschwieler said he has since heard that EMC is reconsidering that decision and might support the core Apache code instead. That’s hardly conclusive evidence, but it’s noteworthy because EMC is integrating MapR’s proprietary storage technology into its Enterprise edition release.

“What we don’t want to see happen,” Baldeschwieler said, “is the Hadoop market start to look like the Unix market in the ’80s.” The more support there is around Apache Hadoop, he explained, the less chance there is of a Unix-like lost decade of competing distributions, like the one that preceded Linux’s arrival in the ’90s, when it became the center of the non-Windows universe. He thinks Apache Hadoop is, and should be, the Linux of big data.

Whatever path Hortonworks takes, though, Baldeschwieler thinks all the action around Hadoop will make it very difficult for alternative technologies, such as Microsoft Dryad and LexisNexis’s HPCC Systems, to catch up. “I think they’ve got their work cut out for them if they want to compete with the Hadoop community,” he said. Even if the companies involved are at odds, they still make up a very big community.

Feature image courtesy of Flickr user miheco.

5 Responses to “What it means if Yahoo Hadoop spinoff doesn’t do distribution”


  2. Anand Babu Periasamy

    Hortonworks is certainly bad news for MapR and Cloudera. Others will have to differentiate substantially with their proprietary add-ons or get acquired soon. There is no place for the third- or fourth-best.

  3. I’m not sure I agree. Cloudera does provide some value by building RPMs and Debian packages. They are a services company, not a technology/product company, but that’s a legitimate business. We used CDH at our company for the last year. After the Hadoop Summit we tried MapR, and I’ll admit I can’t imagine going back to CDH (I’m not sure why the article mentions vendor lock-in, since it’s trivial to move your apps from CDH to MapR and vice versa). We just finished migrating our production environment from CDH to MapR (the M3 edition for now). Having full read/write NFS access is a game-changer for us. IMHO, the folks at MapR seem to be years ahead of anything else out there.

  4. Jack Norris

    The fact that Hortonworks will focus on improving Apache Hadoop and providing services is good for the entire industry. This is not bad news for MapR. In an earlier article you pointed out that MapR understands it’s not just the sum of its parts; the project is, in fact, all about parts. We’ve worked hard to improve many of these parts and include them in a complete distribution for Apache Hadoop. These innovations provide tremendous customer value in the form of greater performance, ease of use and dependability. Our model is not that different from that of other commercial vendors that provide a distribution, although instead of focusing solely on “proprietary” management tools, MapR has delivered many more underlying innovations. Providing these innovations does not automatically create vendor lock-in. In fact, one of our innovations is the ability to mount a Hadoop cluster via NFS, and customers have told us specifically that it makes it far easier to get data into and out of MapR clusters. Customers are looking for innovations with a focus on API compatibility, so that MapReduce programs run without requiring any changes.

    Jack Norris VP Marketing, MapR Technologies