Exclusive: Yahoo launching Hadoop spinoff this week

Updated: Yahoo (s yhoo) will be spinning off a separate company focused on the development and commercialization of Apache Hadoop, called Hortonworks. The official announcement likely will come tomorrow or Wednesday to coincide with Yahoo’s annual Hadoop Summit, but rumors have been circulating for months and I confirmed the news today with a source familiar with the project.

Because Yahoo originated the Hadoop technology, its official entry into this space should play a big role in shaping how the market for Hadoop-based products evolves.

Yahoo’s Hortonworks (as in the Dr. Seuss book “Horton Hears a Who,” a reference to the elephant logo that Apache Hadoop bears) will be composed of a small team of Yahoo’s Hadoop engineers and will focus on developing a production-ready product based on the Apache Hadoop project, the set of open source tools designed for processing huge amounts of unstructured data in parallel. It’s a natural step for Yahoo, which uses Hadoop heavily within its own web operations, and which has contributed approximately 70 percent of the code to Apache Hadoop since the project’s inception.

By incorporating next-generation features and capabilities, Hortonworks hopes to make Hadoop easier to consume and better suited for running production workloads. Its products, which likely will include higher-level management tools on top of the core MapReduce and file system layers, will be open source, and Hortonworks will try to maintain a close working relationship with Apache. The goal is to make Hortonworks the go-to vendor for a production-ready Hadoop distribution and support, but also to advance Yahoo’s oft-repeated mission of making the official Apache Hadoop distribution the place to go for core software. Earlier this year, Yahoo discontinued its own Hadoop distribution, recommitting all that code and all its development efforts to Apache.

The introduction of Hortonworks means that other companies peddling Hadoop-based products can’t rest on their laurels. Cloudera, which pioneered commercial Hadoop, and EMC (s emc), which just launched its own set of Hadoop tools — a community version based on Facebook’s optimized Hadoop code, and an enterprise version leveraging MapR’s technology — are now on notice. Hortonworks differs from Cloudera because Hortonworks is more involved in software development, and the spinout’s tight alliance with Apache renders it distinct from the EMC products. Yet, Hortonworks will have to ensure it advances Hadoop development across industry lines and not just in a manner optimized for Yahoo’s webscale needs if it wants to gain adoption.

Despite all the talk about Hadoop, evidence suggests a presently paltry revenue base for the software Hortonworks, Cloudera and EMC peddle. Cloudera is leading the charge right now with what I’ve heard is a few million in annual revenue, but that’s hardly enough to sustain the amount of investment in Hadoop. Cloudera alone has raised $36 million, VCs have funded a number of other Hadoop-focused startups, and companies such as EMC and IBM (s ibm) are funding Hadoop strategies from their own coffers. Everyone with a stake in the outcome of Hadoop envisions a billion-dollar opportunity, so seeing how, or if, these companies are able to split the market and share revenue at least three ways makes this a fun race to watch. They also face increased competition from Hadoop alternatives such as LexisNexis spinoff HPCC Systems and Microsoft’s forthcoming Dryad tools.

Hortonworks will be a joint venture between Yahoo and an investor, presumably Benchmark Capital. The Wall Street Journal reported in May that Benchmark was in talks with Yahoo about how to handle launching the new company.

Update: Yahoo and Benchmark Capital officially launched Hortonworks on Tuesday afternoon. Eric Baldeschwieler, formerly VP of software engineering for the Hadoop team at Yahoo, will serve as CEO. NetApp is already on board as a Hortonworks ecosystem partner, supporting the distribution with its new NetApp Hadoop Open Storage System. Referred to internally as “Hadooplers,” HOSS centers around E-Series-based RAID configurations, a “design [that] dramatically improves the performance, scalability and predictability of congested Hadoop cluster networks by offloading most data ingest and object reconstruction (aka re-silvering) traffic.”

18 Responses to “Exclusive: Yahoo launching Hadoop spinoff this week”

  1. Rob's Slave

    The yahoo engineers have no idea what they’re getting themselves into with Rob Bearden in the company. Rob doesn’t think twice about firing people, he is *ruthless*. Rob is known to be a mercenary for VCs, his job is to come in, clean up house, then achieve the quickest buck possible via selling the company to an acquirer. He doesn’t build long-lasting companies, it is all about the quick flip for him. SpringSource was sold to VMware for $420M, they had a much higher potential than that. Similarly, JBoss was sold to RedHat for $350M.

  2. Anonymous Source

    Another rumor that I heard is the Benchmark/Yahoo deal was going to break down at the 11th hour because Eric14 threw a tantrum after Benchmark told him that Rob Bearden was going to be the CEO. Eric threatened to cause the whole deal to fall apart if he wasn’t the CEO, so Rob Bearden had to yield that title and instead went with President/COO. This further demonstrates E14’s selfishness, he was willing to sacrifice the whole deal for his ego.

  3. Eric14 Employee

    Eric14 had 100 employees in his Hadoop organization at Yahoo, of the 100 he only *picked* around 23 to join HortonWorks. I am among the employees who were not picked and I can tell you that we are all incredibly *demoralized* by this, we feel betrayed and abandoned.

    Hadoop was the only shining technology project within Yahoo for smart engineers who want to work on cool technology. Raymie Stata (Yahoo’s CTO) for many years said that this is one of the main reasons why Yahoo believed in the Hadoop open source strategy, to continue to attract smart engineers. But now, seeing that Eric14 *abandoned* us, and given that Hadoop isn’t here any more, I plan to leave within the next 4 weeks. I plan to join the EMC/Greenplum Hadoop team (their recruiters have been pinging me every day for the last couple of months) and I pledge to give Eric14 a real run for his money, he betrayed us.

    Yahoo was incredibly stupid to just follow Eric’s whim on this. Rumor has it that he put a gun to Carol Bartz’s head, told management that they need to fund this new company or he will just take off and take all the smart engineers and leave. I am starting to agree with other commenters on this thread, Eric14 will throw anybody under the bus (even his own kids) to get ahead.

    • CustomerOfEric14

      I agree, and I bet that Eric14 will be replaced as CEO within 3 months, if not kicked out next week. I was an internal customer at Yahoo for the services that Eric14’s Hadoop team provided. They do *not* know how to treat customers, they treated us like shit and bossed us around. They thought that Hadoop is what the world revolved around and totally forgot that it is in service of Yahoo’s primary business. They kept bossing us around and telling us what to do instead of trying to solve our business needs. I am so glad they are out of Yahoo now in a vendor-like relationship, because frankly if they don’t clean up their act and start doing what we are asking for then we will stop paying them and take our business somewhere else. We are finally free from their tyranny and arrogance (the entire batch was super arrogant, not just Eric14). Yahoo’s core business is way more important than our ownership in this tiny business (our stock price actually kept dropping the last few days, which shows that Yahoo’s ownership in Horton isn’t worth shit).

  4. Yahoo Alumn

    I worked at Yahoo for many years, and I can say without doubt that this company is doomed to failure if Eric Baldeschwieler (aka E14) is indeed the CEO. He is one of the most selfish/egocentric people I have ever met, all he cares about is himself, himself, then himself. He is also extremely emotional and throws temper tantrums all the time. I have no doubt that Benchmark will detect this rather quickly and replace him as CEO.

    • Derrick Harris

      Is he the rumored CEO? I didn’t ask, but he does seem like a logical choice given his current role. I don’t know Eric, though, so I can’t comment on your personality concerns.

      • Yahoo Employee

        Yes, he is the CEO, it is on the management page of their website. I concur with Yahoo Alumn since I actually reported directly to Eric14 and can testify that he is an ungrateful power hungry control freak who will burn anybody in his way to get ahead. His ego is so inflated even Hadoop can’t store it. Yahoo is so fortunate to have him out of there.

  5. hadooper

    It is a joke that Hadoop is open source. It’s a big political platform for a lot of folks. There is an ongoing fight to control Hadoop by various groups and they make it harder for any new guy to contribute to Hadoop. A guy in Yahoo tried controlling Hadoop. Hadoop’s creator left Yahoo because of this guy. For ego, this guy fought with Cloudera and now he is taking this fight to the next level. I know Cloudera is winning because they have the right people and a huge commitment to Hadoop. Big mistake by Yahoo to sponsor this venture.

    • Derrick Harris

      Indeed, it does:

      Domain Administrator
      Yahoo! Inc.
      701 First Avenue
      Sunnyvale CA 94089

      Domain Name:

    • More Knowledgable Historical Dude

      Cutting was a Yahoo! Employee when he created it based on the 2004 Google paper, thus creating the IP on company time, thus making it Yahoo property at the time of creation. So, yeah, they did.

  6. This really should have been combined with Cloudera. Multiple commercial entities fighting over the same open source project is a bad idea for all involved. I said it before, and I still believe neither company will do as well, and the user community would be better served by a single entity.

    — Max

    • Mixdev, thanks. We try to be very supportive of the Open Source community. But I’d note that we don’t brand S4 as competing with MapReduce. We note there are many approaches to hard problems. Just like Hadoop spent time incubating in our labs and then eventually moved into production, S4 is likewise a Yahoo!Labs project, and we hope to see good things come from it too over time. Maybe complementary, maybe better suited for some use cases, but they are not apples to apples in application or maturity. They are both great examples of hard working Yahoo! engineering that is shared with the world to help make better platforms for the internet and for “big data” applications. Hope that helps clarify.

      Gil Yehuda, Director of Open Source at Yahoo! Inc.