Summary:

Hadoop is becoming a popular choice for large organizations needing to store and process large volumes of unstructured data, but is it merely the flavor of the day? An eBay exec recently questioned his continued use of the platform if the pace of development doesn’t improve.

Hadoop is becoming a popular choice for large organizations that need to store and process large volumes of unstructured data. But will they abandon it if something better comes along?

According to an article on the website Computing, eBay Senior Director of E-commerce Darren Bruntz told an audience at this week’s Teradata conference that he wants to see “more focus and energy” from Hadoop’s open-source development community or else eBay might abandon its use of the big data platform in the coming years. That’s one of the strongest rebukes of the Hadoop community that I’ve seen, but the community appears poised to deliver on Bruntz’s challenge.

Bruntz described eBay’s three-platform big data environment that includes two separate Teradata data warehouse systems, as well as a Hadoop cluster. Although he expects this system to be in place for a few more years, Bruntz said that in the future, “we could perhaps move to a single platform” if something were to emerge that meets eBay’s needs.

That’s a meaningful statement because it comes from a huge Hadoop user (eBay’s Hadoop cluster is currently well over a dozen petabytes), but it’s not an entirely new sentiment. Forrester’s James Kobielus has predicted that Hadoop will be “the nucleus” of next-generation enterprise data warehouses, which presumably is the type of single-platform system that Bruntz wants, but Kobielus thinks that’s still three to five years out. Database analyst Curt Monash isn’t so sure about Hadoop as the foundation of data warehouses, but he did offer this assessment of Hadoop’s future:

  • Hadoop (as opposed to general MapReduce) has too much momentum to fizzle, perhaps unless it is supplanted by one or more embrace-and-extend MapReduce-plus systems that do a lot more than it does.
  • The way for Hadoop to avoid being a MapReduce afterthought is to evolve sufficiently quickly itself; ponderous standardization efforts are quite beside the point.

Teradata, eBay’s data warehouse vendor, recently acquired Aster Data Systems, which brings with it a MapReduce engine that’s not associated with Hadoop. To whatever degree it’s technically feasible, Teradata could conceivably deliver a single big data platform at some point. Alternatively, there is a new big data entrant, HPCC Systems, which is pushing a Hadoop alternative that already does more than Hadoop, and it could develop its technology further to address even more use cases.

However, if you read Cloudera CEO Mike Olson’s great blog post earlier this week, you know that the Hadoop community is already working hard on evolving Hadoop beyond its roots. Buried in an assessment of whether Yahoo’s (and Hortonworks’) claims to Hadoop dominance are justified, Olson pointed out the following:

In the early days, if you wanted to use Hadoop, you loaded data into the system by hand and coded up Java routines to run in the MapReduce framework. The broad community recognized these problems and invented new projects to address them — Apache Hive and Apache Pig for queries, Apache Flume and Apache Sqoop (both incubating) for data loading, Apache HBase for high-performance record storage and more. … That ecosystem has exploded in recent years, and most of the innovation around Hadoop is now happening in new projects. That’s not surprising — as Hadoop has matured, the core platform has stabilized, and the community has concentrated on easing adoption and simplifying use.

But even if the Apache Hadoop community can’t sufficiently address the needs of companies like eBay, the greater Hadoop ecosystem likely will. There are companies such as Hadapt actually trying to make Hadoop the core of a data warehouse, and vendors such as Oracle, EMC and IBM are all working to align Hadoop more tightly with their data warehouse and analytic database products. There’s also MapR, which is pushing a Hadoop distribution (that EMC bases its Enterprise Edition on) that it says is more technologically advanced and better suited for business users, and, of course, Cloudera.

As different as their focuses might be, the one commonality is that all of these vendors rely on Apache Hadoop for technology and/or customer base. They will either drive the Apache project to address their needs and incorporate their innovations or they will integrate their own tweaked versions of Apache Hadoop into their products. Other approaches will surely exist, and some probably will thrive, but it doesn’t look like Hadoop is going anywhere.

  1. It is strange that such a statement comes from an executive whose company has benefited immensely from the platform. Maybe he can coax his management into having dedicated folks from eBay enhance the platform.

    1. eBay is enhancing the platform. Perhaps not near the scale of others but it is certainly paying people to improve the system:
      http://www.hortonworks.com/reality-check-contributions-to-apache-hadoop/

  2. All software projects are “Flavor of the Month”. Without ongoing work to push the software forward, it will lose momentum. When a piece of software is surpassed by alternatives, it becomes yesterday’s news.

    A proprietary distribution of an open source project with enhanced features does not add any value to the open source project. Once the codebase goes proprietary, it loses most of the value it had from being open source. All that really remains is that it might still receive updates/bug fixes, provided they don’t overlap with proprietary components.

    It’s great news that plenty of 3rd parties are innovating on the shoulders of an open source project, but the open source project isn’t really benefiting from it.

    Claiming that the project will go on based on third parties’ involvement in their own separate proprietary versions is laughable. When code contributions to the open source project dry up, the project is as good as dead.

  3. I’ve been looking at HPCC and like the consolidated architecture of Thor. I plan to attend one of the upcoming meetups that shows a comparison of HPCC & Hadoop to get a deeper view.
    Check them out at: http://www.meetup.com/Big-Data-Processing-and-Analytics-LexisNexis-HPCC-Systems/

