
Summary:

Hadoop creator and champion Yahoo is taking advantage of its annual Hadoop Summit today by rolling out some new features for its open-source Hadoop distribution. The new features tackle security and workflow management, which Yahoo hopes will help Hadoop continue its proliferation among mainstream users.

Yahoo is taking advantage of its annual Hadoop Summit today by rolling out some new features for its distribution of Hadoop, the open-source data-processing framework it created for handling huge amounts of data. The new features tackle security and workflow management, two areas that Yahoo believes need to improve as Hadoop continues its proliferation among mainstream users. But will Yahoo’s features make it harder for startups like Cloudera and Karmasphere to earn a living?

On the security front, Yahoo has integrated the Kerberos authentication standard into its distribution, resulting in the aptly named Hadoop with Security. This lets users consolidate data from multiple applications onto the same Hadoop cluster, while limiting access to each class of data only to authorized users. This isn’t a mainstream problem yet, but because of its large Hadoop infrastructure (34,000 servers and 170 petabytes of data spread across the globe), Shelton Shugar, SVP of cloud computing at Yahoo, thinks his company is “probably at the forefront of running into this [problem].” He adds that it will become a big issue for enterprises as their usage expands in scope beyond small development teams and single applications.

The other newly available download is a workflow-management tool called Oozie, which Shugar calls the “elephant tamer.” Oozie should be in high demand from users outside Yahoo because it lets them manage and maintain a variety of different Hadoop job types and data dependencies without writing their own applications to do so. Shugar says it’s the de facto tool for extract, transform, load (ETL) processing at Yahoo.

Both of these Yahoo innovations raise the question of how the Hadoop market will play out. Cloudera offers its own commercial Hadoop distribution and support services, and plans to release proprietary products in the near future. Karmasphere offers a desktop-based product for building, deploying and managing Hadoop applications. Other startups, like Datameer, are incorporating Hadoop into the guts of business intelligence products without requiring the user to learn any Hadoop programming.

There currently is a market for value-added commercial Hadoop products (GigaOM Pro sub req’d), but one wonders whether first-time users are more likely to pay for Hadoop software or experiment with Yahoo’s growing set of free tools (which actually might end up in commercial distributions, too). Shugar says Yahoo is investing serious resources into balancing CPU and storage requirements to maximize infrastructure usage in the face of skyrocketing storage needs, and is also looking to improve internal programmer support to help get data in and out of Hadoop via metadata.

As more Yahoo software makes its way into the Apache Hadoop community, and big data analysis requirements grow, it might be difficult to justify paying for value-added solutions rather than just downloading the increasingly feature-packed Yahoo distribution and learning Hadoop development. Should the startups building their business around Hadoop worry?

Image courtesy of Flickr user Erik Eldridge


