Beyond MapReduce: How the new Hadoop works

Table of Contents

  1. Summary
  2. What is Hadoop?
  3. Hadoop 2.0: beyond MapReduce
  4. Integrating Hadoop
  5. Where Hadoop goes next
  6. About Paul Miller

1. Summary

In only a few years Hadoop has graduated from a personal side project to the poster child of the nascent multibillion-dollar big-data industry. Leading providers of technical solutions based on Apache Hadoop attract large investments, and Hadoop-powered success stories continue to spread beyond the Silicon Valley giants in which these technologies were initially nurtured.

New features included in Hadoop’s latest releases go some way towards freeing an increasingly capable data platform from its early dependence on one specific technical approach: MapReduce. Those same advances are also powering a new drive to embrace the complex and diverse enterprise workloads for which MapReduce was not necessarily the most appropriate data-processing tool. In those settings, adoption was hindered by Hadoop’s early reputation for complexity and by its apparent disregard for established enterprise processes around security, audit, and governance.

At the same time, the big-data landscape is becoming more complex. New tools like Apache Spark were quick to integrate with Hadoop but today also function increasingly well without it. Established enterprise IT firms co-opt the Hadoop name where they can while also pushing refreshes to their own tried-and-tested products.

In this report we explain what Hadoop is and how it has recently been transformed, discuss what it’s good for, and consider how it might evolve as technology, expectations, requirements, and the broader competitive landscape change around it.
