
Summary:

Matt Howard of Norwest Venture Partners predicts that 2012 and 2013 will be Hadoop’s breakout years. Howard gives us insight into the five factors that will accelerate Hadoop’s mainstream adoption over the next 18 months.


One of the things I love most about the software industry is the way new technologies can materialize from unlikely places and get applied in unexpected ways. Hadoop is a great example of this. Conceived by the open source community, Google, Yahoo and others, this programming framework has emerged as a promising solution to the big data problem.

I expect Hadoop to become enterprise-ready within the next 18 months. Encouraged by the arrival of innovative Hadoop vendors, many Fortune 500 companies — including eBay, Bank of America and JP Morgan — are experimenting with Hadoop deployments. As a technologist and an investor in this sector (Norwest Venture Partners, where I am a general partner, is an investor in Hadapt), I believe these investigations are quickly evolving into serious roll-outs. The following five key factors will accelerate mainstream adoption, making 2012 and 2013 Hadoop’s breakout years.

  • 1.  SQL provides a “fast pass” to Hadoop

The first hurdle Hadoop must clear is the stigma of its origins. As a product of the open source community, Hadoop and its countless siblings are regarded by traditional IT shops with confusion, suspicion, or even abject terror. Whatever their potential, these revolutionary interlopers threaten huge investments in expensive applications and proprietary technologies.

An SQL interface can help bridge the gap between future, current and legacy technologies. Organizations are already purchasing Hadoop tools that offer various levels of SQL compatibility, and we expect Hadoop to acquire deeper and deeper SQL support. Hive, an open source SQL-like interface for Hadoop, is a good start.
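
To make this concrete, here is a rough sketch of what the “fast pass” can look like in practice: an analyst reuses ordinary SQL tooling, in this case JDBC from Java, to run a familiar aggregate query that Hive compiles into MapReduce jobs behind the scenes. The driver class and connection URL follow Hive’s JDBC interface, but the hostname and the page_views table are hypothetical placeholders, so treat this as illustrative rather than a recipe.

```java
// Illustrative sketch: querying Hadoop data through Hive's JDBC driver.
// The hostname and the "page_views" table are hypothetical placeholders.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveFastPass {
    public static void main(String[] args) throws Exception {
        // Hive ships a JDBC driver, so existing SQL skills and tools carry over.
        Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");
        Connection conn = DriverManager.getConnection(
                "jdbc:hive://hive-gateway.example.com:10000/default", "", "");

        // The query reads like ordinary SQL; Hive translates it into
        // MapReduce jobs that run across the cluster.
        Statement stmt = conn.createStatement();
        ResultSet rs = stmt.executeQuery(
                "SELECT referrer, COUNT(*) AS visits "
              + "FROM page_views "
              + "GROUP BY referrer "
              + "ORDER BY visits DESC "
              + "LIMIT 10");

        while (rs.next()) {
            System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
        }
        conn.close();
    }
}
```

The point is less the specific query than what is absent from it: nothing in the code looks like MapReduce, because the translation happens inside Hive.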

In the next 18 months, I think we will see large retailers, financial services, Wall Street and the government using this “fast pass” SQL option to initiate much broader Hadoop deployments.

  • 2.  Hadoop performance gets a big boost

One of the leading reasons to use Hadoop is its extreme scalability. To date, that scalability has often come with significant performance penalties, including MapReduce query overhead and a storage layer that requires broad scans across file systems. If big data can’t produce information on demand, then it’s just an albatross.

Fortunately, the entire Hadoop industry is aggressively tackling these performance issues, from a rapidly proliferating group of startups (Cloudera, Hadapt, Hortonworks, MapR) and the amazingly innovative open source community to established vendors such as IBM. The forthcoming Hadoop v0.23 and subsequent releases will include performance-boosting enhancements targeting basic file system performance, minimum MapReduce job latency, and the performance of higher-level query interfaces such as Hive and Apache Pig.

  • 3.  Hadoop becomes increasingly reliable

To eliminate its single point of failure, Hadoop needs to address topology and deployment concerns left over from its initial incarnation. Hadoop employs a master node (the HDFS NameNode) to keep track of where data lives and how to access it. If this “brain” goes down and the cluster lacks the right topology and redundancy, the entire file system can become unavailable. Over time, the Hadoop community will make improvements in this area, and Cloudera, Hortonworks, MapR and other commercial vendors are already addressing it.
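
To give a sense of where this is heading, here is a minimal, illustrative sketch of what client-side configuration could look like once a standby “brain” backs up the active one, along the lines of the high-availability work under way in the community. The property names follow that emerging HA design, while the “mycluster” nameservice and the hostnames are invented placeholders; this is a sketch of the direction, not a supported recipe for today’s releases.

```java
// Illustrative sketch: an HDFS client addressing a logical nameservice that
// hides a pair of redundant NameNodes, per the community's HA design.
// The nameservice name and hostnames are hypothetical placeholders.
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HaClientSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // One logical nameservice stands in for two physical NameNodes.
        conf.set("dfs.nameservices", "mycluster");
        conf.set("dfs.ha.namenodes.mycluster", "nn1,nn2");
        conf.set("dfs.namenode.rpc-address.mycluster.nn1", "namenode1.example.com:8020");
        conf.set("dfs.namenode.rpc-address.mycluster.nn2", "namenode2.example.com:8020");

        // The client-side proxy fails over to the standby if the active node dies.
        conf.set("dfs.client.failover.proxy.provider.mycluster",
                "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");

        // Applications address the cluster by its logical name rather than a
        // single host, so losing one NameNode no longer takes HDFS down with it.
        FileSystem fs = FileSystem.get(new URI("hdfs://mycluster"), conf);
        System.out.println("Root exists: " + fs.exists(new Path("/")));
    }
}
```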

  • 4.  Mainstream case studies emerge

Hadoop is a grassroots phenomenon that emerged in the social networking and consumer Internet world. As always, there are early adopters who take risks on the cutting edge, and there are more conservative organizations watching the pioneers from the sidelines.

This played out in 2011 as early customer experiences with Hadoop were shared via conferences, online forums and vendor white papers. Experts think Hadoop is on the edge of a tipping point, as some of the earliest Hadoop implementers move from experiments to adoption. As a result, people implementing Hadoop today are benefiting from the lessons learned by the early pioneers.

In 2012 and 2013, we will see a growing body of case studies and the emergence of best practices as Hadoop technology matures and gets deployed in traditional enterprise environments. In short, Hadoop’s momentum will grow exponentially in the next 18 months.

If becoming mainstream is step four in the technology adoption process, Hadoop will move through step two and into step three this year and next.

  • 5.  The architecture evolves

Hadoop applications process vast amounts of data in parallel across many computers, relying on MapReduce as the enabling distribution framework. Currently, Hadoop tightly couples distributed resource management and a single distributed programming paradigm (MapReduce) into one package. The Hadoop community is now decoupling the two, an effort known as YARN (sometimes called MapReduce 2.0). Separating them will give operators finer control over each layer of the system and allow query processing that is not locked to the MapReduce model.
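
To see what that coupling means in practice, here is the canonical word-count example written against Hadoop’s mapreduce API: even this trivial computation has to be expressed as an explicit map step and reduce step, and the same framework that defines the programming model also manages the cluster resources that run it. Exact API details vary across Hadoop releases, and the input and output paths are supplied as command-line arguments, so treat this as an illustrative sketch rather than a definitive listing.

```java
// Illustrative sketch: the classic MapReduce word count.
// Usage (hypothetical paths): hadoop jar wc.jar WordCount /input /output
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Map step: emit (word, 1) for every token in the input split.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reduce step: sum the counts emitted for each word.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Decoupling resource management from this programming model is what lets other paradigms, such as those listed below, run on the same cluster without being forced through map and reduce steps.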

Future releases of Hadoop will have an enhanced MapReduce framework and will feature a growing array of alternative distributed computing paradigms. Likely candidates include Message Passing Interface (MPI), distributed shell systems, OpenDremel and Bulk Synchronous Parallel (BSP). With these additional programming and distribution options, Hadoop will be able to support an even greater variety of workloads.

Hadoop is here to stay

Over the next few years, Hadoop will become a common component of the standard IT tool belt. To meet this demand, vendors are starting to package Hadoop as commercial off-the-shelf (COTS) software.

Hadoop adoption will build on itself as organizations augment Hadoop solutions and grow ecosystems around them. Before our very eyes, Hadoop is becoming a platform.

Matt Howard is a general partner at Norwest Venture Partners (NVP), where he invests in mobile and wireless, big data, security, rich media, networking and storage sectors. He currently serves on the boards of Avere Systems, Blue Jeans Network, ConteXtream, Hadapt, MobileIron, Retrevo and Summit Microelectronics. He blogs at NVP Blog.

  1. Good article on new directions in Hadoop. Interesting to see that a number of the ideas in my 2010 New York Times article “Beyond Hadoop: Next-Generation Big Data Architectures”, which some in the Hadoop space regarded as anti-MapReduce heresy at the time, have now become orthodoxy in the Hadoop ecosystem just 18 months later: BSP, MPI, Dremel, Pregel/Giraph,…

    http://nyti.ms/9JnDlS

    If you need/want to run BSP apps on HDFS or MapR today, there’s Cloudscale BSP

    http://cloudscale.com/index.php/technology/cloudscale-bsp

    and if you want to keep up with what’s going on in the Hadoop+BSP ecosystem you can follow the conversation on Quora

    http://www.quora.com/BSP

    http://www.quora.com/BSP_Computing

  2. This is well deserved. Open source leading the way once again.

  3. Just wondering when we’re going to start seeing articles that actually talk about real customers actually getting value out of Hadoop rather than the ones who are considering trying to get some value out of it. I’m just looking for some success stories. “Company XYZ completely revamps retail decision making through Hadoop,” or something of the sort.

    1. Derrick Harris Monday, March 5, 2012

      I think those stories are out there, but Hadoop isn’t the headline as much as the use case itself is. I would imagine Hadoop was somewhere behind Target’s recent, infamous pregnancy fiasco, for example, whether or not it’s ever mentioned by name. Some large companies don’t publicly acknowledge how they’re using it, though.

      Off the top of my head, I recall recently discussing Orbitz, eBay and Etsy as Hadoop users. And there are many applications and services that use Hadoop as part of their decision-making engines.

  4. Matt Howard Sunday, March 4, 2012

    Clint, I think we will start seeing more articles in mainstream channels covering case studies. I’ve been attending some of the Hadoop-related conferences and have been seeing more and more customer case-study sessions; I take that as a hopeful leading indicator that your important point will be addressed. Here in Silicon Valley, I know of many early-to-late-stage and public companies with Hadoop in their production environments. These clusters are truly mission critical. I know of a few companies that want to keep the fact that they are using Hadoop a secret because of the value it creates. BTW, the Hadoop-focused conferences have been selling out rather quickly these days… another positive sign that customers are sharing and learning. I do think you are making a great point, and hopefully it will be fully addressed soon.

  5. Mike | HomelessOnWheels Monday, March 5, 2012

    Wow – this article reads like an exercise in keyword density. I don’t think I’ve ever seen the word “Hadoop” repeated so many times on a single page, including the project’s own homepage (hadoop.apache.org). Admittedly, it’s a kinda fun and silly word, but still — chill a bit.

  6. Not sure I agree. Hadoop has a lot of promise, but the MapReduce architecture is very limiting in that every data task has to be defined in terms of Map and Reduce steps. As we recall, Google made MapReduce famous but has since moved on to column-oriented architectures (BigTable) to satisfy its requirements. If I were to bet on a technology right now, I would bet on the column-oriented architectures, or even something like what HPCC Systems offers: an implicitly parallel, data-flow-oriented architecture. But if you are a Hadooper, what might work is using Hadoop with HBase, Flume, etc. Alternatively, you can use HPCC if you want to avoid the complexity and use a single stack that does it all for you.


Comments have been disabled for this post