
Summary:

It didn’t take long for the Hadoop market to become a juggernaut, and it won’t take long for it to undergo some significant technological changes. Cloudera co-founder and chief strategy officer Mike Olson came on the Structure Show podcast to break it down.

It wasn’t too long ago that Hadoop was a shiny new technology, familiar to large web companies but foreign (and fascinating) to everyone else. Things changed fast, and Hadoop is now a billion-dollar IT market underpinning big data efforts by companies of all stripes. Mike Olson, co-founder and chief strategy officer (and former CEO) of Cloudera, came on the Structure Show podcast this week to tell us where Hadoop is now and where it’s headed.

Here are the highlights of that interview, but anyone interested in Hadoop — especially in how the underlying technologies will evolve — should listen to the whole thing. Hadoop market watchers will also want to attend our Structure Data conference next month in New York, which will feature interviews with three important CEOs: Tom Reilly of Cloudera, Rob Bearden of Hortonworks and Paul Maritz of Pivotal. Big data applications are advancing fast, and these execs will explain how their companies plan to keep up and win in the market as a result.


Big data is no place for the weak

“If we had to identify the single defining characteristic of the [Hadoop] market this year and going forward, it’s that shift in the competitive dynamic,” Olson explained. “It’s no longer a band of hardy, wild-eyed visionaries, venture-backed companies battling for market share with one another, but really the entrance of large and well-capitalized companies with very large installed bases and very good field relations. Those guys are going to shape how we [at Cloudera] do business and really are going to shape how the market develops over the coming seven years.”

The big companies he’s talking about: IBM, Microsoft, Pivotal (which spun out of EMC and VMware) and Oracle, among others.

There is no love lost among Hadoop vendors

Cloudera touts its product lineup as an “enterprise data hub” to distinguish it from competing offerings from vendors such as Hortonworks and MapR. Here was Olson’s response when asked why Cloudera considers itself so different from those companies, which are also making advances around security, search and other capabilities:

“We deliver a production product to market today. And we are proud to say that we have been the first vendor to bring that stuff to market consistently. So lots of announcements happen, because I think our vision is right and is widely recognized as right. But the question is ‘Who’s driving the platform forward and who is making that innovation available to customers first?’ Yeah, other vendors are going to announce future availability of products, not yet in GA, and while they do that we continue to innovate in our own way.”

It’s worth noting that MapR has been pretty aggressive itself about adding new features, many of which are shipping. And Hortonworks, although somewhat more measured on the innovation front, is a close partner with many of the big companies (including Microsoft and Red Hat) that Olson says will help reshape the market. Also, Hadoop is a highly competitive space and the companies involved are quick to criticize their peers.


Mike Olson at Structure Data 2012. (c) 2012 Pinar Ozger, pinar@pinarozger.com

At least part of the database market is safe

“[T]here’s really no answer in the Hadoop space today for the kind [of] OLTP and very advanced workloads that run on the traditional larger databases,” Olson acknowledged, adding, “I don’t think that any of those vendors looks at the platform, Hadoop, or even the more-capable enterprise data hub that we bring to market right now, with that much trepidation.”

However, he said, it might be a different case for companies that make their money selling analytics software, especially as technologies such as Cloudera’s Impala and other SQL-on-Hadoop offerings mature: “[A]s we’ve driven real-time capabilities into the platform…some of the more traditional analytic database workloads get to move over pretty easily right now….I think that trend is going to continue, and that the query language that the platform supports is going to get more interesting over time, and that more workload optimization is in front of us.”

Hadoop is coming to the mid-market

Olson said it’s true that most Hadoop adoption today is coming from technologically savvy web companies and large enterprises making big investments. But, he added, “That’s going to get better…it really is. As the platform matures and as the tools and applications that run on top of it get easier to use and more diverse, you’re not going to need to be a data scientist anymore to buy and use this platform. You’re going to just be able to get a shrink-wrapped application that solves your problem.”

That’s “exactly what happened with relational databases back in the day. When nobody used SQL and there were no apps, man, you needed to be a genius,” he continued. “But that changed in a pretty fundamental way, and we expect that will happen in this market.”

He noted, though, that Cloudera sees mid-market customers benefiting more from that ecosystem of technology vendors and applications than from cloud-hosted Hadoop, at least when Cloudera is the one hosting it. “We started the business with a hosted offering. We discovered that our customers loved us to run their Hadoop, but in addition they wanted us to run their other data infrastructure,” Olson said. “…It turns out we’re really good at running the new scale-out Hadoop-based platform, but that other stuff: not our wheelhouse.”


Cloudera’s Jeff Hammerbacher talking about making data science easier at Structure Data 2013. (c) Albert Chau itsmebert.com

MapReduce will fade away as innovation flourishes

“I do believe that in time, the original implementation — disk-based, batch-mode MapReduce — will diminish in importance,” Olson said. “It’ll probably never go away, because there’s a bunch of installed base running on that and you don’t get to even retire your mainframes 50 years later, but if you think about where future workloads are going to be built, we think Spark is super interesting.”

Spark is a processing framework, developed at the University of California, Berkeley, that is designed to be faster, easier to use and more efficient than disk-based MapReduce; it is currently being commercialized by a startup called Databricks. But it’s far from the only innovative thing happening in the big data world. Olson said his job is to keep an eye on what’s happening and to use good curatorial judgment in deciding which pieces to bring into the Cloudera platform and when.

“I believe that the most interesting data management work happening on the planet right now is happening in the consumer internet, in general, and at Google in particular,” he said. “We watch very carefully what is happening at the big scale-out web properties as basically a prediction of what more traditional enterprises are going to want in the future….This has been for the first 25 years of my career a very lucrative career, but a pretty dull one. I would not claim that boredom is a problem for me today.”

  1. JEAN-MICHEL LETENNIER Thursday, February 20, 2014

    Hadoop will fail, and is failing, despite all the hype.

    Please excuse my intrusion and allow me to introduce myself: my name is Jean Michel LeTennier, and I am the chief technology officer for http://www.virtue-desk.com and, as such, the de facto partner for AtomicDB software in the United States.

    The FUTURE of Database technology & Data warehousing is Efficiency & Speed, not size.

    http://www.youtube.com/watch?v=DeExbclijPg (last 15 min is live demo)

    http://www.atomic-db.org

    Imagine being able to combine any sources of data without a line of code

    Imagine being able to do anything your mind can imagine. Well, now it can.
    The 6th Normal Form is not a TERM… it is a GOAL: the “holy grail,” if you will, of data management and, more importantly, “information” management… Imagine being able to store IDEAS as opposed to disconnected bits of data…

    The TRUE definition of 6th Normal form is OBJECT Database.. Where each and every piece of information/data is ATOMIC in nature and can be associated with any other piece of data/information, and thus NO restrictions… NO constraints… NO tables, NO Rows, NO VIEWS, NO CUBES… the correct term is “ASSOCIATIVE DATABASE” or Information system as it is in 3 dimensions by default… and technically (N) dimensions

    The advantages are:
    1. 100x SQL/ROW/TABLE speed
    2. 1/3 disk space
    3. 1 EXABYTE capacity – Single instance storage.. (No piece of data is ever duplicated)
    4. Security – Un-hackable – there is nothing to hack into
    5. NO QUERIES – we use filtering
    6. NO TABLES – thus no indexing to worry about
    7. Automated data aggregation – as many sources as required..

    And that is just the start… ;-)

    Mathematically speaking, there is not enough data in the world to outscale a 64-bit associative system (using single-instance storage). Imagine a unique object count of 64-bit integer x 64-bit integer x 64-bit integer x 64-bit integer, and only 4 reads to any object or group of objects.

    Let me know if you are interested in working with it. As a scientist, I think you would find it fascinating. Basically, anyone can now be taught to build data warehouses. ;-)

    ****************************************************
    As per Dr. Victor Tang, MIT (head of IBM Research for 31 years): “We need to argue the point that the ‘earth is not flat’ and that there is more than ‘growing peas’ in Mendel’s experiments. I am convinced that we are at an inflection point, and we need to argue that point with solid scholarship. What is needed is some thought leadership that has the quality of a Harvard Business Review article. A paper that is convincing and credible. We need something that will open people’s mental aperture. Presently, I am doing that.
    I am writing a paper challenging the ‘myopicity’ of Big Data and arguing that there is something bigger. I think that your company and others I know will be visible in the paper. To test these ideas, I am planning to present at an international conference in Spain during the summer.”

    ************************************
    If you would like more information, please let me know.

    We are particularly interested in any failed, behind-schedule, over-budget or simply unaccomplished data projects. We can do what no one else can…
    JM

  2. The video link is not working. Can someone please update the URL to download the video?

    Reply Share
  3. Sap Trainings Saturday, March 1, 2014

    Hello Folks,

    We are going to start online classes for HADOOP next week.

    Interested people can contact: admin@saptrainings.com.
