Summary:

The largest players in the Hadoop market are already raising money at sky-high valuations, employing hundreds of people and, in some cases, looking at nine-figure revenues. If you’re trying to get a sense of whether Hadoop is for real, these details might help.

There’s little hard data on the size of the largely private Hadoop market yet, but you can get a clue from looking at what’s going on inside Silicon Valley. The money changing hands and the sizes of the largest players in the space alone are enough to paint a telling picture of a market that’s growing fast in uncharted territory. I’ve collected some of the insights I’ve gleaned over the past few months to try and add some perspective.

Everything, of course, is relative and we might never see a Hadoop vendor reach the size of a database company such as Oracle with more than 100,000 employees and tens of billions in annual revenue. After all, Hadoop is a new technology for most companies, so it’s not really moving in on an already lucrative market and stealing budgetary dollars from incumbents. Further — and possibly more importantly — the core Hadoop technology is free and open source, meaning there are lots of unpaid downloads so money comes from services, support and large enterprises willing to buy software licenses for value-added products.

Money

Here’s a chart showing how much money Hadoop-based companies have raised thus far (although the grand total will likely rise by at least $10 million next week). Keep in mind, Cloudera only launched in 2009 and Hortonworks launched in June 2011. And these aren’t companies that merely bury Hadoop under an application or can connect their technologies to it — these are companies either selling Hadoop or applications designed specifically for it.


(To view the original, interactive chart, click here.)

In terms of revenue, one might look to a May 2012 report from research firm IDC estimating the size of the Hadoop ecosystem to be around $77 million, growing to $813 million by 2016. Those are both impressive numbers, but they might actually be short-changing reality. For one, as I noted at the time, the authors attributed almost no revenue to Amazon Web Services’ Elastic MapReduce service, which is almost certainly generating at least a few million in revenue each year.

Speaking to me in June, Cloudera CEO Mike Olson also took issue with the number, claiming it didn’t even take Cloudera’s revenue into account — which seems entirely possible considering the business Cloudera is doing. I’ve heard from reliable sources that Cloudera is doing very well and is on track to do about $100 million in revenue this year, very possibly more. And as early as April 2011, Cloudera executives were touting that software license revenue had already surpassed services revenue (although it’s arguable whether that will, or even has to, remain the case).

More anecdotally, I’ve heard from several sources that Hortonworks has already declined at least one potentially appealing acquisition offer. That it wouldn’t sell isn’t surprising: sources say the company is valued at $225 million after its last round of funding and is looking to raise more money. And although it just released its first product in June, the company has impressive and potentially lucrative partnerships in place with Microsoft, Teradata, Rackspace, VMware and other large vendors.

MapR, the proprietary thorn in the sides of both Cloudera and Hortonworks, appears to be doing quite well, too. Vice President of Marketing Jack Norris told me in June that his company had higher license revenue than many would expect and predicted that deals with Amazon Web Services and Google Compute Engine would help the company become “the license revenue leader within the next quarter.”

Former Cloudera VP of Technology Solutions Omer Trajman, who just left to join HBase-centric startup WibiData, shared some insights with me from his days at Cloudera that seem to back up vendor confidence. He said most mature production clusters (excluding monster users such as Facebook) consist of about 200 nodes, and many double in size after the first year. That’s part of the reason Cloudera grew in size about 10x during the three years he was there.

“It has definitely been a rocket ship,” he said. “… You just strap in and hope you make it up.”

Interest is only picking up, too: “There are more people that have started big data projects in the past six months than have big data projects running [in production],” Trajman said.

People

It’s probably not accurate to call companies such as Cloudera, Hortonworks and MapR startups anymore, and we might start to see signs of this shift in personnel moves. Here’s how big they are and expect to become:

  • Cloudera: More than 300 employees globally and growing, especially in the sales department.
  • Hortonworks: 145 employees as of late October and hiring a person per day, on average, through the end of 2012.
  • MapR: More than 125 employees, mostly in technical and engineering positions; starting to build a sales team and looking to more than double headcount in 2013.

While Cloudera and Hortonworks, for example, are still young and nimble enough to lure a fair amount of talent from now-officially large enterprises such as VMware, employees who joined early and really love the startup life might not stick around.

Trajman’s new home, WibiData, is a fine example of this. It was launched last year by former Cloudera employees Christophe Bisciglia (who actually co-founded Cloudera) and Aaron Kimball to help companies build behavioral-analysis applications on top of Hadoop.

(Maybe there’s a Cloudera mafia shaping up: WibiData’s officemates — MemCachier and Thanx — both count former Cloudera employees as key members or founders of their teams, as does HBase-centric startup Drawn to Scale.)

Trajman, who was one of the first couple dozen employees at Cloudera (and who previously joined Vertica at around the same stage in its growth), told me he likes the rush of getting in on the ground level of new technologies and helping companies do something really new. He enjoyed establishing and implementing some of the core foundational use cases for Hadoop (e.g., ETL and data exploration) with Cloudera’s early customers, but that’s still much of what Cloudera provides to customers because it’s so difficult to build higher-level and higher-value applications at the infrastructural level where Cloudera operates.

“For me, it was very personal in terms of the impact I wanted to have,” Trajman said. At WibiData, he can help users who have the infrastructure part resolved and now want to develop applications that make data analysis a core part of their businesses. Where there’s a focus on innovation, he said, that’s where the innovators go.

This isn’t a bad thing, it’s just a side effect of growth — and when employees stay and innovate in the Hadoop space, it just creates a bigger pie for everyone to share.

Feature image courtesy of Shutterstock user GuskovaNatalia.

  1. $100M in revenue for Cloudera? How can you arrive at that figure? I don’t know of many companies paying for licenses, and even then it is less than $5k.

    1. Derrick Harris Friday, November 9, 2012

      I didn’t arrive at that figure, but it seems plausible. Cloudera Enterprise = 4K per node per year at face value. 100 customers averaging 200 nodes would be $80M. Then services, Cloudera U, etc.

  2. I’m impressed with Hadoop’s rise and the success of small companies creating solutions with open source software. More than anything else, Hadoop brought Big Data to the marketplace instead of leaving it locked up in enormous enterprises where insight could be monopolized.

    The challenge, however, is the insight gained by MapReduce through Hadoop is only as good as an organization’s ability to execute on what they discover. Analytics are moving more and more into process flows and will move away from batch jobs executed as research projects.

    This means that the infrastructure that sits under big data is what powers success, not the application. I’ve said that without great underpinnings, Hadoop is like an elephant riding a bicycle (pun intended). I wrote it up here in detail along with what I see as the key components to a Big Data solution (beyond MapReduce):

    http://successfulworkplace.com/2012/10/28/big-data-must-not-be-an-elephant-riding-a-bicycle/

    I call it my Big Data Manifesto and I welcome comment.

    The Big Data ecosystem is right now primarily focused around the companies you mention like Cloudera. I expect to see that grow considerably as we build out the race car for the elephant.

  3. I too see Hadoop as potentially an elephant, but balanced on a ball (both brandishing the letters BD for big data). See it on our web page for Essence DB at http://www.txformative.com/#!essence-db/c7kw. We have an ambitious project to simplify big data Analytics and database tech along with revolutionary approaches to developing software. If we succeed, we will make a quantum leap forward in the effort to tame Big Data.

  4. Great story, thanks for this. Note that I’m experiencing problems with the figure ‘Venture capital investment in Hadoop companies’ (it doesn’t show up at all) but I was able to retrieve the original from http://public.tableausoftware.com/static/images/Ha/Hadoopfunding2/Sheet1/1_rss.png – might want to look into this? KUTGW!

    1. Derrick Harris Sunday, November 11, 2012

      Thanks for pointing that out. It was a Tableau Public file, but it has been replaced with a non-interactive image.

  5. Kenneth Chestnut Sunday, November 11, 2012

    In a market that will become commoditized over time, Cloudera is smart to try to differentiate via Impala and other initiatives. At some point, we will see significant consolidation as the market matures and is unable to support the growing number of companies in this space…

  6. Why isn’t Revolution Analytics included in the funding total?
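
The back-of-the-envelope estimate in Derrick Harris’s reply to the first comment (4K per node per year, 100 customers, 200 nodes on average) can be checked with a few lines of Python. This is only a sketch of that arithmetic, not a model of Cloudera’s actual business: the function and constant names are made up for illustration, the price is the face-value figure quoted in the comment, and the customer count is the commenter’s hypothetical rather than a reported number.

```python
# Back-of-the-envelope check of the subscription estimate in the comment thread.
# All inputs are the commenter's illustrative figures, not reported financials.
PRICE_PER_NODE_PER_YEAR = 4_000   # USD, "4K per node per year at face value"
CUSTOMERS = 100                   # hypothetical customer count from the reply
AVG_NODES_PER_CUSTOMER = 200      # average cluster size used in the reply


def annual_subscription_revenue(price_per_node, customers, avg_nodes):
    """Return list-price subscription revenue in USD per year."""
    return price_per_node * customers * avg_nodes


estimate = annual_subscription_revenue(
    PRICE_PER_NODE_PER_YEAR, CUSTOMERS, AVG_NODES_PER_CUSTOMER
)
print(f"${estimate / 1_000_000:.0f}M per year")  # prints "$80M per year"
```

Services and training revenue would sit on top of that figure, which is how the reply gets from $80M to something near the $100M mentioned in the article. Actual deals are presumably discounted from list price, so the result is best read as a rough ceiling for these inputs.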

