Cloudera: All Your Big Data Are Belong to Us


If the next industrial revolution is all about making sense of data, then Cloudera may well prove a primary driver of this Big Data movement as the foundation of the “LAMP stack” for Big Data. It’s still early days in this deep mining of corporate data assets, but as the primary sponsor of the Hadoop open-source project, Cloudera stands to profit. Handsomely.

The company, founded in 2008, has raised two rounds of venture capital funding, but the money coming from customers is more impressive. In just two years, Cloudera has signed over 50 customers, according to Cloudera CEO Mike Olson, and registers over 20,000 downloads each month of its Cloudera Distribution for Hadoop (CDH).

For those unfamiliar with the mechanics of how open-source businesses work, such downloads give Cloudera a fertile hunting ground for prospective customers: so fertile that the company has more than doubled sales each year since its launch in 2008. Cloudera’s customer list includes tech-heavy companies like Rackspace (s rax), Bank of America (s bac), and LinkedIn, but also reflects a mainstreaming of Big Data with the likes of University of Phoenix gracing the list.

Not that Cloudera can take all the credit. Hadoop is an Apache Software Foundation project and has attracted a diverse array of external contributors, including Twitter, Facebook, and Yahoo (s yhoo), among others. Cloudera is an important contributor, but it’s by no means the only one.

Where Cloudera shines, however, is in taking these different contributions and making Hadoop relevant for enterprise IT, where data mining has waxed and waned over the years. Part of the “waning” has come through the cost and complexity of the systems used to mine corporate data. Unfortunately, Hadoop and its ilk haven’t fixed that problem completely just yet, according to open-source veteran Zack Urlocker:

You pretty much gotta be near genius level to build systems on top of Cassandra, Hadoop and the like today. These are powerful tools, but very low-level, equivalent to programming client server applications in assembly language. When it works its [sic] great, but the effort is significant and it’s probably beyond the scope of mainstream IT organizations.

That’s the challenge, and it’s a big one. Fortunately for Cloudera and its investors, the payback for overcoming that challenge is huge, and Cloudera seems to be well on its way toward achieving it. One possible hitch is that the easier Hadoop becomes to use, the less likely enterprises will be to pay Cloudera for a supported version of Hadoop. It’s therefore critical that Cloudera keeps Cloudera Enterprise, a suite that includes a tailored distribution of Hadoop plus management and monitoring tools, ahead of the basic Hadoop offering. Companies like Facebook may not need such assistance, but mainstream IT shops likely will.

Cloudera, in other words, is banking on the complexity of Hadoop to drive enterprise IT to its own Cloudera Enterprise tools. It’s a good bet, as a similar strategy has paid off for Red Hat (s rht). So, while a quick scan of the agenda for the upcoming Hadoop World suggests a geeknerati, early-adopter crowd still dominates the discussion around Hadoop, Cloudera is working hard to change this.

Olson tells me he’s been surprised by how rapid the adoption of Hadoop has been within enterprise IT. While he wouldn’t provide details on ongoing customer negotiations, he made it clear that Cloudera is signing an increasing number of customers that one wouldn’t normally classify as early adopters.

Cloudera competitors that provide support for Hadoop, like Karmasphere, are also making headway, but with the bulk of the core Hadoop contributors, like Doug Cutting, on Cloudera’s payroll, it’s no surprise that Cloudera is leading the pack.

The consumer web increasingly demonstrates the power of unlocking the data behind social connections, a trend that has contributed to enterprise IT demanding the same depth of Big Data analysis. Cloudera still has a lot of work to do to lower barriers to Big Data adoption among the less technically literate set, but it’s on the right track and, as its growing customer list shows, that track is paying dividends.


Mike Olson


Thanks for the very good article. We think that the trend toward large, more complex data, subject to much more interesting and sophisticated analysis, is a huge deal. This is bigger than any single company; as others have noted in the comments, the ecosystem forming around Hadoop is growing quickly. Datameer, Karmasphere and others are important partners, not competitors.

Also, while I’m really proud of the job that Cloudera staff do in contributing to the project, we’re just one company active on the project. Yahoo!, Facebook and many others do outstanding work as well. Cloudera’s Distribution for Hadoop includes a suite of complementary open source projects, plus packaging, that make it easy to acquire, deploy and manage.

Owen O'Malley

Yahoo actually contributes the majority of Hadoop. The last time I computed the totals, 70% of the patches for Hadoop came from Yahoo engineers. Cloudera does a great job in making the entire stack easier to install, training users, and helping companies get up and going on Hadoop and the related tools.

Teresa Wingfield

Hi Matt,

The Datameer team enjoyed your article, but believes there is a far greater ecosystem of Hadoop products and services than your article suggests. We also believe the article did not give due credit to Hadoop contributors at organizations such as Yahoo!, Facebook and others.

Teresa Wingfield
Vice President, Marketing

Martin Hall


Good article but I’d like to fix an inaccuracy.

A rich ecosystem of products and services is emerging around Hadoop. Companies aren’t necessarily competing just because they both ship products for Hadoop.

The case in point: Karmasphere does not compete with Cloudera. They do a terrific job providing the infrastructure and operator-focused software for installing, growing and managing Hadoop clusters. The other part of the story and the end-to-end stack is the front end applications that make working with big data easier for developers and analysts. That’s what we do at Karmasphere.

