

Updated: Hadoop World is taking place today in New York, and, indicative of the general momentum around the massively parallel analytics software, there’s plenty of news coming from the event. As one should expect, Cloudera is driving the action, but it brings vendors and service providers of all stripes into the mix.

Keeping with its strategy of the past several months, Cloudera added even more partners to its expansive stable. It now counts data-management vendor Talend and NoSQL-database startup Membase among its technology partners, and Japanese service provider NTT DATA has signed on as a Cloudera reseller in the country. Additionally, business-intelligence and data-integration peddler Pentaho released Hadoop integrations for both of its product lines, compatible with the Cloudera, Apache and Amazon Elastic MapReduce Hadoop distributions. Existing Cloudera partnerships also expanded — namely, partner Quest Software officially released the OraOop connector between Hadoop and Oracle Database, and Vertica Systems released its second-generation connector for the Cloudera Distribution for Hadoop (CDH).

Cloudera also upgraded CDH to incorporate additional security and cloud capabilities. For security, the latest CDH upgrade adds the Kerberos authentication standard recently integrated into Yahoo’s Hadoop with Security distribution. On the cloud front, Cloudera has integrated the Apache Whirr product, which according to Cloudera “is a tool for quickly starting and managing clusters running on cloud services like Amazon EC2.”
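Whirr is driven by a small properties file that describes the cluster and the cloud provider. A hypothetical launch might look like the following sketch — the cluster name, node counts, and file name are illustrative, not taken from Cloudera's announcement:

```shell
# Sketch of launching a Hadoop cluster on EC2 with Apache Whirr.
# Property names follow Whirr's published recipes; values are examples only.
cat > hadoop-ec2.properties <<'EOF'
whirr.cluster-name=demo-hadoop
whirr.instance-templates=1 hadoop-namenode+hadoop-jobtracker,3 hadoop-datanode+hadoop-tasktracker
whirr.provider=aws-ec2
whirr.identity=${env:AWS_ACCESS_KEY_ID}
whirr.credential=${env:AWS_SECRET_ACCESS_KEY}
EOF

# Start the cluster, then tear it down when finished.
bin/whirr launch-cluster --config hadoop-ec2.properties
bin/whirr destroy-cluster --config hadoop-ec2.properties
```

The appeal is that the same recipe works against other cloud providers by changing `whirr.provider`, which is what makes it a natural fit for a distribution like CDH.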

Cloudera’s spate of Hadoop World news is hardly surprising. When I spoke with CEO Mike Olson last week, he indicated that Cloudera is determined to become the de facto Hadoop platform: a strategy that means partners, partners and more partners. If someone is going to integrate Hadoop support into existing software, develop higher-level Hadoop tools or just deploy their own Hadoop cluster, Cloudera wants CDH to be part of the plan. Considering there are still dozens of database, BI and analytics vendors that have yet to announce a Hadoop strategy, not to mention potential customers, Cloudera has plenty of opportunity ahead of it.

This untapped potential is the reason Olson is so excited about Cloudera’s future (he cites the potential to “build a billion-dollar business”) and why he isn’t rushing to follow in recent partners’ footsteps by getting acquired in the short term. “This opportunity is still too nascent and too potentially enormous for us to do anything other than focus on building the greatest business that we can possibly build,” he explained.

Startups Karmasphere and Datameer used Hadoop World to let the world know that their flagship products — Karmasphere Studio: Professional Edition and Datameer Analytics Solution — are generally available. The Karmasphere product is designed to ease the process of developing Hadoop workloads and applications, even from the desktop, while Datameer targets BI professionals with a spreadsheet-style Hadoop analysis tool. Unlike Cloudera, Karmasphere and Datameer aren’t yet household names in the Big Data community, but they could be on their way. A recent survey identified a steep learning curve as the No. 1 impediment to Hadoop development, and this is exactly the issue Karmasphere and Datameer address.

Image courtesy of Flickr user melaclaro.



People should realize that there is no such thing as a free lunch. Period. Yes, Hadoop is free, but to analyze Big Data with Hadoop, one needs to build a cluster of dozens or even hundreds of servers. What is the annual TCO of a 100-server cluster? Several hundred thousand dollars — I would say $300,000–$400,000 per year. And we are not talking about petabytes of data here, merely dozens of terabytes. What is the Hadoop price per terabyte of data?
Do not forget that to get performance comparable with leading MPP products, Hadoop needs at least 10–20 times more servers.
Maybe commercial alternatives (MPP databases) are not so expensive once you take all expenses into account?
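The commenter's argument can be made concrete with some back-of-the-envelope arithmetic. All figures below are the commenter's assumptions (a 100-server cluster at $300,000–$400,000 per year holding "dozens of terabytes" — taken here as 50 TB), not measured numbers:

```python
# Back-of-the-envelope cost-per-terabyte arithmetic based on the
# commenter's assumed figures; nothing here is a measured benchmark.

def cost_per_tb(annual_tco_usd, data_tb):
    """Annual cluster TCO divided by terabytes under management."""
    return annual_tco_usd / data_tb

# Assumed scenario: 100 servers, $300,000-$400,000/year, 50 TB of data.
low = cost_per_tb(300_000, 50)    # 6000.0 -> $6,000 per TB per year
high = cost_per_tb(400_000, 50)   # 8000.0 -> $8,000 per TB per year
print(f"${low:,.0f}-${high:,.0f} per TB per year")
```

Whether that figure beats a commercial MPP database depends entirely on the inputs, which is precisely the TCO comparison the commenter is urging readers to do.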

