Updated: One year after launching into the Hadoop market with much anticipation, Yahoo spinoff Hortonworks finally has a product available. On Tuesday the company announced version 1.0 of its flagship Hortonworks Data Platform, along with a High Availability architecture designed with new partner VMware. Reasonable minds can disagree on whose distribution of the Apache Hadoop data-processing platform is best, but Hortonworks needed to get on the board to be part of the discussion.
In terms of product, the Hortonworks Data Platform is about what was advertised when the company first unveiled it in November. The major differences from other commercial distributions, such as those from Cloudera, EMC Greenplum and MapR, are that Hortonworks uses Apache Ambari to configure and manage clusters; uses HCatalog as a metadata service to connect with relational database products; and incorporates Talend’s Open Studio as a tool for graphically integrating datasets and composing workflows.
The Hortonworks Data Platform HA architecture, however, is a bit more intriguing. Technically, it works by running critical Hadoop services, such as the NameNode, the JobTracker and Oozie, on virtual machines. If the physical server or VM on which a service is running fails, the product automatically moves the service to another box.
The other commercial Hadoop distributions all offer fault tolerance, at least for the Hadoop Distributed File System (which is where the NameNode resides), but they rely on different approaches to get there. Cloudera, for example, is built on the new Hadoop 2.0 version (Hortonworks uses the tried-and-true Hadoop 1.0), while MapR uses a proprietary file system. In keeping with its business model, Hortonworks will contribute the code for its HA solution back to the Apache Hadoop project.
“The next stage,” Hortonworks Chief Products Officer Ari Zilka told me, “is to run the whole cluster in a virtual environment.” Doing that without sacrificing processing performance will be the trick.
The real intrigue is that the Hortonworks Data Platform HA edition will be available through VMware. The work with VMware furthers a distinctive partner strategy in which Hortonworks works closely with technology partners such as VMware, Microsoft and Teradata to develop products that build on Hadoop rather than merely integrate with it. Cloudera has more than 200 partners, for example, but at least some of Hortonworks’ partnerships appear much tighter.
At last, the market for Hadoop distributions appears complete. There are five rather distinct offerings from five rather distinct providers: Cloudera, EMC Greenplum, Hortonworks, IBM and MapR (six if you include Amazon’s Elastic MapReduce cloud service), and each has its merits. We’ll see whose technologies and business models win the day as the enterprise world gets set to start investing in Hadoop in a major way.
Feature image courtesy of Shutterstock user Colin Edwards Photography.