The unsexy side of big data: 5 tools to manage your Hadoop cluster

Before you can get into the fun part of actually processing and analyzing big data with Hadoop, you have to configure, deploy and manage your cluster. It’s neither easy nor glamorous — data scientists get all the love — but it is necessary. Here are five tools (not from commercial distribution providers such as Cloudera or MapR) to help you do it.

Apache Ambari

Apache Ambari is an open source project for monitoring, administration and lifecycle management for Hadoop. It’s also the project that Hortonworks has chosen as the management component for the Hortonworks Data Platform. Ambari works with Hadoop MapReduce, HDFS, HBase, Pig, Hive, HCatalog and ZooKeeper.

Apache Mesos

Apache Mesos is a cluster manager that lets users run multiple Hadoop jobs, or other high-performance applications, on the same cluster at the same time. According to Twitter Open Source Manager Chris Aniszczyk, Mesos “runs on hundreds of production machines and makes it easier to execute jobs that do everything from running services to handling our analytics workload.”

Platform MapReduce

Platform MapReduce is high-performance computing expert Platform Computing’s entry into the big data space. It’s a runtime environment that supports a variety of MapReduce applications and file systems, not just those directly associated with Hadoop, and is tuned for enterprise-class performance and reliability. Platform, now part of IBM, built a respectable business managing clusters for large financial services institutions.

StackIQ Rocks+ Big Data

StackIQ Rocks+ Big Data is a commercial distribution of the Rocks cluster management software that the company has beefed up to also support Apache Hadoop. Rocks+ supports the Apache, Cloudera, Hortonworks and MapR distributions, and handles the entire process from configuring bare metal servers to managing an operational Hadoop cluster.

Zettaset Orchestrator

Zettaset Orchestrator is an end-to-end Hadoop management product that supports multiple Hadoop distributions. Zettaset touts Orchestrator’s UI-based experience and its ability to handle what the company calls MAAPS — management, availability, automation, provisioning and security. At least one large company, Zions Bancorporation, is a Zettaset customer.

If there are more Hadoop management tools floating around, please let me know in the comments.

Feature image courtesy of Shutterstock user .shock.

9 Comments

HM

Hadoop is new and raw. It needs many pieces to make it a complete solution. The skills required to bring all these technologies together are mind-boggling and create a huge burden on IT. There are very few solutions out there that fit the bill. One promising technology (as an alternative to Hadoop) is the HPCC Systems platform, a completely integrated solution – ETL + Data Mining + Data Delivery. The people at LexisNexis have been using this platform for more than 10 years and have built a very successful business around it. So it seems to be battle tested and enterprise ready. For more, visit hpccsystems.com

Jo Maitland

The name node issue no longer exists if you are running Apache Hadoop 0.23.2 or higher. Fixing the name node single point of failure is no longer a differentiator when you can get it in the open source version.

Michael Shaler

DataStax Enterprise eliminates the Hadoop name node as a single point of failure.

Prasun Sinha

Ankush, from Impetus Technologies, is another Hadoop Cluster Management tool.

Anon

Apache Ambari isn’t even ready for consumption, and you didn’t even bother checking out Cloudera Manager, which is the most widely used one out there.

Derrick Harris

Yes, but, as I noted, I didn’t include distro-specific tools. If you’re using CDH, I assume you’ve looked at Cloudera Manager, too. Same for MapR.

Anonymous

Not that you’re a bitter Cloudera employee or anything… right?
