Weekly Update

Hadoop needs a better front-end for business users

Whether you’re running it on premises or in the cloud, Hadoop leaves a lot to be desired in the ease-of-use department. The Hadoop offerings on the three major cloud platforms (Amazon’s Elastic MapReduce  — EMR, Microsoft’s Azure HDInsight and Google Compute Engine’s Click-to-Deploy Hadoop) have their warts. And the three major on-premises distributions (Cloudera CDH, Hortonworks HDP and MapR) can be formidable adversaries to casual users as well.

See prompt

The root of Hadoop’s ease-of-use problem, no matter where you run it, is that it’s essentially a command line tool. In the enterprise, people are used to graphical user interfaces (GUIs), be they in desktop applications or in the Web browser, that make things fairly simple to select, configure, and run. To the highly technical people who were Hadoop’s early adopters, the minimalism of the command prompt has a greater purity and “honesty” than a GUI. But, while there’s no reason to demonize people who feel this way, command line tools just won’t fly with business users in the enterprise.

Amazon and Google seem to aim their services at well-initiated Hadoop jocks. And with that premise in place, their offerings are fine. But that premise isn’t a good one, frankly, if mainstream adoption is what these companies are looking for. Microsoft’s HDInsight does at least allow for simplified access to Hadoop data via a Hive GUI. This allows for entry of Hive queries (complete with syntax coloring), monitoring of job progress, and viewing of the query output. If you want to do more than that, you get to visit the magical land of PowerShell, Microsoft’s system scripting environment. Woo. Hoo.

Adjusting the Hue

Amazon now allows you to install “Hue,” an open source, browser-based GUI for Hadoop, on an EMR Hadoop cluster. Hue is probably the most evolved GUI offering out there on Hadoop. However, getting Hue working on EMR involves some security configuration in order to open up Web access to your cluster. And let’s just say that it’s far from straightforward.

Hue is available in the major on-premises Hadoop distributions as well. It provides front-ends for creating and running MapReduce jobs, working with Hive, Pig, HBase, the Solr search interface and more. But Hue’s Pig interface is really just a script editor with an run button, and its HBase interface is just a table browser. Whether on-site or in the cloud, Hadoop needs much more advanced tooling in order to get to groundswell status. Developers need to be able to connect to Hadoop from their integrated development environments (IDEs) and cloud control panels need to make the logistics of working with Hadoop far simpler.

“A” for effort

To its credit, Microsoft is making some important moves here. The preview release of its Visual Studio 2015 IDE includes both Hive integration into its Server Explorer tool, and “tooling to create Hive queries and submit them as jobs” according to a post on the Azure blog. It even now includes a browser for the Azure storage containers that back HDInsight clusters, and a special Hive project template.

Beyond that, the Web-based portal for HDInsight has a “Getting Started Gallery” section that provides task-based options for processing data and analyzing it in Excel. That is key: Hadoop will become truly mainstream only when people can use it as a means to an end. In order for that to happen, the tooling around it needs to be oriented to the workflows of business users (and data professionals), with the appropriate components invoked at the appropriate times. However, the HDInsight Getting Started Gallery items are more guided walkthroughs than they are truly automated tasks.

Hadoop at your service

Beyond the cloud providers’ own Hadoop products lie Hadoop-as-a-Service (yes, that would be HaaS) offerings from companies like Qubole and Altiscale, which let you work as if you had your own cluster at your beck and call. Qubole provides its own component-oriented GUI and API, and in turn deploys clusters for you on AWS, Google Compute Engine and, as of November 18th, Microsoft Azure as well. Altiscale runs its own cloud and provides its own workbench, which is a command line interface that users connect to via SSH.

HaaS offerings will likely grow in popularity. The incumbent cloud providers may need to adopt the as-a-service paradigm in their own Hadoop offerings, in terms of user interface and/or pricing models. Meanwhile, developer tools providers should integrate Hadoop into their IDEs as Microsoft has started to do. And everyone in the Hadoop industry needs to embed Hadoop components in their menu options and features, rather than just expose these components in a pass-through fashion.

Right now, Hadoop tools let you pick ingredients from the pantry, prepare them, and combine them on your own. In the end, with the right knowledge and skill, you do get a meal. But the tooling shouldn’t be just for chefs; it should be for diners too. Business users need to set up a project easily, configure its options, and then let the tools bring to bear the appropriate Hadoop components in the right combination. That’s how Hadoop will have a chance to make it as an enterprise main course.