Across a wide range of industries from health care and financial services to manufacturing and retail, companies are realizing the value of analyzing data with Hadoop. With access to a Hadoop cluster, organizations are able to collect, analyze, and act on data at a scale and price point that earlier data-analysis solutions typically cannot match.
While some have the skill, the will, and the need to build, operate, and maintain large Hadoop clusters of their own, a growing number of Hadoop’s prospective users are choosing not to make sustained investments in developing an in-house capability. An almost bewildering range of hosted solutions is now available to them, all described in some quarters as Hadoop as a Service (HaaS). These range from relatively simple cloud-based Hadoop offerings from Infrastructure-as-a-Service (IaaS) providers such as Amazon, Microsoft, and Rackspace, through to highly customized solutions managed on an ongoing basis by service providers such as CSC and CenturyLink. Startups such as Altiscale focus entirely on running Hadoop for their customers. Because they do not need to worry about the impact on other applications, they can optimize hardware, software, and processes to get the best performance from Hadoop.
In this report we explore a number of the ways in which Hadoop can be deployed, and we discuss the choices to be made in selecting the best approach for meeting different sets of requirements.
Key findings include:
- Hadoop is designed to perform at scale, and large Hadoop clusters behave differently from the small groups of machines developers typically use to learn.
- There is a range of models for running a Hadoop cluster, from building in-house talent and infrastructure to adopting one of several Hadoop-as-a-Service solutions.
- Competing HaaS products bring different costs and benefits, making it important to understand both your own requirements and each product’s strengths and weaknesses. Some offer an environment in which a customer can run — and manage — Hadoop, while others take responsibility for ensuring that Hadoop is available, maintained, patched, scaled, and actively monitored.