It’s everywhere! The day Hadoop took over the cloud


With Rackspace granting early access to its Hadoop service on Sunday night, Cloudera announcing a handful of new cloud partners (Amazon Web Services, SoftLayer and Verizon, in addition to Savvis) on Monday and Microsoft making HDInsight on Windows Azure a reality, pretty much every infrastructure-as-a-service cloud around now offers a managed Hadoop service. Here’s a quick breakdown of who’s offering what.

Cloud provider | Hadoop services/partners
Amazon Web Services | Elastic MapReduce (Cloudera (forthcoming), MapR)
GoGrid | GoGrid Big Data Solution (Cloudera)
Google Compute Engine | MapR
Joyent | Joyent Solution for Hadoop (Hortonworks)
Microsoft Windows Azure | HDInsight (Hortonworks)
Rackspace | Rackspace Cloud Big Data Platform (Hortonworks)
Savvis | Savvis Big Data Solutions (Cloudera, MapR)
SoftLayer/IBM | Cloudera
Verizon Enterprise Solutions | Cloudera
Virtustream | HANA-Hadoop Managed Service (Intel)

There are also a handful of independent Hadoop cloud services out there, either running atop AWS or hosted on their own infrastructure somewhere.

Independent Hadoop services
IBM (BigInsights)
Mortar Data

One interesting thing about all the offerings in both camps is that they’re still very command line- and MapReduce-focused, with the highest level of abstraction generally being a simple programming language like Python. I’m still waiting for the day that GUI-based Hadoop services start popping up, trying to take some of the complexity out of creating Hadoop jobs, but maybe that day will never come. Or maybe that’s already happening at an even higher level with all the data warehouse and other analytic services already out there running atop Hadoop and not included in these lists.

If I missed any hosted Hadoop services, please do note them in the comments.

Feature image courtesy of Shutterstock user FWStudio.



“BigData” is a term that has been buzzing around a lot for the last few years. Here’s a detailed comparative study of all the players in the segment. It certainly clears up the picture of Hadoop being all over the cloud.

Yaniv Mor

Hi Derrick, Xplenty offers Hadoop as a Service on the cloud (supporting AWS and SoftLayer), allowing data and BI users to utilize Hadoop without the need to learn a new skillset. Xplenty lets users provision Hadoop clusters with a single click and develop data flows without writing complex MapReduce code.

Chris Gianelloni

Continuuity has Reactor, an API over Hadoop that makes creating applications running inside YARN as easy as using Java threads.


Hue is an open source GUI for Apache Hadoop with an interface for each component (HDFS, MapReduce, Hive, Pig, Oozie, Sqoop, ZooKeeper…). It comes in packages, tarballs, GitHub clones, VMs… and can be set up on any cluster.


Dave Fellows

It’s an elegant solution and certainly helps make Hadoop more accessible to the masses. Ambari (also open source) is a similarly elegant UI for cluster management/monitoring (exposes Ganglia telemetry in pretty graphs and Nagios alerts).

Dave Fellows

Speaking of Hadoop UI, GreenButton supports Hadoop as a Service and exposes Ambari as well as Hue for creating/managing MR, Hive, Pig, Oozie etc. It also has a Job Designer for more complex workflows – it’s pretty nice! GreenButton is cloud-agnostic and can manage workloads across private and public clouds.


Savanna is an OpenStack project bringing different Hadoop distributions to the OpenStack cloud. It provides a plugin for Horizon (the OpenStack dashboard) which allows users to configure and launch various clusters; currently these are vanilla Apache Hadoop and Hortonworks Data Platform.
It also allows users to launch Hadoop jobs from the UI. Currently these are basic MapReduce jobs (from jar files) and Pig and Hive scripts.


Joydeep Sen Sarma

Derrick – Qubole has a lot of GUI in addition to APIs. May want to correct. Also Treasure Data is notably missing.

Derrick Harris


Thanks for the comment. I probably should have been clearer that by GUI I really meant a whole UX designed for the vaunted business user as opposed to the already-skilled Hadoop user. Admittedly, it has been a while since I’ve seen Qubole in action, but I don’t recall that being the experience.

Re: Treasure Data, I’ve covered it before, but it’s very much a data warehouse play if memory serves me. Correct?

Joydeep Sen Sarma

fair enough. i also sort of realized that you probably meant something different after putting the comment in.

treasure data is positioned as a data warehouse service – but it is definitely a hadoop/hive based data warehouse. so while it may not be a classic ‘hadoop as a service’, practically speaking it will compete with others on this list (and us) for the same set of customers.
