
Summary:

Rackspace is now doing Hadoop and Cloudera just announced a handful of partners: Hadoop is everywhere in the cloud these days. Here’s a quick breakdown of which cloud providers are offering which distributions of Hadoop as managed services.

With Rackspace granting early access to its Hadoop service on Sunday night, Cloudera announcing a handful of new cloud partners (Amazon Web Services, SoftLayer and Verizon, in addition to Savvis) on Monday and Microsoft making HDInsight on Windows Azure a reality, pretty much every infrastructure-as-a-service cloud around now offers a managed Hadoop service. Here’s a quick breakdown of who’s offering what.

Cloud provider | Hadoop services/partners
Amazon Web Services | Elastic MapReduce (Cloudera (forthcoming), MapR)
GoGrid | GoGrid Big Data Solution (Cloudera)
Google Compute Engine | MapR
Joyent | Joyent Solution for Hadoop (Hortonworks)
Microsoft Windows Azure | HDInsight (Hortonworks)
Rackspace | Rackspace Cloud Big Data Platform (Hortonworks)
Savvis | Savvis Big Data Solutions (Cloudera, MapR)
SoftLayer/IBM | Cloudera
Verizon Enterprise Solutions | Cloudera
Virtustream | HANA-Hadoop Managed Service (Intel)

There are also a handful of independent Hadoop cloud services out there, either running atop AWS or hosted on their own infrastructure somewhere.

Independent Hadoop services
Altiscale
IBM (BigInsights)
Infochimps/CSC
Mortar Data
Qubole

One interesting thing about all the offerings in both camps is that they’re still very command line- and MapReduce-focused, with the highest level of abstraction generally being a simple programming language like Python. I’m still waiting for the day that GUI-based Hadoop services start popping up, trying to take some of the complexity out of creating Hadoop jobs, but maybe that day will never come. Or maybe that’s already happening at an even higher level with all the data warehouse and other analytic services already out there running atop Hadoop and not included in these lists.
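
For a sense of what that level of abstraction looks like, here is a minimal word-count sketch in Python, written in the map/sort/reduce style that a Hadoop Streaming job follows. It runs locally and is not tied to any of the services listed above; the function names (mapper, reducer, run_local) are hypothetical and chosen only for illustration.

```python
from itertools import groupby
from operator import itemgetter


def mapper(lines):
    """Emit a (word, 1) pair for every word in the input lines."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def reducer(pairs):
    """Sum the counts for each word.

    Assumes the pairs arrive sorted by word, which is what the
    shuffle/sort phase provides between map and reduce.
    """
    for word, group in groupby(pairs, key=itemgetter(0)):
        yield word, sum(count for _, count in group)


def run_local(lines):
    """Local stand-in for a cluster: map, sort (the shuffle), then reduce."""
    return dict(reducer(sorted(mapper(lines))))


if __name__ == "__main__":
    print(run_local(["Hadoop in the cloud", "the cloud"]))
```

Even for something this simple, the user has to think in terms of keys, sorting and reducers, which is the complexity a GUI-based service would presumably try to hide.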

If I missed any hosted Hadoop services, please do note them in the comments.

Feature image courtesy of Shutterstock user FWStudio.

  1. Joydeep Sen Sarma Monday, October 28, 2013

    Derrick – Qubole has a lot of GUI in addition to APIs. May want to correct. Also Treasure Data is notably missing.

    1. Derrick Harris Monday, October 28, 2013

      Joydeep,

      Thanks for the comment. I probably should have been clearer that by GUI I really meant a whole UX designed for the vaunted business user as opposed to the already-skilled Hadoop user. Admittedly, it has been a while since I’ve seen Qubole in action, but I don’t recall that being the experience.

      Re: Treasure Data, I’ve covered it before, but it’s very much a data warehouse play if memory serves. Correct?

      1. Joydeep Sen Sarma Monday, October 28, 2013

        fair enough. i also sort of realized that you probably meant something different after putting the comment in.

        treasure data is positioned as a data warehouse service, but it is definitely a hadoop/hive based data warehouse. so while it may not be a classic ‘hadoop as a service’, practically speaking it will compete with others on this list (and us) for the same set of customers.

  2. Savanna is an OpenStack project that brings different Hadoop distributions to OpenStack clouds. It provides a plugin for Horizon (the OpenStack dashboard) that lets you configure and launch various clusters; it currently supports vanilla Apache Hadoop and Hortonworks Data Platform.
    It also lets you launch Hadoop jobs from the UI. Currently these are basic MapReduce jobs (from jar files) and Pig and Hive scripts.

    Links:
    https://wiki.openstack.org/wiki/Savanna
    http://docs.openstack.org/developer/savanna/

  3. Speaking of Hadoop UI, GreenButton supports Hadoop as a Service and exposes Ambari as well as Hue for creating/managing MR, Hive, Pig, Oozie etc. It also has a Job Designer for more complex workflows – it’s pretty nice! GreenButton is cloud-agnostic and can manage workloads across private and public clouds.
    http://www.greenbutton.com

  4. Hue is an open source GUI for Apache Hadoop with an interface for each component (HDFS, MapReduce, Hive, Pig, Oozie, Sqoop, ZooKeeper…). It comes in packages, tarballs, GitHub clones, VMs… and can be set up in any cluster.

    Link: http://gethue.com

    1. It’s an elegant solution and certainly helps make Hadoop more accessible to the masses. Ambari (also open source) is a similarly elegant UI for cluster management/monitoring (exposes Ganglia telemetry in pretty graphs and Nagios alerts).

  5. Chris Gianelloni Monday, October 28, 2013

    Continuuity has Reactor, an API over Hadoop that makes creating applications running inside YARN as easy as Java threads.

  6. Hi Derrick, Xplenty offers Hadoop as a Service in the cloud (supporting AWS and SoftLayer), allowing data and BI users to use Hadoop without needing to learn a new skillset. Xplenty lets you provision Hadoop clusters with a single click and develop data flows without writing complex MapReduce code.

  7. AltiScale (founded by Raymie Stata)

  8. “BigData” is a term that has been buzzing around a lot for the last few years. Here’s a detailed comparative study of all the players in the segment: http://goo.gl/Y2ApUe. It certainly clears up the picture of Hadoop being all over the cloud.
