Cloud computing is going to absorb your big data workloads, too

Go ahead and deploy your Hadoop cluster in the cloud. Really. It can handle it.

Cloud computing providers and big data vendors have been working toward this moment for years, and it looks like the moment has finally come. Of all the news coming out of the Strata and Hadoop World shows taking place this week, the most compelling stuff all goes to prove this point. Here’s a quick recap of what was announced:

How a hybrid Hortonworks architecture might look. Source: Microsoft

That’s just product news, about just Hadoop, from just this week. The broader goings-on around the data community paint an even clearer picture of where we’re headed. All around, big, well-funded companies are putting serious effort into blurring the lines between big data platforms and cloud computing platforms:

Salesforce.com's Wave on an iPhone.

And then there are the numerous startups doing some flavor of Hadoop in the cloud — Mortar, Altiscale and Qubole, to name a few — and seemingly dozens of analytics startups, some of which are running some very impressive infrastructure under a sleek UI.

AWS and Google regularly release new big data services for their cloud platforms, and we’ll likely see at least one more come out of Amazon’s re:Invent show next month. By the way, every cloud provider now offers solid-state drives, and they’re the default local and persistent storage options on AWS. That’s part of the reason Spark was able to run so fast in that Databricks benchmark test.

Don’t get me wrong, we are by most accounts very far from the point where most (broadly defined) big data workloads, or probably even a significant fraction of them, are running in a cloud service. For every Netflix running large Hadoop and Cassandra clusters in the cloud, there are probably two large banks that are still experimenting with three different Hadoop software vendors. Startups such as Interana and Cask (née Continuuity) that want to target enterprise customers still face the harsh reality that their initial cloud delivery models will have to wait.

But with a few exceptions for particularly large or particularly regulated datasets, the tide is turning. If users weren’t asking for it, it’s hard to imagine all these companies would be trying so hard to make it happen.
