Big Data Is on a Collision Course with the Cloud

It’s an interesting time to be involved with information technology, as we’re seeing two of the biggest trends in a long time ascend into the mainstream almost simultaneously. This morning, GigaOM Pro published my wrap-up of the first-quarter news and trends in the infrastructure space (subscription required), and what struck me most as I looked back was that “big data” has become the new “cloud computing.” I don’t mean that with regard to the technology, but rather to the importance that vendors and customers alike have attached to the term. Just as every vendor now has a cloud product and every company has a cloud strategy in place, big data efforts will become ubiquitous over the next couple of years, and the two might very well merge in the near future.

That being said, the reasons for embracing big data might be entirely different from the reasons for embracing cloud computing. Cloud computing, at least for many users, is about offloading responsibilities that are a necessary cost of doing business but that don’t necessarily do much to improve the business. Big data, on the other hand, is a different beast, at least for now. In many cases, it’s about looking at information in entirely new ways in order to improve whatever it is that a company does. Whether they’re improving the effectiveness of advertising or actually inspiring new products, analytics efforts effect real business results.

But, as with cloud computing, businesses realize that if they don’t have a big data story to tell or a big data strategy in place, they’ll very soon fall behind the curve. Companies that haven’t at least implemented a private cloud infrastructure will still be wasting resources managing IT tasks while their competitors have automated those tasks and are investing the resources elsewhere. Soon, companies without analytics systems in place will be grasping at straws, relatively speaking, to determine what their customers want, while their competitors will be drawing actionable insights from data that tells them exactly that. For proof, just look at the incredible amount of Hadoop, NoSQL, analytic database and business intelligence activity over the past year, and over the past three months in particular.

That’s not to say that the two trends aren’t on convergent paths. I think they are, but only as a result of cloud computing maturing, since big data is still in its relative infancy. With increasingly inexpensive cloud storage and increasingly powerful cloud processing, the cloud is becoming an ideal place to store and analyze the data that companies are collecting. For one, it’s a risk-free way to experiment with advanced analytics without having to invest piles of cash in the infrastructure otherwise needed to run those types of workloads. And with the advent of big data workflows delivered as cloud services, along with every other type of application, users will no longer need to undertake, at least not to the same degree, the sometimes laborious process of teaching themselves new software and new methods of analysis.

I think the next year will be very telling about the degree to which this convergence happens, as cloud providers and big data vendors alike seek to capitalize on each other’s momentum. We already have Elastic MapReduce and data-as-a-service firms such as InfoChimps, but I suspect they’re just the beginning of what will become a broad base of services that apply on-demand cloud resources to analytic workflows and that target CIOs who have both of those capabilities at the top of their minds.

Image courtesy of flickr user themagesticfool

5 Responses to “Big Data Is on a Collision Course with the Cloud”

  1. Deirdre Mahon

    As organizations grapple with how to better manage and leverage rich data that is growing very fast, IT is looking more closely at the Cloud as a more economical platform than investing in on-premises infrastructure. However, the term Big Data is somewhat confusing, as it means different things to different organizations: to some it’s all about collating and making sense of “social media” data, and to others it’s about sourcing OLTP data into a platform for deep analysis. Regardless of the reporting and analytics side of the equation, IT really needs to examine how best to store this data long term (often for years) and how to scale cost-effectively to meet future growth. There are technologies today that can compress data to reduce its storage footprint by 97% or more, and when you are sending data to the cloud, you really do need to compress; otherwise Big Data won’t get over the wire. Additionally, when you keep big data in the cloud, you need to be able to query it and get answers quickly, and with massively compressed data you don’t want to pay the penalty of re-inflating it for every query.

  2. I like JP’s comments. One other way I see this is that traditionally, data had to be transported to the compute resources. In the Big Data era that’s hard to do. But in the cloud era (and particularly, with Private Clouds) compute can now be transported to where the data is. After all, VMs are just files. There are also some nice use-cases I’m seeing in the market where Big Data is becoming a PaaS service within special-purpose (community) clouds.

  3. Derrick,

    Your statement is difficult to refute; however, the way in which it is posed raises some interesting areas for discussion. From the way this piece reads, I take it that you do not see Big Data as an inherent component of Cloud Computing. I, on the other hand, would argue that Big Data is merely a use case for Cloud Computing. Indeed, you state that you believe Cloud Computing is about operational efficiencies in IT, and specifically about cost management. Yes, Cloud is that. But Cloud is also about scalability and elasticity, the kind that allows Big Data applications to thrive. Treating these as separate paths that may converge, rather than as a single path with many value propositions, limits the value proposition for cloud and, in my opinion, adds to the confusion about what cloud computing has to offer.

  4. Interesting article. I concur that many of the Big Data applications out there may very well converge onto public or private cloud infrastructures. However, in the Hadoop case, for instance, there may need to be some fundamental shifts in how locality is defined (trying to keep data movement within the rack) and in how the TaskTracker and NameNode are determined to be ‘alive’. That traffic usually runs over a separate NIC/link in a Hadoop cluster, and that capability may not be available depending on the public cloud being used.

    Net-net: There will probably be some purpose-built public clouds spec’d for Big Data, CFD simulations, modeling, etc., that use higher-performance CPUs, guaranteed network I/O and reliability, and probably a higher density of DAS, since most of the public clouds I have seen use some form of NAS or iSCSI and most Hadoop clusters use DAS.


    • Derrick Harris

      There’s already at least one purpose-built service in Elastic MapReduce, and I know that Cloudera’s distro is at least able to run atop both AWS and Rackspace. I think the as-a-service model could ultimately prevail, though, which would entail providers customizing the architecture for big data as you suggest.