Out of Cloud Chaos Comes Structure


In planning for last Wednesday’s Structure 08 conference, we at GigaOM had our heads in the cloud. We aimed to draw attention to the resurgence of hardware underlying the various software and web services that consumers and businesses now use, and hoped to define the emerging set of offerings that comprise cloud computing.

That definition is important. But not as important, I realized, as figuring out which business models will win out. Because while everyone wants to push their own definition of cloud computing, at its heart, cloud computing is about moving, storing and delivering data on demand.

After moderating two panels, watching almost all of the speakers and having numerous conversations, I came away with the belief that most people view cloud computing not as access to computing resources, but as access to services, ranging from application-specific offerings such as Salesforce.com to compute grids like those of GoGrid or EC2. And when it comes to buying into such data services (be they software, a platform, storage or compute time), there are certain questions that need to be asked, among them:

How do I get my data into the cloud? Maybe it’s as simple as calling up Salesforce.com, or a bit more complicated, like using an API to tap into EC2, but to use a cloud you’re going to need bandwidth. Whether it’s figuring out how to measure and appropriately charge people for bandwidth, as Google is attempting to do with its structured metadata, or contracting with a CDN to lower latency, the delivery of data in and around the cloud represents one of its biggest costs, and is consequently one of the areas ripest for innovation.
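To see why that delivery cost looms so large, it helps to run the arithmetic. Here is a back-of-envelope sketch of what it takes, in time and dollars, to push a dataset into a cloud; the link speed and per-GB price are made-up illustrative numbers, not any provider's actual rates.

```python
# Back-of-envelope sketch: the time and dollar cost of moving a dataset
# into a cloud. Bandwidth and per-GB price are invented for illustration,
# not any provider's real pricing.

def transfer_estimate(data_gb, link_mbps, price_per_gb):
    """Return (hours_to_upload, dollars) for pushing data_gb gigabytes
    over a link_mbps megabit-per-second link at price_per_gb per GB."""
    seconds = (data_gb * 8 * 1000) / link_mbps  # 1 GB = 8000 megabits
    return seconds / 3600.0, data_gb * price_per_gb

# 500 GB over a 100 Mbit/s link: roughly 11 hours, and about $50 in
# transfer fees at a (hypothetical) $0.10/GB.
hours, cost = transfer_estimate(data_gb=500, link_mbps=100, price_per_gb=0.10)
```

Even at these toy numbers, the upload takes most of a working day, which is why the question of who pays for the pipe, and how, matters so much.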

What format does that data need to be in? Different clouds work with different software. Some clouds work with Windows and others are only friendly to the LAMP stack. Various people expressed the idea that the industry would divide along the lines of low-margin, general-purpose clouds like Amazon Web Services, and high-margin, special-purpose clouds such as Heroku’s Ruby on Rails testing environment (which is built on AWS). The key is to know what you need from a cloud before investing.

How can I change and move that data? The differing programming languages and operating systems accepted by various clouds are only part of the issue. The still-undecided fight will be between proprietary formats such as BigTable and open standards that are truly standard, as in used by many, many developers. It’s a young effort, so there are no set standards yet, and until there are, transferring data kept in the cloud will never be as seamless as the bank analogy pushed by Sun CTO Greg Papadopoulos suggests.
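The portability argument is easy to make concrete. Records serialized in an open, self-describing format like JSON (or even flat CSV) can be re-imported by any cloud on any stack, whereas data locked in a store's native format cannot move without an export step the provider controls. A minimal sketch, with an invented record schema:

```python
# Sketch of the "open format" side of the argument: the same records
# serialized two portable ways. The schema here is invented for
# illustration, not taken from any particular cloud service.
import csv
import io
import json

records = [
    {"id": 1, "name": "alpha", "size_mb": 120},
    {"id": 2, "name": "beta", "size_mb": 340},
]

# JSON: self-describing, readable by virtually any stack.
portable_json = json.dumps(records)

# CSV: flat, but universally supported.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["id", "name", "size_mb"])
writer.writeheader()
writer.writerows(records)
portable_csv = buf.getvalue()

# The open format round-trips: any other provider can reload the data.
assert json.loads(portable_json) == records
```

The round-trip property at the end is the whole point: if your provider's export comes back in a form only that provider can read, the bank analogy breaks down.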

So while I spent most of my time trying to figure out which areas of compute infrastructure can be offered as a service while providing the highest margins or most defensible market share, I should have been keeping my eye on the data, because how providers treat the data will determine how their business models evolve.


Rahul Dave

For folks like me, working on the cloud-oriented aspects of science, these issues have been on the radar for a while. In astronomy, we expect data rates of 10-100TB/night very soon. While we can compress the data by an order of magnitude using computation at the telescope cluster, this still leaves us with 10TB/night or so to be analyzed, searched and stored, and on which astronomical pipelines must run. What does this mean?

It means that we must (a) deal with the financial aspects of poor scientists having to pay for bandwidth from remote telescope sites to data centers; (b) deal with the programming aspects of situating computation at the data, rather than the traditional paradigm of situating data at the computation (like the scientist’s desktop); (c) wholeheartedly use web services to make the composition of pipelines across multiple data centers (and thus computational centers) possible; and (d) deal with the problem of providing huge data redundancy at low cost.
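Point (b) above, moving the computation to the data, can be sketched in a few lines. In this toy version "the data center" is just a local list, and the observation tuples and thresholds are invented; in practice the scientist's small predicate function would be shipped to machines co-located with the telescope archive, so that only the predicate and the matching subset, not the raw terabytes, ever cross the network.

```python
# Sketch of situating computation at the data: ship a tiny predicate to
# where the data lives, return only the matches. All names and values
# here are illustrative, not from any real pipeline.

def data_center_scan(chunks, predicate):
    """Run the scientist's predicate next to the data.

    Only the (small) predicate travels in, and only the (small) matching
    subset travels out; the bulk of the data never moves.
    """
    return [chunk for chunk in chunks if predicate(chunk)]

# A night's worth of toy observations: (object_id, brightness) pairs.
observations = [("obj%d" % i, i % 50) for i in range(1000)]

# The scientist only wants the bright transients.
bright = data_center_scan(observations, lambda obs: obs[1] > 45)
```

The inversion is the point: the 1,000-element list stays put, and only the short `bright` result plays the role of data shipped back to the desktop.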

It should be a fun time. Some progress has been made on grid models of computation (the “run your programs where you can get cycles” model), but it is the data-oriented notion of cloud computing that is more important: computation is now cheap, but it must be localized to the data.

The web 2.0 paradigms, cloud computing, and experimental science are meeting in a big way!
