To Space and Beyond: The Rise of Research-driven Cloud Computing


I remember attending the inaugural GridWorld conference in 2006 and hearing Argonne National Laboratory’s Ian Foster discuss the possible implications of the newly announced Amazon EC2 on the world of grid computing that he helped create. Well, 2010 is upon us, and some of the implications Foster pondered at GridWorld have become clear, among them: For many workloads, the cloud appears to be replacing the grid. This point is driven home in a new GigaOM Pro article (sub req’d) by Paul Miller, in which he looks at how space agencies are using the cloud to do work that likely would have had the word “grid” written all over it just a few short years ago.

Miller cites a particularly illustrative case with the European Space Agency, which is utilizing Amazon EC2 for the data-processing needs of its Gaia mission, set to launch in 2012. The 40GB per night that Gaia will generate would have cost $1.5 million using local resources (read “a grid” or “a cluster”), but research suggests it could cost in the $500,000 range using EC2. The demand for cost savings and flexibility isn’t limited to astronomy research, either.

Research organizations that need sheer computing power on demand are looking at EC2 as the means for attaining it. Several prominent examples come from the pharmaceutical industry, where companies like Amylin and Eli Lilly have publicly embraced the cloud, as has the research-driven Wellcome Trust Sanger Institute. A related case study comes from CERN’s Large Hadron Collider project, which is using EC2’s capabilities as a framework for upgrading its worldwide grid infrastructure. So high is demand for cloud resources, in fact, that even high-performance computing software vendors, such as Univa UD (which Foster co-founded), are building tools to let research-focused customers run jobs on EC2.

Unlike HPC-focused grid software, however, the cloud opens up doors beyond crunching numbers. Miller also highlights NASA’s Nebula cloud, a container-based internal cloud infrastructure used to host NASA’s many disparate web sites. Nebula is built using Eucalyptus software, and NASA users can provision the resources they need for their sites as those needs arise. In theory, they could call up some of those resources for parallel processing, too. While grid computing projects often federate resources and democratize access to them, they do so at a scale that makes tasks like site-hosting impractical, and grids don’t provide the nearly bare-metal access that makes cloud resources so flexible.

Of course, none of this is news to Foster. In early 2008 he noted the myriad similarities between the two computing models, including the ability to process lots of data in a hurry. In late 2009, the cloud market having matured considerably, he observed that a properly provisioned collection of Amazon EC2 images fared relatively well against a supercomputer when running certain benchmarks. There are plenty of reasons why cloud services will not displace high-end supercomputers, but where simple batch processing and cost concerns meet, the cloud could make in-house grids and clusters things of the past.

Full article on GigaOM Pro (sub req’d)


Bhavik Vyas

Sean, I think you may be missing the point. EC2 is a computing platform, not a storage service; as such, the requirement of EC2 is to process and generate the data. AWS’s main storage platform is S3, where storing 14TB of data would cost ~$2,200/month. That is compelling, as it includes hardware, power, cooling and global replication/availability options.
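As a rough check of the S3 figure, here is a minimal Python sketch. The per-GB rate is an assumption, not a number from the comment: a flat rate of about $0.15 per GB-month, roughly S3's first pricing tier in early 2010, with 1TB treated as 1,024GB.

```python
# Assumption (not stated in the comment): a flat S3 storage rate of about
# $0.15 per GB-month, approximately the first-tier price in early 2010.
S3_RATE_PER_GB_MONTH = 0.15
GB_PER_TB = 1024  # assumption: storage measured in binary gigabytes


def monthly_storage_cost(terabytes: float, rate: float = S3_RATE_PER_GB_MONTH) -> float:
    """Approximate monthly S3 bill, in dollars, for storing `terabytes` of data."""
    return terabytes * GB_PER_TB * rate


if __name__ == "__main__":
    cost = monthly_storage_cost(14)
    print(f"14TB in S3: ~${cost:,.0f}/month")
```

Under that assumed rate, the result comes to about $2,150/month, consistent with the ~$2,200 quoted. It covers only one year's worth of data, though; the stored volume, and the bill, would keep growing as the mission accumulates more.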

Ian Foster

Hi Derrick:

Nice article :)

I’d love to see the full article, and the Miller article. But signing up is a painful process …

Regards — Ian.


The 40GB per night that Gaia will generate would have cost $1.5 million using local resources (read “a grid” or “a cluster”), but research suggests it could cost in the $500,000 range using EC2.

Would you mind elaborating?
$X per what? Year?

40GB per night is only about 14TB per year.
You can buy a mid-range enterprise NAS system for less than $50K, and over a three-year period the whole thing wouldn’t cost more than $30K/year.

I’m not saying that cloud storage isn’t good or competitive, just that these cost figures look fishy.
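The arithmetic in this comment can be reproduced in a few lines of Python. The nightly volume comes from the article, and the NAS price and three-year period come from the comment. Decimal terabytes are assumed here.

```python
NIGHTLY_GB = 40         # from the article: Gaia generates 40GB per night
NIGHTS_PER_YEAR = 365

annual_gb = NIGHTLY_GB * NIGHTS_PER_YEAR  # total gigabytes per year
annual_tb = annual_gb / 1000              # assumption: decimal terabytes

# Commenter's figures: a NAS bought for under $50K, spread over 3 years.
NAS_PRICE = 50_000
YEARS = 3
nas_per_year = NAS_PRICE / YEARS

if __name__ == "__main__":
    print(f"Data per year: {annual_gb:,} GB (~{annual_tb:.1f} TB)")
    print(f"NAS hardware per year: ~${nas_per_year:,.0f}")
```

The NAS hardware alone works out to roughly $16.7K/year, so the $30K/year ceiling presumably leaves room for running costs. Note that this sketch prices storage only, not the processing the replies below point to.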

Derrick Harris

Sean, I didn’t write the post about Gaia, so I don’t have the details on the cost figures. However, ESA needs to process that 40GB per night, not just store it. I believe the cost numbers relate to the computing resources, not storage resources.

Paul Miller

I did write the piece, so can provide some more information… :-)

Derrick’s correct that the costs are based upon processing and storage of the data. They’re also calculated over the five-year duration of the mission.

ESA’s calculations suggest spending up to $1.5 million on hardware and almost as much again on ancillary costs for power, cooling, admin, etc.

The nature of the data lends itself to bursts of computing activity, and the $500k figure on Amazon is based upon near-optimal conditions. If they had to recompute significant chunks of the data because of errors or miscalculations then the gap between on- and off-premise would narrow.
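To show how much room the EC2 estimate has before the gap closes, the figures in this comment can be put side by side. This minimal Python sketch treats "almost as much again" as an upper bound equal to the $1.5 million hardware figure. It also assumes, hypothetically, that any recomputation costs the same multiple of the $500K estimate.

```python
# Figures from the comment, all over the five-year mission.
ONPREM_HARDWARE = 1_500_000
ONPREM_ANCILLARY = 1_500_000  # assumption: "almost as much again", taken as an upper bound
EC2_ESTIMATE = 500_000        # near-optimal conditions, per the comment
MISSION_YEARS = 5

onprem_total = ONPREM_HARDWARE + ONPREM_ANCILLARY

# Hypothetical: how large a multiple of the EC2 estimate (e.g. from reruns)
# would EC2 spending have to reach to match the on-premise cost?
multiple_vs_hardware = ONPREM_HARDWARE / EC2_ESTIMATE
multiple_vs_total = onprem_total / EC2_ESTIMATE

if __name__ == "__main__":
    for label, total in [("On-premise", onprem_total), ("EC2", EC2_ESTIMATE)]:
        print(f"{label}: ${total:,} total, ~${total / MISSION_YEARS:,.0f}/year")
    print(f"EC2 multiple to match hardware only: {multiple_vs_hardware:.1f}x")
    print(f"EC2 multiple to match hardware + ancillary: {multiple_vs_total:.1f}x")
```

On these assumptions, EC2 spending would have to grow to roughly three times the estimate to match the hardware budget alone, and six times to match the full on-premise figure. Errors or miscalculations that force large reruns eat into that margin.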
