To Space and Beyond: The Rise of Research-driven Cloud Computing

I remember attending the inaugural GridWorld conference in 2006 and hearing Argonne National Laboratory’s Ian Foster discuss the possible implications of the newly announced Amazon EC2 for the world of grid computing that he helped create. Well, 2010 is upon us, and some of the implications Foster pondered at GridWorld have become clear, among them: for many workloads, the cloud appears to be replacing the grid. This point is driven home in a new GigaOM Pro article (sub req’d) by Paul Miller, in which he looks at how space agencies are using the cloud for work that likely would have had the word “grid” written all over it just a few short years ago.

Miller cites a particularly illustrative case: the European Space Agency, which is using Amazon EC2 for the data-processing needs of its Gaia mission, set to launch in 2012. Processing the 40GB of data Gaia will generate each night would have cost $1.5 million using local resources (read “a grid” or “a cluster”), but research suggests it could cost in the $500,000 range on EC2. The demand for cost savings and flexibility isn’t limited to astronomy research, either.

Research organizations that need sheer computing power on demand are looking at EC2 as a means of attaining it. Several prominent examples come from the pharmaceutical industry, where companies like Amylin and Eli Lilly have publicly embraced the cloud, as has the research-driven Wellcome Trust Sanger Institute. A related case study comes from CERN’s Large Hadron Collider project, which is using EC2’s capabilities as a framework for upgrading its worldwide grid infrastructure. So high is demand for cloud resources, in fact, that even high-performance computing software vendors, such as Univa UD (which Foster co-founded), are building tools to let research-focused customers run jobs on EC2.

Unlike HPC-focused grid software, however, the cloud opens up doors beyond crunching numbers. Miller also highlights NASA’s Nebula cloud, a container-based internal cloud infrastructure used to host NASA’s many disparate web sites. Built using Eucalyptus software, Nebula lets NASA users provision the resources they need for their sites as those needs arise. In theory, they could call up some of those resources for parallel processing, too. While grid computing projects often federate resources and democratize access to them, they do so at a scale that makes tasks like site hosting impractical, and grids don’t provide the nearly bare-metal access that makes cloud resources so flexible.

Of course, none of this is news to Foster. In early 2008 he noted the myriad similarities between the two computing models, including the ability to process lots of data in a hurry. In late 2009, the cloud market having matured considerably, he observed that a properly provisioned collection of Amazon EC2 images fared relatively well against a supercomputer when running certain benchmarks. There are plenty of reasons why cloud services will not displace high-end supercomputers, but where simple batch processing and cost concerns meet, the cloud could make in-house grids and clusters things of the past.

Full article on GigaOM Pro (sub req’d)