Jason Stowe just helped a USC researcher analyze 205,000 organic compounds to determine which ones might be good at powering the next generation of solar energy technology. “There are so many possibilities here,” Stowe explained to us on this week’s Structure Show podcast. “The problem is finding the right material, the one that actually has the properties you need — in this case a very efficient transformation of photons into electricity — without spending the entire 21st century looking for it.”
Thanks to Stowe and Cycle Computing, the cloud computing startup of which he’s CEO, that task took about 18 hours. It would have taken about 264 years running on a single Intel Sandy Bridge processor, but Cycle Computing had a few more cores at its disposal: 156,314 of them in total, spread across 16,788 instances, all running in the Amazon Web Services cloud. The system had a peak performance of 1.21 petaflops (or 1.21 quadrillion floating-point operations per second), comparable to some of the fastest supercomputers in the world.
“If you bought the equipment, it would be at least $68 million,” Stowe estimated. In the cloud, those 18 hours of computing cost about $33,000. “The math of this is really hard to deny,” Stowe added. “… There’s really no comparison in terms of the ability to do the science.”
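To put the figures quoted above side by side, here is a rough back-of-the-envelope calculation. It uses only the numbers in this article (264 years, 18 hours, $33,000 and Stowe's $68 million estimate); the variable names are ours, and the 365.25-day year is an assumption.

```python
# Back-of-the-envelope comparison using only figures quoted in the article.
HOURS_PER_YEAR = 365.25 * 24  # assumption: average calendar year

serial_years = 264                   # single-processor estimate
wall_clock_hours = 18                # time the cloud run took
cloud_cost_usd = 33_000              # cost of the cloud run
purchase_estimate_usd = 68_000_000   # Stowe's estimate for buying the equipment

serial_hours = serial_years * HOURS_PER_YEAR
speedup = serial_hours / wall_clock_hours
cost_ratio = purchase_estimate_usd / cloud_cost_usd
cost_per_serial_hour = cloud_cost_usd / serial_hours

print(f"Single-processor hours replaced: {serial_hours:,.0f}")
print(f"Speedup over one processor:      {speedup:,.0f}x")
print(f"Purchase estimate vs. cloud run: {cost_ratio:,.0f}x")
print(f"Cloud cost per processor-hour:   ${cost_per_serial_hour:.4f}")
```

The cost comparison is rough, since purchased hardware would keep working after those 18 hours, but it shows the scale of the gap Stowe is describing for a one-off run.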
But the work with USC is just the latest in a series of impressive projects for Cycle Computing, which has run other large systems, some with up to 50,000 virtual cores, for its research-oriented customers. In this interview, Stowe talks all about how Cycle Computing does what it does and how that will affect the future of innovation. Here is a collection of quotes that frame the scope of a very interesting, very geeky conversation. If you’re into large-scale computing and the future of science, it’s really worth listening to the whole thing.
A noble vision
“Our fundamental belief is that the utility access to high-performance computing resources is going to be the single-largest accelerator of human invention over the next couple decades.”
Freeing scientists from their constraints
“There’s [a] problem that your researchers and engineers generally tend to start sizing the questions they’re asking to [their] infrastructure. So, they’ll stop asking big questions, or the right questions … and they’ll instead run things that execute in a reasonable period of time on the instances they have available in front of them.
“… What our software does is it turns this raw infrastructure in the opposite way … Given this workload that I need to run, how do I build a cluster dynamically that is capable of answering it in a reasonable period of time?”
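Stowe does not describe how Cycle Computing's software actually sizes a cluster, so the following is only a sketch of the general idea of working backward from the workload to the hardware. The function name, its parameters and the example figures are all hypothetical, and it assumes the work divides evenly with no scheduling overhead.

```python
import math


def instances_needed(total_core_hours: float, deadline_hours: float,
                     cores_per_instance: int) -> int:
    """Smallest instance count that finishes the work by the deadline.

    Hypothetical helper: assumes perfectly divisible work and no overhead.
    """
    if deadline_hours <= 0 or cores_per_instance <= 0:
        raise ValueError("deadline and cores per instance must be positive")
    # Cores that must be busy for the whole deadline window.
    cores = total_core_hours / deadline_hours
    # Round up to whole instances, and always request at least one.
    return max(1, math.ceil(cores / cores_per_instance))


# Made-up workload: 1,000 core-hours, wanted back in 10 hours, 8-core instances.
print(instances_needed(1_000, 10, 8))
```

The point is the direction of the calculation: the deadline and the size of the question determine the cluster, rather than the cluster determining which questions get asked.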
A new kind of computing for a new kind of science
“[I]f you look over the last 10 to 20 years, most of the newer forms of science — things based on Monte Carlo analysis, statistical analysis [and most genomics, life sciences, and big data work] — is all pleasantly parallel workloads. These are not workloads that require all the machines to be doing the exact same thing at the exact same time [like a traditional supercomputer]. They can work independently from each other.”
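To make the idea of a pleasantly parallel workload concrete, here is a minimal Monte Carlo sketch that estimates pi. It is our illustration, not code from Cycle Computing or its customers: each task gets its own seed, shares no state with the others, and the results are combined only once at the end. Threads on one machine stand in for separate cloud instances here, so the example shows the structure of the workload rather than any real speedup.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def count_hits(seed: int, samples: int) -> int:
    """Count random points in the unit square that land inside the quarter circle."""
    rng = random.Random(seed)  # private generator: no state shared between tasks
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(num_tasks: int = 8, samples_per_task: int = 50_000) -> float:
    """Run independent tasks, then combine their results once at the end."""
    seeds = range(num_tasks)
    with ThreadPoolExecutor(max_workers=num_tasks) as pool:
        # Tasks never talk to each other; order of completion does not matter.
        hits = list(pool.map(count_hits, seeds, [samples_per_task] * num_tasks))
    return 4.0 * sum(hits) / (num_tasks * samples_per_task)


print(f"pi is roughly {estimate_pi():.3f}")
```

Because no task waits on another, the same pattern works whether the tasks run on eight threads or on thousands of machines with no fast interconnect between them.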
Don’t write off expensive supercomputers just yet
“If anything, I would actually argue that when there’s a lot of these university systems or government research systems that are stellar in interconnect — they’re so good at that that’s really all they should ever run. So, the fact that we’re running things like bioinformatics or needle-in-a-haystack problems or big data or what have you on SGIs or Crays is actually kind of ludicrous to me because those applications do not take advantage of the interconnect.”
On performance for the sake of performance
“I wrote software that managed No. 84 on the Top500 list in 2004, personally. I know that, I’ll always have that memorized because, let’s be real, it’s cool. But at the same point, the thing that matters is how much science are you getting done. Linpack is not a science benchmark.
“… It’s very cool, the scale we’re able to hit. But the reason I think it’s cool is not because ‘Look what we did. We’ve got this virtual infrastructure that doesn’t exist from our perspective. I’m sitting in front of a laptop but really behind it there’s 150,000 cores.’ That is kind of neat, but the really cool thing is when you get people to realize that they don’t have to fit their science into the box that they were able to afford last year.”
Feature image courtesy of Shutterstock user Oleksiy Mark.