Cycle Computing spins up 50K-core Amazon cluster

For those who doubt that public cloud infrastructure can handle the toughest high-performance computing (HPC) jobs, Cycle Computing and Schrödinger have some news for you. The two companies used a 50,000-core Amazon cluster to run a complex screening process to locate compounds that could pay off in new cancer drugs.

The problem for computational chemists and biologists is that there's a trade-off between accuracy and speed. The "Naga" compute environment built by Cycle atop the Amazon cloud eased that trade-off, said Ramy Farid, president of New York-based Schrödinger, which specializes in computational drug design.

“We’ve got these really accurate methods but they would take months on a normal cluster. The problem is we want to do the best possible science fast,” Farid said in an interview this week.

That's where Cycle Computing comes in. The company has made its name building high-performance computing clusters atop AWS infrastructure and has previously deployed 10,000- and 30,000-core clusters in the cloud. For Schrödinger, it upped the ante to 50,000 cores. The alternative in this case would have been for Schrödinger to build its own 50,000-core cluster or log time on a supercomputer, said Cycle Computing CEO Jason Stowe.

HPC for rent

"Practically speaking, the latter is impractical for a for-profit company and is generally restrictive. If you're an academic wanting time on the San Diego Supercomputer Center, for example, you'll have months of wait time to get approved, and even then you get a limited-time window — so if something with your software is not working at that time, you're out of luck," Stowe said. And big supercomputers don't tend to run the kinds of software these companies want to run. "The beauty of the cloud is it runs your flavor of Linux and other software," Farid said.

On the other hand, building a 50,000-core cluster could easily cost $20 million to $30 million, he said. The Schrödinger project, by contrast, cost about $4,850 per hour to run.
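The economics are easy to sanity-check from the figures in the article — the $4,850 hourly rate here and the three-hour run mentioned at the end. The back-of-envelope arithmetic below is only a rough comparison; it ignores operating costs and the limited useful life of purchased hardware.

```python
# Back-of-envelope comparison using the figures from the article:
# renting the 50,000-core cluster vs. building one outright.
hourly_rate = 4850            # $/hour for the rented cluster
run_hours = 3                 # length of the screening run
build_cost_low = 20_000_000   # low estimate to build a comparable cluster
build_cost_high = 30_000_000  # high estimate

run_cost = hourly_rate * run_hours
print(f"Cost of the run: ${run_cost:,}")  # $14,550

# Number of three-hour runs you could rent before matching the build cost:
print(build_cost_low // run_cost, "to", build_cost_high // run_cost, "runs")
```

Even on the low hardware estimate, well over a thousand such runs would fit inside the cost of building the cluster.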

For this trial, the cluster had access to all AWS regions and used each of them in some capacity. The application provisioned the resources through the various EC2 APIs.

"All the compound data for analysis was uploaded into S3 [Amazon's Simple Storage Service]. The cluster was provisioned alongside it and grabbed data from S3 to run the calculations and then pushed it back into S3," said Stowe, who will be talking about the implementation Thursday at an Amazon event in New York. The application also took advantage of some Amazon IP address and DNS capabilities.
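The workflow described above — stage inputs in object storage, have compute nodes pull work down, run the calculation, and push results back — is a common cloud-HPC pattern. The toy sketch below mimics that flow with a plain dictionary standing in for the S3 bucket and a placeholder function standing in for the screening calculation; none of these names come from Cycle's actual code, and real code would use an AWS SDK (for example, boto3's `get_object`/`put_object`) instead.

```python
# Toy illustration of the S3-backed compute pattern: inputs are staged
# under one prefix, workers pull each object, compute, and write the
# result object back under another prefix. A dict stands in for S3.
object_store = {                      # stand-in for an S3 bucket
    "inputs/batch-001": [1.2, 3.4, 5.6],
    "inputs/batch-002": [7.8, 9.0],
}

def screen(compounds):
    # Placeholder for the real screening calculation.
    return [round(value * 0.5, 2) for value in compounds]

# Each "worker" grabs its input object, runs the calculation,
# and pushes the result back into the store alongside the inputs.
for key in [k for k in object_store if k.startswith("inputs/")]:
    result = screen(object_store[key])
    object_store[key.replace("inputs/", "results/")] = result

print(sorted(k for k in object_store if k.startswith("results/")))
```

Keeping both inputs and outputs in the object store is what lets a cluster like this be provisioned, used, and torn down without any local storage surviving the run.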

Finding a needle in a very big haystack

Farid could not say much about the specific research purpose other than that the goal was to find compounds that could be developed into drugs that fight a type of cancer. The huge cluster enabled Schrödinger to use the more accurate version of its Glide software — in the past it would have had to use the less accurate screening method and perhaps miss some compounds that could be extremely useful.

"The problem they're solving is amazing. It's like the target is a lock and the conformation is a key — what Ramy's software does is let you simulate 21 million keys potentially matching that lock. There are typically a number of false negatives, and the reason is you can't sample all the orientations [of key to lock] properly, so it will not show a match that could actually be a match. You could miss amazing drugs that could have an impact," Stowe said.

The huge cluster let Schrödinger run the much more accurate, but far more compute-intensive, version of its screening software and find candidate compounds the less rigorous screen might have missed.

The result of the three-hour run? "We identified a number of compounds that we will purchase and test," Farid said.