Gene research in the cloud could help cure diseases in the lab

Understanding the nature of human stem cells and being able to identify and compare their characteristics is crucial for medical research.

That’s why the Morgridge Institute, a nonprofit biomedical research institute based in Madison, WI, used Cycle Computing’s software atop Amazon Web Services infrastructure to process and index data on human stem cells and build an extensive knowledge base. Morgridge won Cycle Computing’s Big Science Challenge earlier this year, which gave it access to Cycle’s technology and Amazon’s cloud. It has now completed its run, which topped out at 8,000 cores and consumed about a million core-hours, or 115 compute years of work, according to Jason Stowe, CEO of Cycle Computing. The price: $0.0175 per compute hour.

Building an encyclopedia of cells

Morgridge’s goal was to create a knowledge base, or an index of associations between genes and cell types, Stowe explained in an interview. Of the cells being studied, he said, “They can turn into pluripotent stem cells, which are very similar to embryonic cells in that they can potentially differentiate into any sort of cell.” The problem with using adult stem cells in the past was that a liver cell stayed a liver cell; pluripotent cells allow researchers to create the cells they need.

Victor Ruotti, a molecular biologist at Morgridge, was thrilled to get the resources. “We wanted to take cells and compare them to other cells. We took as many samples as we had and gathered as much information as we could. We compared every cell type to every other tissue we had and built a database to say which cell types there are.”

That accumulated knowledge can help researchers figure out how to build the types of cell structures they need for experimentation and research.

The sample set was not that large, just 124 samples, but each sample had more than 20 million data points to be compared. “When you multiply that out, it’s a very complex and resource-intensive problem,” Ruotti said. In total, the run used 1,003,303 core-hours to process 11,955 pairs of samples, according to Cycle.
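The headline numbers hang together, and a few lines of arithmetic show how. The sketch below uses only the figures quoted in this article; the cost line rests on our own assumption that the $0.0175 rate applied uniformly to every core-hour, so it is an estimate rather than a figure reported by Cycle or Morgridge.

```python
# Back-of-the-envelope check of the run's reported figures.
# Inputs are the numbers quoted in the article. ASSUMPTION: the quoted
# $0.0175 per compute hour applied uniformly to every core-hour.

CORE_HOURS = 1_003_303      # total core-hours reported by Cycle
SAMPLE_PAIRS = 11_955       # sample pairs processed
RATE_USD = 0.0175           # quoted price per compute hour
HOURS_PER_YEAR = 24 * 365   # ignores leap years


def summarize(core_hours: int, pairs: int, rate: float) -> dict:
    """Derive compute-years, core-hours per pair, and an estimated total cost."""
    return {
        "compute_years": core_hours / HOURS_PER_YEAR,
        "core_hours_per_pair": core_hours / pairs,
        "estimated_cost_usd": core_hours * rate,  # estimate under the assumption above
    }


if __name__ == "__main__":
    stats = summarize(CORE_HOURS, SAMPLE_PAIRS, RATE_USD)
    for name, value in stats.items():
        print(f"{name}: {value:,.2f}")
```

Run as written, it prints about 114.53 compute years (which rounds to the 115 cited above), roughly 83.92 core-hours per sample pair, and an estimated cost of about $17,557.80 at the quoted rate.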

The work ran on a mix of Amazon EC2 spot instances, including some high-memory instances running CentOS.

Goal: curing disease in a petri dish

“We’re in an exciting phase in stem cell research [with pluripotent cells]. If a doctor needs cardiovascular cells to work on vessel or artery constructions, he can get them,” Ruotti said. “Medicine will change because doctors will be able to treat diseases in a petri dish. We can take a disease and simulate it in the lab, treat it in the lab, hopefully cure it there, and then implement the same cure in patients.”

Cycle Computing is determined to show that important workloads can run efficiently on low-cost public cloud infrastructure. Last year, it helped Schrödinger spin up a 50,000-core Amazon cluster for its computational drug design work.

“We did a large run for a Big 5 pharma last year, and the most common comment we got from the articles was, ‘I wonder if I could play Call of Duty 4 on this massive supercomputer,’” Stowe said. “What worried me about all the glitter was people would miss what is truly gold, which is that scientists can access world-class compute infrastructure at very reasonable cost and in a short time frame.”