10 innovators changing the game for Internet infrastructure

The Data Scientist: Andreas Sundquist, co-founder and CEO, DNAnexus

by Derrick Harris

There’s big data, and then there’s big data. DNAnexus co-founder and CEO Andreas Sundquist is concerned with the latter, which is why he chose to build DNAnexus (a platform for storing, analyzing and collaborating on massive amounts of genomic data) in the cloud. But although cloud computing, with its seemingly limitless capacity for storage and computing power, is now the assumed infrastructure for taking genomic analysis into the mainstream, it wasn’t always this way.

An MIT-educated computer scientist, Sundquist first got interested in genomics after stumbling into a computational biology course during his Ph.D. work at Stanford. “I ended up falling in love with the field because I realized how many interesting opportunities there were for someone with a computational background,” he told me recently. “More and more, science is a data-driven field [and] biology is a data-driven endeavor.”

That’s why, swimming against the tide of conventional wisdom, Sundquist and his co-founders Arend Sidow and Serafim Batzoglou (both of whom are Stanford professors) opted for the cloud when they launched DNAnexus in 2009. “Back then, the concept of using cloud to power genomics … was a foreign concept,” Sundquist explained. “When we first started the company, I think many people in our field had no idea what the cloud was.”

But exponential drops in the cost of genome sequencing are producing volumes of data that require the cloud’s scale and collaborative features. In about two or three years, Sundquist predicts, genome data will scale into the exabytes as millions of people send their DNA in for sequencing. DNAnexus itself has already processed petabytes’ worth of genome data and hosts the 400-plus-terabyte Short/Read Sequence Archive.

It’s not hard to see why Sundquist thinks the exabyte era is coming. Already, for example, services such as Ancestry.com and 23andMe are letting citizens get limited-purpose DNA profiles for less than $300. Ancestry.com costs only $99. Privacy concerns notwithstanding, Sundquist said, if the research world had access to a billion genomes, “We would literally be able to put every person into a family tree of humanity.”

Or consider the possibility of democratizing genomic data to the point where it converges with the quantified self movement currently sweeping through fitness-minded America. “I see [quantified self] as a movement of people who want to take control of their data and do things for themselves,” Sundquist said, which is exactly what DNAnexus is trying to do for biologists who want to work on genomes. “I don’t know at what point everyone will have access to their own genomic data, but we certainly want to enable such a world to exist.”

And these uses pale in comparison to the possibilities of using genomic data to cure cancer or otherwise revolutionize the ways in which doctors diagnose and treat diseases. The opportunities surrounding health-related data have only just begun.

However, while the cloud might have solved the computational and storage issues around exabyte-scale datasets, building a system that can manage, store and work with exabytes of data is still a complicated task. Sundquist thinks his team will be able to solve these problems in part because it’s composed of computer-science whizzes who normally might end up in more lucrative or popular fields than biotech. “It might not ever be as sexy as playing FarmVille on your cell phone, but impacting human health is very important to the people who work for us,” Sundquist said.