Think about Facebook’s (s fb) new Graph Search feature — only infinitely more complex — and you have a rough understanding of what Palo Alto, Calif.-based startup Syapse is trying to do. The company, which on Tuesday announced $3 million in Series A led by The Social+Capital Partnership (and previously raised a $1.6 million seed round), is building a data-management platform designed to let researchers and physicians easily pore through mountains of complicated molecular data in order to better diagnose a whole range of potential illnesses.
But to understand how Syapse works, you have to understand the problem it’s trying to solve. A condensed version of the situation is this: Sequencing genomes, proteomes, biomes and other microscopic, but very important, biological players generates a lot of data. However, we’re not just talking about the terabytes of data that a fully sequenced genome (or perhaps the tens of thousands sequenced gut bacteria, which can change composition hourly) will produce, but also patient data (e.g., name, date of birth, smoker or non-smoker, etc.) and process data (i.e., everything that happens from the time a lab gets a sample to the time a doctor gets a report on his desk).
The complexity and perpetually changing nature of both the field of “omics” as it’s called, and the data itself, further complicates things. According to Syapse Co-founder and President Jonathan Hirsch, diagnostics labs and workers are always using new and different processes trying to optimally extract, tag and analyze samples. Furthermore, expert knowledge of what any particular genetic or other signature means is always changing (for example, Hirsch said, we only really understand about 1 percent of the human genome), as are the ontologies that lab workers, researchers and physician specialists use as their particular fields evolve.
“There is basically a wholes set of measurements that go beyond just sequencing the genome,” he explained. Analyzing genomes, proteomes and anything else is “like a very, very complicated recipe” that involves much more than swabbing someone’s cheek and getting back a comprehensive, understandable report. Syapse doesn’t actually do any of the sequencing work (like a DNAnexus or Bina Technologies does,) but just captures the metadata from those lab processes and connects to those hefty sequenced data via an API so the platform has access to everything it needs.
Organizing complex data requires a graph
Using semantic-analysis and graph-processing techniques, Syapse thinks it can bring the world of “omics” under control. Although it’s currently working with research centers that analyze the data in order to better hone their processes, Hirsch expects the company will eventually make most of its money from doctors and hospitals using Syapse to help better diagnose their patients. “[We’re] trying to fill the gap and be the company that cracks the physician side of this,” he said.
This is where the Graph Search comparison comes into play. The Syapse platform is continuously updated with the latest ontologies from various fields and the changing meanings of the metadata associated with the various lab processes. All this information is stored based on its relationship to other pieces, and semantic analysis means the Syapse software knows that Term X in one field might actually mean Term Y in another.
Syapse has essentially created a “huge knowledge graph” of clinical, diagnosis and omics data, Hirsch explained, and doctors and researchers can mine it using whatever terms they use in their daily lives. They can easily search, for example, by patients they’ve treated for breast cancer whose genes showed certain specific markers and were processed using particular techniques in the lab in order to find connections among them.
Syapse Co-founder and CEO Glenn Winokur — an admitted “IT guy” compared with his biotech-focused partners — likes to put the platform’s promise in the terms of business software. “Think of this entire workflow as similar to a sales or marketing workflow,” he said, adding that Syapse is trying to make mining omics data as simple for its users as Salesforce.com (s crm) makes CRM for its users.
That’s probably a good analogy for selling the software to hospital administrators who might be more concerned with budgets than with big data technology. As we’ll discuss in more detail at our Structure: Data conference on March 20-21, business people are increasingly concerned with using data to make better decisions, but they need applications that make it easier and faster to find stuff out than is possible with many open source packages targeting engineers and statisticians. If Syapse can deliver on this promise for making sense of our complex biological systems, it could make a big difference.