
Summary:

A startup called Emerald Logic claims it uses an evolutionary process to discover the best algorithm for predicting outcomes from any dataset. It might sound too good to be true, but the company already claims some successes and is one of several startups trying something similar.

Think about being a hospital that wants to improve survival rates for patients. You have lots of data about patients — their medical histories, EKG readings, room numbers, doctors, billing information and much more — and you certainly know whether they leave alive or dead. Somewhere in all that data, the current thinking goes, there must be a formula that can predict what’s going to happen.

It’s not so much a big data problem as much as it’s a complex data problem. According to Patrick Lilley, co-founder and CEO of an Aliso Viejo, Calif., startup called Emerald Logic, the real world runs on systems where there are inputs and outcomes, only the complexity of the data we’re generating makes it very difficult to find the inputs that will lead to the best outcomes. He equates it to sticking a marble in a black box, eventually getting it out the other side, and then having to diagram what you think the inside looks like.

“The challenge there is you have to model what’s going in that system and you can’t often look inside,” he said.

Lilley also claims his company can help you find the answer. The company’s software, called FACET (short for Fast Collective Evolution Technology), tests tens of thousands of algorithms against a dataset in order to find ones that represent the relationships between those data and the end result. He calls the process “evolutionary computing,” because the candidate algorithms evolve, mate and migrate, and only the fittest survive.

“This is a monkeys-on-typewriters sort of problem,” he acknowledged, referencing those theories about how long it would take a group of primates to reproduce the complete works of Shakespeare. The software doesn’t know anything about the field it’s working on or have any presuppositions about what’s in it. It’s simply trying to predict one thing from another, and he says it’s pretty effective.

FACET works by taking a sample of a dataset, generating tens of thousands of algorithms from it, and then testing them in order to determine the most-predictive one. “Because it’s evolution, it tends to wash away the variables and the math operators that are unimportant,” Lilley explained. “… No more than eight things have ever mattered in any model we’ve ever generated.”
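The loop Lilley describes — generate candidate algorithms, score them against the data, keep the fittest, recombine and mutate — can be sketched in a few lines. This is not FACET (Emerald Logic keeps its representation and operators proprietary); the formula encoding, operator set, selection scheme and mutation rate below are illustrative assumptions only.

```python
import random

# Candidate "algorithms" are tiny formulas: an operator applied to two columns.
OPS = [lambda a, b: a + b, lambda a, b: a - b, lambda a, b: a * b]

def random_formula(n_vars):
    """A random candidate: (operator, input column i, input column j)."""
    return (random.choice(OPS), random.randrange(n_vars), random.randrange(n_vars))

def fitness(formula, rows, target):
    """Mean squared error of the formula's predictions; lower is better."""
    op, i, j = formula
    return sum((op(r[i], r[j]) - t) ** 2 for r, t in zip(rows, target)) / len(rows)

def evolve(rows, target, pop_size=200, generations=50):
    n_vars = len(rows[0])
    pop = [random_formula(n_vars) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda f: fitness(f, rows, target))
        survivors = pop[: pop_size // 4]          # selection: keep the fittest quarter
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = random.sample(survivors, 2)     # "mating": recombine two parents
            child = (random.choice([a[0], b[0]]), a[1], b[2])
            if random.random() < 0.1:              # occasional mutation for variety
                child = random_formula(n_vars)
            children.append(child)
        pop = survivors + children
    return min(pop, key=lambda f: fitness(f, rows, target))
```

With a search space this small the loop reliably rediscovers a hidden relationship such as `target = x0 * x1`; the point of the sketch is only the evolve-score-select cycle, which "washes away" operators and variables that don't help predict the target.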

Once the process is complete, the algorithm is tested against new data in order to ensure its predictions are still accurate. Emerald Logic delivers FACET as a cloud service, so customers really only pay for the algorithm it produces. Customers own all the intellectual property associated with it, and Emerald Logic charges based on the economic value of the problem it’s trying to solve.
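That out-of-sample check — scoring the winning algorithm on rows it never saw during evolution — is the standard guard against a model that merely memorized its sample. A minimal sketch, with the split fraction, tolerance and error metric as assumed parameters:

```python
def holdout_check(predict, rows, target, holdout_fraction=0.3, tolerance=1.2):
    """Score a fitted predictor on unseen rows; accept it only if accuracy holds up."""
    cut = int(len(rows) * (1 - holdout_fraction))

    def mse(rs, ts):
        return sum((predict(r) - t) ** 2 for r, t in zip(rs, ts)) / len(rs)

    train_err = mse(rows[:cut], target[:cut])       # data the model was fit on
    test_err = mse(rows[cut:], target[cut:])        # data the model never saw
    # A model that only memorized its training sample degrades badly here.
    return test_err <= max(train_err * tolerance, 1e-9)
```

A genuine relationship passes the check because its error stays flat on new data, while a lookup-table-style "model" that is perfect on the training rows fails as soon as it meets a row it hasn't seen.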

A hot field for startups, actually

All of this probably sounds a little too good to be true — and maybe it is — but Emerald Logic is really just putting a different spin on something that multiple other startups are pushing. Similar pitches come from BeyondCore, with its service for finding the variables most statistically relevant to a given outcome, and from Emcien. There’s also Ayasdi, which runs thousands of machine learning algorithms to discover and then visualize connections among massive datasets.

Emerald Logic’s promise actually sounds similar to that of Nutonian, a startup from former Cornell Creative Machine Labs researcher Michael Schmidt that claims its Eureqa software can “calculate laws of physics” present in business data.

Each approach raises the question of whether anyone can trust software to uncover what’s important in their data, but the software’s output doesn’t have to be taken on faith. Once data scientists or business analysts see what the software has come up with, they can dive in and look at the variables, examine the connections, and decide whether they buy into it. They can run tests to determine whether there’s something worth investigating further.

“Artificial imagination”

Besides, Lilley argued, he has proof that FACET works. The company worked with King’s College London to identify markers for Alzheimer’s disease, and FACET highlighted 14 out of a list of 11,000 possibilities. Half of them had already been mentioned in prior literature, a quarter had been suspected as possible markers and the remaining quarter were novel to FACET. It would have been easy to ignore what the software found had it not validated those previous findings and inklings, Lilley said.

According to a February 2013 press release announcing that partnership, “Using these markers, plus APOE genetic information and demographics, the collaborators produced a mathematical classifier of 94% accuracy in distinguishing Alzheimer’s study subjects from controls or with those with mild cognitive impairment.”

In finance, FACET routinely finds that how a company incorporates is a strong predictor of whether it will succeed. In consumer loans, it has found that “effectively, liars tell longer stories,” Lilley said.

And in fact, he noted, Emerald Logic is his third startup with the same co-founder and FACET is kind of just an iteration on the technologies of the previous two. The first, called Digital Transit, used a genetic algorithm to do over-the-air software updates for mobile phones. That company merged with Bitfone in 2001, which HP acquired in 2006. In January 2014, Qualcomm bought the associated patents from HP.

The second startup, called Deep Six Technologies, generated decision trees based on data about email servers in order to do spam detection. The two founders have been working on Emerald Logic since 2011.

Whether or not FACET — or anything of its ilk — turns out to be a magic bullet, they’re all working under the same assumption that has driven the push of big data technologies and data science into the mainstream. Namely, that if data really does contain answers to tricky problems, there’s no way a person can figure out all the right questions to ask to find those answers among thousands of different variables. At some point, some parts of the process must be automated in order to steer people in the right direction.

This is why Lilley refers to Emerald Logic and FACET as “artificial imagination” rather than “artificial intelligence.” “The more expertise someone has in a field,” he explained, “the more they know better and the less they sort of look around.

“… This method is pretty sideways. It’s not the way people are used to thinking about the problem.”

Feature image courtesy of Shutterstock user phipatbig.

  1. I wonder what their guidelines for gaining business are, ethically speaking. They’re basically predicting the future, which is great for all industries, but not necessarily for mankind. I’d much rather see net growth in the movie industry than in the weapons industry.

    1. Patrick Lilley Wednesday, April 9, 2014

      Hello Gaby,

      Nice that you think about this. Our principal mission is to solve billion life, billion dollar problems — in other words, mankind’s biggest challenges. We aren’t working in the weapons industry.

      Best,
      Patrick Lilley

  2. Wonder how this could be different from an optimization problem (logistic regression or a neural network?)

    1. Patrick Lilley Wednesday, April 9, 2014

      Good question. It’s appropriate to think of optimization, but we approach the problem very differently than neural networks or any kind of regression. In our view, those methods have constraints and simplifying assumptions that try to force-fit the real world to their math and their limitations. To paraphrase Derrick Harris, we’re doing the opposite: finding the mathematical relationships that describe the real world’s underlying truth.

  3. Thomas Johansson Wednesday, April 9, 2014

    It is good to find coverage of “evolutionary computing”, but I must step in and provide some feedback. Without digging deeper into this highly exciting field, it can sound almost like “magic”. It is about using the most powerful class of algorithms known: evolution. It has random components (yes, it is a semi-stochastic population-based process) to create beneficial variation (mutations are the ultimate source of variation and they are often random), but given that, statements like “monkey with a typewriter” are simply wrong. Any researcher in this field will find that a highly misleading statement. A monkey could, for example, never create something as complex as a human, but evolution as a process can.
    I will not judge their technology, but one must understand that there are thousands of “evolutionary computation” systems at work today in diverse industries. One of the most prominent fields is of course medicine. Just do a simple search for genetic programming and medicine, or genetic algorithms and medicine. It is nothing new at all. There are multiple research institutions and companies that have been working actively in this field for 10-15 years, and it sounds like these guys invented it. That is not helpful for the readers. Testing many millions of computer-evolved algorithms is completely standard, and everyone does it using a parallel computing approach. The most common way is through speciation and migration using a distributed processing architecture. This has been done for almost 20 years.

    1. Patrick Lilley Thursday, April 10, 2014

      Hello Thomas, you clearly understand the field, and I enjoyed what you said about “the most powerful class of algorithms known, evolution”. I agree with many of your statements, especially that we did not invent evolutionary computing. We ourselves as co-founders have been using evolutionary methods for almost 15 years in real-world implementations of increasing sophistication and flexibility. That said, at Emerald Logic we’re not using genetic algorithms nor the genetic programming methods that have been around for a while. You’d also be right about monkeys on typewriters if we had meant it to describe how our technology works. As you indicated, the difference between random “monkeys on typewriter” testing and evolution is that evolutionary methods aren’t simply trying solutions at random. However, my monkeys on typewriters analogy was not about randomness; it was about using a solving mechanism that requires no domain knowledge, which Mr. Harris captured well in his following sentences. What we find is that domain expertise often gets in the way; the more you know, the less you look around.

    2. Derrick Harris Thursday, April 10, 2014

      Thanks for the very fair comment. I should have noted the existence of the field prior to Emerald Logic.

      Technological differences aside, it does appear the company is early in its attempts to commercialize the approach, especially for general-purpose predictive modeling. Thus my comparison to other new approaches in data analysis.

  4. From a purely business perspective, this is an amazing technology!

  5. > He calls the process “evolutionary computing,” because the candidate algorithms evolve, mate and migrate, and only the fittest survive.

    Emerald Logic sounds interesting, but what’s the difference between “evolutionary computing” and “genetic algorithms”? And secondly, “artificial imagination” instead of “artificial intelligence” is bad hype marketing.

    Cheers….@sardire

    1. Patrick Lilley Thursday, April 10, 2014

      Steve, there are definitely many differences between our form of evolutionary computing and genetic algorithms. Most of these are too proprietary to discuss, but one significant characteristic is that our approach requires no assumptions about the nature or structure of the problem. Further, our software requires no domain expertise to solve a problem — though domain expertise is certainly useful in applying the resulting insights. We’ve solved problems where we didn’t know what the inputs and outcomes were (i.e. the labels were obfuscated, we didn’t know what form the solution should take, and we didn’t even know the nature of the problem other than it was to predict one column in a data set from some subset of the other columns).

      As for artificial imagination versus artificial intelligence: Think about the difference between imagination and intelligence, and suppose that the former truly characterized our software. In that case you should hold the software to the high standard of one of Webster’s definitions of imagination: “The ability to think of new things”. See our Alzheimer’s gene expression work with King’s College London (featured in Drug Discovery News). You’ll see Richard Dobson, bioinformaticist at King’s, point out that our FACET software discovered novel biomarkers not previously implicated in Alzheimer’s. FACET did this without any human guidance, domain knowledge, or hypotheses.

      We have many other examples of FACET producing novel discoveries on its own. It has invented its own ways to correct inconsistently represented data so that it could then use the cleansed independent variable. In one of our product release test problems, it consistently synthesizes pi in order to solve for the circumference of circles. When predicting myocardial infarctions, we’ve even seen it exploit latent variables that were not in the data set but that drove patterns in variables that were in the data set. Sounds like magic, sounds like hype. We’re happy to prove it.

      1. Thank you, Patrick, for the elucidation. I reread Derrick’s post and it makes better sense on the second pass through ;-)

  6. This is very exciting. It looks like there’s a trend in Artificial Intelligence, where people are building “meta AI algorithms” whose job is to find the right AI algorithm with the right parameters for a given problem. This is the case with Google Prediction API for instance.

    These meta algorithms are automating the jobs of Data Scientists, but if I understand correctly, FACET is actually doing a better job?

    1. Patrick Lilley Saturday, April 12, 2014

      Hello Louis, that depends on what you mean by “better”, of course. Certainly FACET has, on every problem we’ve tried, beaten all other published results or previous results at clients/prospects. At the same time, we have typically freed up the best human minds from figuring out which variables are most useful, from building models, etc. This means that they can focus on translating insights into meaning for organizational learning (i.e. explaining what we learned together, so that the organization makes better decisions even outside using the model itself). They can also spend more time communicating to their organization on which changes to inputs will produce desired changes in outcomes. Your background is quite interesting, Louis. I would welcome your contacting us to discuss further.

      1. Indeed. Any time spent worrying about algorithms is time that’s not spent working on the data (collection / extraction / enrichment / cleansing) and acting on the model’s output / predictions. Switching to email ;)

