There’s confidence, and then there’s *confidence*. Nutonian, a Cambridge, Mass.-based machine learning startup, possesses the latter. The company claims its Eureqa software is the best tool around for discovering how and why pieces of data are related to one another, and it has raised a $4 million series A round led by Atlas Venture.

Eureqa can “essentially calculate laws of physics through data,” Nutonian founder and CEO Michael Schmidt said during a recent call. “You might have noticed that there’s almost no laws of physics in business,” he added. “We’re gonna change all that.”

He doesn’t mean the software is using or actually discovering new laws of physics, it’s just a hyperbolic way of describing how confident he is in the statistical relationships Eureqa can provide. Once it has ingested a user’s data, the software forms hypotheses about it and starts building and running models to test them. It can do this “billions and billions of times over” until it finds the best possible mathematical formula for explaining the data, Schmidt explained.

Schmidt said that by using this method, Eureqa can better predict which variables will cause or affect others because it understands how they’re connected at such a deep level. Users can inspect the output of all this machine learning and see these relationships for themselves. This differs from some other machine learning products, which focus on predicting an outcome but don’t tell users why they’re predicting it.

Technically, the process is called “symbolic regression,” which Nutonian describes in more technical terms on its website:

Symbolic Regression works by searching the space of mathematical expressions to find the model that best fits the data, both in terms of predictive accuracy and complexity. Unlike traditional linear and non-linear regression methods that fit parameters to an equation of a specific form, Symbolic Regression searches both the parameters and the form of equations simultaneously. Initial expressions are formed by randomly combining mathematical building blocks such as algebraic operators {+, -, /, x}, analytical functions (for example, sine and cosine), constants, and state variables. New equations are formed by recombining previous equations and probabilistically varying their sub-expressions. The algorithm retains equations that model the data better than others and abandons unpromising solutions. After equations reach a desired level of accuracy, the algorithm terminates, returning a set of equations that are most likely to correspond to the intrinsic relationships underlying the observed system.
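To make that description concrete, here is a toy sketch of the loop it outlines, written in Python. It is not Nutonian’s code, and Eureqa’s actual algorithm is surely far more sophisticated. The function names, population size, complexity penalty, size cap and the “protected” division that returns 1.0 for near-zero denominators are all assumptions made for illustration.

```python
import math
import random

# Building blocks: algebraic operators and analytical functions.
BINARY = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    # Protected division (an assumption of this sketch, not from the article).
    "/": lambda a, b: a / b if abs(b) > 1e-9 else 1.0,
}
UNARY = {"sin": math.sin, "cos": math.cos}
LEAVES = ("x", "const")

def random_expr(rng, depth=3):
    """Randomly combine building blocks into a tree of nested tuples."""
    if depth == 0 or rng.random() < 0.3:
        if rng.random() < 0.5:
            return ("x",)
        return ("const", round(rng.uniform(-3, 3), 2))
    if rng.random() < 0.25:
        return (rng.choice(list(UNARY)), random_expr(rng, depth - 1))
    return (rng.choice(list(BINARY)),
            random_expr(rng, depth - 1), random_expr(rng, depth - 1))

def evaluate(expr, x):
    head = expr[0]
    if head == "x":
        return x
    if head == "const":
        return expr[1]
    if head in UNARY:
        return UNARY[head](evaluate(expr[1], x))
    return BINARY[head](evaluate(expr[1], x), evaluate(expr[2], x))

def size(expr):
    """Complexity, measured as the number of nodes in the tree."""
    if expr[0] in LEAVES:
        return 1
    return 1 + sum(size(child) for child in expr[1:])

def error(expr, data):
    """Mean squared error; numerically broken expressions score infinity."""
    try:
        mse = sum((evaluate(expr, x) - y) ** 2 for x, y in data) / len(data)
    except (OverflowError, ValueError):
        return float("inf")
    return mse if math.isfinite(mse) else float("inf")

def subtrees(expr, path=()):
    """Yield the path (child indexes from the root) to every node."""
    yield path
    if expr[0] not in LEAVES:
        for i, child in enumerate(expr[1:], start=1):
            yield from subtrees(child, path + (i,))

def get(expr, path):
    for i in path:
        expr = expr[i]
    return expr

def replace(expr, path, new):
    """Return a copy of expr with the node at path swapped for new."""
    if not path:
        return new
    i = path[0]
    return expr[:i] + (replace(expr[i], path[1:], new),) + expr[i + 1:]

def evolve(data, rng, pop_size=100, generations=20, penalty=0.01, max_size=30):
    def score(expr):  # accuracy and complexity together
        return error(expr, data) + penalty * size(expr)

    population = [random_expr(rng) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=score)
        history.append(score(population[0]))
        survivors = population[: pop_size // 4]  # abandon unpromising ones
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.choice(survivors), rng.choice(survivors)
            target = rng.choice(list(subtrees(a)))
            if rng.random() < 0.5:  # recombine two previous equations
                donor = get(b, rng.choice(list(subtrees(b))))
            else:  # probabilistically vary a sub-expression
                donor = random_expr(rng, 2)
            child = replace(a, target, donor)
            if size(child) <= max_size:
                children.append(child)
        population = survivors + children
    population.sort(key=score)
    history.append(score(population[0]))
    return population[0], history

if __name__ == "__main__":
    # Hypothetical data drawn from y = x^2 + sin(x).
    points = [(i / 4, (i / 4) ** 2 + math.sin(i / 4)) for i in range(-12, 13)]
    best, history = evolve(points, random.Random(0))
    print(best, history[-1])
```

Because the best quarter of each generation survives unchanged, the best score can never get worse from one generation to the next. Whether the search actually recovers the formula behind the data depends on the seed, the settings and how long it runs. This sketch also stops after a fixed number of generations rather than at a target accuracy, as the quoted description has it.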

If this all sounds like something out of a university, that’s because it is. Eureqa is the product of research Schmidt had spent years on while a student and then a faculty member at Cornell. It was intended as a tool to help scientists use data more effectively in their research.

After years of work he made a breakthrough (which was published in Science) and was finally able to make the software available for public download but, Schmidt noted, it was never intended for mainstream users. He got them nonetheless, with the software downloaded 10,000 times in the first year, and at some point came to realize there might be commercial value in what he had created. Eureqa has now been downloaded more than 40,000 times across 80 different countries.

“I was thrilled to have so much interest in my graduate school research,” Schmidt said. But, he acknowledged, “I was sort of oblivious to larger market movements around big data.”

Atlas Venture’s Chris Lynch, Nutonian’s lead investor, has nothing but praise for the company, and for Schmidt in particular. “I offered Michael a term sheet the day I met him,” Lynch wrote in an email to me. “He is a Benjamin Franklin-class innovator; bold, of strong will, and singular in his vision.”

As Nutonian tries to scale and market itself into a business software company, Schmidt is also smart enough to know his software isn’t perfect. “I don’t think there’s ever going to be a perfect answer where you can clearly say this is a causation,” he said. Although, he added, Eureqa will get users closer to that point than just about anything else.

*Feature image courtesy of Shutterstock user Humannet.*

“He doesn’t mean the software is using or actually discovering new laws of physics, it’s just a hyperbolic way of describing how confident he is…”

So you admit your headline is hyperbolic and misleading. Why use it, then? Oh yeah, pageviews.

The Eureqa software is deserving of the enthusiasm of Mr. Harris. The 2009 publication in Science is precisely about discovering laws of physics in data, specifically a Hamiltonian for a double-pendulum system. What Eureqa got was the motion data from a double-pendulum, and the Hamiltonian describing the system was in the output. There are a total of eight “laws of physics” listed on p.83 of that article found only by analysis of the physical data. I have been using Eureqa intensely for several months in my job (my opinions here are, of course, my own), and what I am routinely seeing emerge are “rules of” our field. I’m not in physics, so Eureqa isn’t finding “laws of physics” in our data, but I can say that Eureqa consistently finds the relationships we know about, and is providing valuable insight concerning relationships we had not expected.

I have no firsthand experience with the software, but it seems to me that overfitting would be a constant problem with that approach.