It won't steal your jobs, yet

Google is funding “an artificial intelligence for data science”

Google is funding a project called Automatic Statistician that bills itself as “an artificial intelligence for data science,” the company announced Tuesday. The project, which comes out of the University of Cambridge and is still in its early stages, aims to automate the selection, building and explanation of machine learning models.

In a nutshell, Automatic Statistician works by looking at a dataset and then determining which type of model would be best for analyzing it as well as which features, or variables, are the strongest. After the model runs, Automatic Statistician will return a text report explaining its findings in plain English — or as close as you can get when dealing with statistics.

A snippet of an Automatic Statistician report on unemployment data.

The project’s homepage quotes Google research scientist Kevin Murphy, who also wrote the blog post announcing Google’s funding for it, explaining the promise of Automatic Statistician like this:

“The first problem is that current Machine Learning (ML) methods still require considerable human expertise in devising appropriate features and models. The second problem is that the output of current methods, while accurate, is often hard to understand, which makes it hard to trust. The ‘automatic statistician’ project from Cambridge aims to address both problems, by using Bayesian model selection strategies to automatically choose good models / features, and to interpret the resulting fit in easy-to-understand ways, in terms of human readable, automatically generated reports.”
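For readers who want a feel for what "automatically choose a model, then explain it" can mean in practice, here is a minimal sketch in Python. It is not the Automatic Statistician's actual method or code. It assumes just two hypothetical candidate models (a constant and a straight-line trend) and scores them with the Bayesian information criterion (BIC), a standard rough stand-in for full Bayesian model comparison. The function names and the wording of the generated sentence are invented for illustration.

```python
import math


def fit_constant(xs, ys):
    """Predict the mean of ys everywhere; uses 1 fitted parameter."""
    mean = sum(ys) / len(ys)
    return [mean] * len(ys), 1


def fit_linear(xs, ys):
    """Ordinary least-squares line; uses 2 fitted parameters."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return [intercept + slope * x for x in xs], 2


def bic(ys, predictions, k):
    """BIC under Gaussian noise: lower is better, extra parameters are penalized."""
    n = len(ys)
    rss = sum((y - p) ** 2 for y, p in zip(ys, predictions))
    rss = max(rss, 1e-12)  # avoid log(0) on a perfect fit
    return n * math.log(rss / n) + k * math.log(n)


# Hypothetical candidate set; a real system would search a far richer space.
CANDIDATES = {"constant": fit_constant, "linear trend": fit_linear}


def select_model(xs, ys):
    """Score every candidate and return the best name plus all scores."""
    scores = {}
    for name, fit in CANDIDATES.items():
        predictions, k = fit(xs, ys)
        scores[name] = bic(ys, predictions, k)
    best = min(scores, key=scores.get)
    return best, scores


def report(xs, ys):
    """Turn the selection result into a one-sentence plain-English summary."""
    best, scores = select_model(xs, ys)
    return (
        f"Of {len(scores)} candidate models, the {best} model "
        f"explains the data best (BIC {scores[best]:.1f})."
    )


if __name__ == "__main__":
    xs = list(range(20))
    ys = [2 * x + 1 + (0.5 if x % 2 == 0 else -0.5) for x in xs]
    print(report(xs, ys))
```

The penalty term `k * math.log(n)` is what keeps the more flexible model from winning by default: the straight line is chosen only when it reduces the error by enough to justify its extra parameter.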

However, Automatic Statistician isn’t the first attempt to deliver this type of service; there have, in fact, been multiple commercial attempts at doing similar things. The most accurate comparison might be to a now-defunct tool by machine learning startup Skytree called Skytree Adviser, which also automatically selected models and generated text reports of its findings. Startups including BeyondCore, Nutonian and even Ayasdi are all promising varying degrees of this functionality, as well.

As sexy as it is to talk about automating the data scientist job, though, it’s a bit early to suggest any software will eliminate the need for such employees any time soon. Even if projects like Automatic Statistician or commercial tools can make it possible for relative laypersons to run machine learning models and uncover patterns, that’s just a step or two down what’s often a much longer path of turning insights into real value or, possibly, products.

7 Responses to “Google is funding “an artificial intelligence for data science””

  1. Leland Wilkinson

    I am the author of Skytree Adviser. I wrote the program prior to joining Skytree. It consists of about 75,000 lines of Java and 25,000 lines of XML. Adviser does not optimize a goodness-of-fit criterion or use Bayesian methods to do model selection. It is not a machine learning program. Instead, it uses the same heuristics statisticians use (residual diagnostics, etc.) and will occasionally fit models that have higher prediction error than a blind optimizer because the assumptions are met better by the selected model. For example, asking Adviser to predict a variable from a set of predictors may result in OLS, nonlinear regression, (multinomial) logistic regression, Poisson regression, zero-inflated Poisson, negative binomial regression, etc., depending on inferences Adviser makes concerning the data. Adviser uses robust statistical methods (biweights, least-median-of-squares, etc.) to test assumptions and will generate cautionary text when it thinks the assumptions are not met.

    I joined Skytree last year and we planned to expand Adviser to the machine learning space using Skytree technology. I transferred Adviser to Skytree in exchange for the opportunity to launch it through their website. Adviser generated several important new bookings and considerable excitement.

    Because Adviser was not a machine learning program and was not designed to handle Big Data, Skytree decided to remove Adviser from its website and discontinued its availability. That is why this article uses the word “defunct” to describe Adviser. It is not defunct, it is not beta; it is a production desktop application.

    Leland Wilkinson

      • Not so much distraction, as measured risk-taking in investing in new technologies. It would be rather silly to expect world-shattering effects from every company or product that receives large investments from noteworthy investors. Why is this any different? What makes Google so special?