Weekly Update

Machine Learning, in Redmond and beyond

After the US Independence Day holiday this past Friday, it’s been a slow news week. So I wanted to focus on a specific topic in this week’s update: predictive analytics/machine learning, which is presenting itself as the next “it” thing in the Big Data world.

A bunch of companies are in this market, with new ones just coming online. Perhaps more important, existing BI companies, like Pentaho, are getting into the game. And remember, machine learning has been around for a long time, since the days when it was called Data Mining. Veteran players, like MicroStrategy and Microsoft, have been on the scene for a while. And companies like SAS have long specialized in this arena.

Microsoft’s position in Machine Learning is an interesting one to observe, for a couple of reasons. First, one of the few press releases I received this week relates to Microsoft’s Machine Learning story. Second, I personally have been working with Microsoft’s Data Mining technology for quite some time. I guess you could say I’ve seen this movie before. Or at least I’ve seen the prequel.

Ahead of its time
Microsoft added machine learning capabilities to its flagship BI engine, SQL Server Analysis Services, all the way back in 2000, when it released SQL Server 2000. That first release was a so-so offering at best, but SQL Server Data Mining became much more robust the next time around, with the release of SQL Server 2005. Many of the same Machine Learning algorithms used by predictive analytics products today were readily accessible to DBAs and developers back then, and Microsoft even introduced an extension to SQL, Data Mining Extensions (DMX), with which to perform predictive queries against the models generated by the product.

After the release of Excel 2007, the SQL Server Data Mining team introduced Excel add-ins for the data mining engine that made it more accessible to business users. I delivered a conference session on SQL Server Data Mining soon after, and my recollection is that about 10 people came. Maybe news of that got back to Microsoft, because the company stopped investing in the product. As a result, the core engine hasn’t changed much in almost a decade.

Data Mining redux
But last month, Microsoft announced its new, cloud-based Azure Machine Learning (Azure ML) product. The service, which is scheduled to go into public preview soon [update: Microsoft has announced that the Azure ML preview will begin on Monday, July 14th, as its Worldwide Partner Conference kicks off], allows users to upload data sets, then build models around them. When models are finalized, they can be put into production, which exposes a Web application programming interface (API) around them.

On Monday, I received a press release from analytics company Versium, which has launched a predictive analytics system for donor management, built around Azure ML. The service, called Predictive GivingScore, is designed for charities and fundraising groups and lets them answer questions such as:

“Who is more likely to donate? Will they likely be a high or repeat donor? Which of my existing donors are most likely to donate again and who is likely to make a greater contribution?”

As it turns out, Versium is located in Redmond, WA, though in a different part of town from Microsoft’s corporate campus. Versium names two organizations, the Millionair Club Charity and Treehouse, as customers, both of which are across the lake in Seattle proper. Although the Azure ML service may appear to be relegated to the 425 and 206 area codes for now, the potential for a cloud-based predictive analytics service is big, especially from a company that, while most people never knew it, has been in the space for almost 15 years.

Microsoft has a promotional video which shows Azure’s “ML Studio” user interface a few times. Blink and you’ll miss it though, so be ready to hit the pause button or else check out this slide deck for some screen grabs.

Play the field
Microsoft isn’t the only player in the machine learning area, of course. In addition to the aforementioned Pentaho, MicroStrategy and SAS, keep your eyes on companies like BeyondCore, BigML, Context Relevant (whose CEO is a Microsoft alum), RapidMiner, Knime.com, Actian and SkyTree.

But as you do that, keep the Microsoft story in mind: the ML investments started almost 15 years ago and ended about 5 years later with very little customer adoption. Now they’ve started again. What’s different this time, and what is likely to be the same? Is standalone machine learning where it’s at, or do these capabilities need to be more automated in terms of algorithm selection, integrated with BI/data visualization and accessible to business users? Full disclosure: I tend to think the latter, and I wonder how that will play out in the marketplace.