Meet the startups making machine learning an elementary affair


And you thought machine learning was hard!

Actually, it still is, but there’s a movement afoot to make it — and data science, in general — a lot easier by turning algorithms into applications that any business analyst should be able to use. The method of choice for many of them: Just point, click and see what’s up with your data.

Here are five startups trying to make machine learning in particular as easy as possible.

Alpine Data Labs: An offspring of Greenplum (former Greenplum parent company EMC is an investor, in fact), Alpine Data Labs is doing what amounts to Microsoft Visio for predictive analytics. Its software sits right inside a company’s data store (that can be Hadoop or any number of popular databases) and lets users analyze the data by drawing flow charts. It’s a little more complex than just pulling down a menu and selecting “cluster,” but it’s a whole lot easier than trying to code those functions.

The Alpine user interface

Context Relevant: Context Relevant doesn’t do point-and-click, but essentially promises users they can point their data at its product and then walk away. After a brief engagement, Context Relevant says its technology can build predictive models in seconds, from a library of prepackaged algorithms for things like fraud detection, customer churn or other classic predictive-analytics use cases. Once the models are up and running, co-founder and CEO Stephen Purpura told me several months ago, “Someone who can manage an Excel spreadsheet can essentially manage the process.”

Datameer: Speaking of spreadsheets, Datameer is best known as the company that launched a spreadsheet interface for Hadoop analytics a couple of years ago, but it has since expanded quite a bit. It has hundreds of prepackaged functions in its spreadsheet, state-of-the-art visualization capabilities and, in version 3.0, prebuilt machine learning algorithms that let users do things like clustering and column-dependency analysis with just a few mouse clicks.

“The most technically challenging thing,” co-founder and CEO Stefan Groschupf told me recently, “is username and password and the IP address of your database.”

A column dependencies chart from Datameer

Skytree: Skytree’s flagship Skytree Server product is some serious enterprise software for machine learning, but the company is trying to appeal to less-savvy users with a new product called Adviser. Still in beta, it’s a desktop application that connects easily to web, local or database sources and then lets users select from a library of algorithms. They can then choose the variables and output types they want. It’s still a bit wonkish in terms of UI and explanation, but, hey, when you’re analyzing 100,000 rows on your desktop for free and getting back an interactive report on the findings, it’s hard to complain.

Part of a Skytree Adviser report from a dataset I found about UFO sightings.

Wise.io: Wise.io is just a year old, but it has a plan to apply the lessons its founders learned as astronomy researchers to the business world. Not only is the company’s technology easy (its website describes it as an “intuitive, easy-to-use platform for machine learning enables anyone to build and deploy models with a few simple clicks”), but it’s also fast. Co-founder Joshua Bloom told the crowd at a recent Alchemist Accelerator event that one beta customer cut its time to analyze terabytes of sensor data to 20 minutes from 300 hours.

It’s a big world

Of course, if you look past machine learning algorithms and into big data generally, this list of companies (which is probably already leaving off some notable startups) and the approaches taken could expand significantly. I’m thinking of startups such as Karmasphere offering prebuilt MapReduce functions behind a GUI, and Mu Sigma blending MapReduce and R with a new set of prepackaged data science functions. There are companies such as Alteryx trying to be Tableau for predictive analytics while also offering a gallery of predictive applications.

Although it’s focused on coders rather than business users, Mortar Data is trying to democratize standard data science applications such as recommendation engines.

Narrow it down to specific industries or business processes, and things really start to expand. There is no shortage of relatively simple predictive tools for marketers, for example, and some (e.g., Causata) are getting remarkably functional. Startups such as WibiData are hoping to go beyond analytics and build full-on machine learning applications tailored for specific industries.

An identity graph in Causata

The major caveat to all of this simplification of a previously complex field, though, is that no out-of-the-box algorithms are going to turn an average company into Google, Facebook or even MailChimp (the feature image is a graph of its email recipients) in terms of their data science prowess. They certainly won’t turn an average business analyst with minimal coding, math or statistical know-how into a data scientist ready to tackle the next wave of big data challenges. Nor, probably, will a few dozen hours on Coursera.

But all these technologies will give laypeople with a basic understanding of data some extra firepower when it comes to finding new and possibly better ways to do their jobs. In today’s job market, where demand for data science skills far outstrips supply, any little bit helps.

