Hadoop bigwig Cloudera is continuing on its mission to become a one-stop shop for big data by acquiring a London-based machine learning startup called Myrrix. Myrrix founder Sean Owen, who’s now Cloudera’s director of data science in London, announced the acquisition — Cloudera’s first — in a Tuesday morning blog post.
It’s hard to overstate how much sense the move makes for Cloudera, which needs to get deeper into the application space as its clientele want to do more advanced things. And machine learning is all the rage these days as companies want to automate pattern recognition in their large datasets for everything from recommendation engines, which is what Myrrix does, to predicting whether jet engines will fail. GE is using technology from a machine learning startup called Ayasdi to do the latter, and Ayasdi announced a $30.6 million funding round on Tuesday.
As Owen correctly points out in his post, Hadoop’s cheap storage and relatively fast processing have made it easier to store large volumes of data, and machine-learning algorithms love data. The Apache Mahout project, on which the Myrrix technology is built, has been at work for years parallelizing these algorithms to run on the Hadoop platform. WibiData, the startup from Cloudera co-founder Christophe Bisciglia, also takes advantage of Hadoop’s scalability to power machine learning on its predictive analytics platform.
Cloudera executives have for years been adamant that the company will remain an infrastructure player and not get into the application space, but others have suggested that some applications wouldn’t be such a crazy idea. And the company’s recent moves to incorporate new processing and query options (i.e., interactive SQL with Impala and, well, search with Cloudera Search) demonstrate the company at least knows it must be more than just a platform for doing MapReduce jobs.
How far down the application road Cloudera will go with its new machine learning intellectual property remains to be seen, as Owen explains:
“[I]n the early days, Hadoop itself was a ball of source code that only adventurous specialists could effectively embrace. However, Cloudera has shown how to extend it, package it, support it and make it far more accessible to a much bigger audience. The same will happen for applications like Big Learning — that’s always been the Myrrix vision too, and now, we’re working together within Cloudera to start building this out …
“Exactly what form that will take is to be determined. There are no new products to announce at this point, as we’re busy in the lab figuring out how to incorporate the technology into CDH in just the right way.”
This post was updated at 10:46 a.m. to clarify that WibiData does not rely on Mahout for its machine learning libraries. However, the company notes, “it is possible to use Mahout algorithms” with data stored in its open source HBase framework.