If you like the idea of your analytics system getting more accurate with each piece of data it ingests, you are in for an exciting run, because machine learning appears to be catching fire across the ecosystem of big data vendors. The timing isn’t surprising: As companies get comfortable with core big data frameworks such as Hadoop, they want to do more. They want to be something like what Facebook or Google are now, not what those companies were 5 or 10 years ago, and machine learning is a good start.
At its core, machine learning relies on algorithms that help analytics systems get smarter as they ingest more data. It’s not easy, but it’s very valuable in reducing the need for constant human intervention to analyze data and tweak algorithms accordingly. Companies with data scientists trying to predict outcomes such as market activity, customer behavior, computer problems or search queries have been using machine learning for years, and they were investing heavily in recruiting talented employees at least as far back as 2007.
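To make "smarter as they ingest more data" concrete, here is a minimal sketch in Python. It is not drawn from HPCC Systems, Mahout or any vendor mentioned here. It assumes a hypothetical data stream that follows the rule y = 2x + 1 and a toy model (the names `OnlineLinearModel`, `make_stream` and `error_by_checkpoint` are invented for illustration) that nudges its two parameters after every record it sees. The measured error on a small held-out set shrinks as more records arrive.

```python
def make_stream(n):
    """Yield n (x, y) records from the assumed rule y = 2x + 1."""
    for i in range(n):
        x = (i % 10) / 10.0          # x cycles through 0.0, 0.1, ..., 0.9
        yield x, 2.0 * x + 1.0


class OnlineLinearModel:
    """Toy model y ~ w*x + b, updated one record at a time (least mean squares)."""

    def __init__(self, learning_rate=0.5):
        self.w = 0.0
        self.b = 0.0
        self.lr = learning_rate

    def predict(self, x):
        return self.w * x + self.b

    def ingest(self, x, y):
        # Move both parameters a small step against the prediction error.
        err = self.predict(x) - y
        self.w -= self.lr * err * x
        self.b -= self.lr * err


def mean_abs_error(model, points):
    """Average absolute prediction error over a list of (x, y) points."""
    return sum(abs(model.predict(x) - y) for x, y in points) / len(points)


def error_by_checkpoint(checkpoints=(10, 100, 2000)):
    """Return {records_seen: held-out error} at each checkpoint."""
    holdout = [(k / 4.0, 2.0 * (k / 4.0) + 1.0) for k in range(5)]
    model = OnlineLinearModel()
    errors = {}
    for seen, (x, y) in enumerate(make_stream(max(checkpoints)), start=1):
        model.ingest(x, y)
        if seen in checkpoints:
            errors[seen] = mean_abs_error(model, holdout)
    return errors


if __name__ == "__main__":
    for seen, err in error_by_checkpoint().items():
        print(f"after {seen} records: mean abs error = {err:.6f}")
```

Real workloads are noisier and far larger, which is where parallel platforms come in, but the basic loop of ingest, compare and adjust is the same idea.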
However, until now, machine learning hasn’t been readily available to mainstream organizations not willing to shell out major bucks to specialists such as IBM or SAS. On Tuesday, though, big data outlier (as in, it pushes a non-Hadoop storage-and-processing framework) HPCC Systems released a beta version of its new open-source machine-learning algorithms. The goal is simple: Let HPCC users move beyond the batch and transactional processing that its platform was built for, and let them utilize the parallel-processing engine for more-aggressive big data workloads.
HPCC Systems’ release is akin to Mahout, an Apache Software Foundation project that has been around for a couple of years and pushes the same agenda atop the Hadoop framework. Until now, Mahout was the only attempt at building an open-source library of machine-learning algorithms.
But machine learning is becoming productized, too. On Saturday I profiled five stealthy big data startups pushing past Hadoop, and machine-learning specialist Skytree was among them. Since then, I have been contacted by numerous other data startups, some in stealth mode and some not, all claiming to do machine learning to one degree or another. These companies want to take it a step further by letting customers benefit from machine learning simply by installing software and pointing their data at it.
All of this activity suggests an exciting time to come into the big data space. With Hadoop or other platforms at the core, companies are getting jazzed about what is possible and need tools to take their analytics to the next level. Mass adoption of machine learning might still be years out, but the drumbeat is starting now.
Image courtesy of Flickr user hackerfriendly