


If you like the idea of your analytics system getting more accurate with each piece of data it ingests, you are in for an exciting run, because machine learning appears to be catching fire across the ecosystem of big data vendors. The timing isn’t surprising: As companies get comfortable with core big data frameworks such as Hadoop, they want to do more. They want to be something like what Facebook and Google are now, not what those companies were 5 or 10 years ago, and machine learning is a good start.

At its core, machine learning relies on algorithms that help analytics systems get smarter as they ingest more data. It’s not easy, but it’s very valuable in reducing the need for constant human intervention to analyze data and tweak algorithms accordingly. Companies with data scientists trying to predict outcomes such as market activity, customer behavior, computer problems or search queries have been using machine learning for years, and they were investing heavily in recruiting talented employees at least as far back as 2007.
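To make "smarter as they ingest more data" concrete, here is a minimal sketch of a model that updates itself one observation at a time. It is only an illustration, not how any vendor mentioned here implements machine learning: the names (`OnlineLinearModel`, `ingest`, `run_demo`), the synthetic data (a noisy line, y = 2x + 1) and the learning rate are all assumptions made for the example.

```python
import random


class OnlineLinearModel:
    """Fits y ~ w*x + b one observation at a time (least-mean-squares updates)."""

    def __init__(self, learning_rate=0.05):
        self.w = 0.0
        self.b = 0.0
        self.learning_rate = learning_rate

    def predict(self, x):
        return self.w * x + self.b

    def ingest(self, x, y):
        # Nudge the parameters against the error on this single observation.
        error = self.predict(x) - y
        self.w -= self.learning_rate * error * x
        self.b -= self.learning_rate * error


def mean_abs_error(model, points):
    return sum(abs(model.predict(x) - y) for x, y in points) / len(points)


def run_demo(checkpoints=(10, 100, 2000), seed=0):
    """Return {observations ingested: error on held-out data} at each checkpoint."""
    rng = random.Random(seed)

    def sample():
        # Hypothetical data source: a noisy line the model has to discover.
        x = rng.uniform(-1.0, 1.0)
        return x, 2.0 * x + 1.0 + rng.gauss(0.0, 0.1)

    holdout = [sample() for _ in range(200)]
    model = OnlineLinearModel()
    errors = {}
    for n in range(1, max(checkpoints) + 1):
        model.ingest(*sample())
        if n in checkpoints:
            errors[n] = mean_abs_error(model, holdout)
    return errors


if __name__ == "__main__":
    for n, err in run_demo().items():
        print(f"after {n:>4} observations: mean abs error = {err:.3f}")
```

Nobody retunes the model between checkpoints. The error on the held-out data falls simply because more data has passed through `ingest`, which is the reduction in human intervention described above.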

However, until now, machine learning hasn’t been readily available to mainstream organizations not willing to shell out major bucks to specialists such as IBM or SAS. On Tuesday, though, big data outlier (as in, it pushes a non-Hadoop storage-and-processing framework) HPCC Systems released a beta version of its new open-source machine-learning algorithms. The goal is simple: Let HPCC users move beyond the batch and transactional processing that its platform was built for, and let them use the parallel-processing engine for more aggressive big data workloads.

HPCC Systems’ release is akin to Mahout, an Apache Software Foundation project that has been around for a couple of years and pushes the same agenda atop the Hadoop framework. Until now, Mahout was the only attempt at building an open-source library of machine-learning algorithms.

But machine learning is becoming productized, too. On Saturday I profiled five stealthy big data startups pushing past Hadoop, and machine-learning specialist Skytree was among them. Since then, I have been contacted by numerous other data startups, some in stealth mode and some not, all claiming to do machine learning to one degree or another. These companies want to take it a step further by letting customers benefit from machine learning simply by installing software and pointing their data at it.

All of this activity suggests an exciting time to come into the big data space. With Hadoop or other platforms at the core, companies are getting jazzed about what is possible and need tools to take their analytics to the next level. Mass adoption of machine learning might still be years out, but the drumbeat is starting now.

Image courtesy of Flickr user hackerfriendly


  1. Hi Derrick, you hit a chord with 30 comments on 5 low-profile startups that could change the face of big data http://bit.ly/wTzpdV so not sure why no one is commenting here, so allow me ;)

    As I said before, SkyTree is the most ambitious if they can bring high-performance machine learning to the mainstream, and I am sure there are others that contacted you or are jealous because they did not get any ink ;)

  2. Social + machine learning = crowd intelligence

    This is what we are up to at @UpMo: a social talent engine to help match people’s aspirations with best career paths within an enterprise.

  3. @Rob Garcia, using social intelligence to source candidates is fascinating stuff. The whole field of recruiting seems to be evolving at a super-rapid pace, in step with technology.

  4. William Le Ferrand Thursday, February 2, 2012

    I agree – these are exciting times for big data analytics. IMO, the
    elements of good solutions combine accessibility to non-experts,
    affordability, and scalability over large inputs.

    In particular, many of the current solutions get stuck in adoption
    because of affordability – they require large hardware purchases and
    installation/maintenance of enabling infrastructure software — both
    large, upfront capex.

    Recently launched eigendog.com takes another approach, with a
    pay-as-you-go, machine learning as-a-service.

  5. toiletpartition Friday, February 3, 2012

    The problem with big data analytics is that there is no equation or algorithm to account for human emotion or human reaction to major global events. This data will never be completely accurate.
